Data Science’s Evolution and Mine

There’s perhaps nothing that sets the 21st century apart from others more than the concept of data. Every interaction we have with a connected device creates a data record and beams it back to some data store for tracking and analysis. Internet-connected devices are ubiquitous, and their numbers keep growing. In 2018, there were approximately 8 connected devices per person in the United States. That number is expected to grow to 13.6 by 2023.[1]

The vast amounts of data that are being collected by organizations and individuals have enabled ever more powerful — and transformational — machine learning algorithms. Machine learning and artificial intelligence (AI) shape our experience when we use a search engine, visit a social media website, or interact with a large company’s customer service. AI enables SpaceX to safely land its rockets back on Earth for reuse. It fuels a growing population of robots in manufacturing, generates novel chemical compositions for drug research, and brings the possibility of fully autonomous vehicles closer every day.

Yes, advances in compute power and better algorithms have also been a critical part of this progress. But without good data, hardware and mathematical equations can only do so much. “Garbage in, garbage out,” as the adage goes.

Data Science vs. Machine Learning vs. Artificial Intelligence

It’s probably useful at this point to discuss what we mean when we talk about data science, machine learning, and artificial intelligence (AI).

Historically, data science has involved the process of analyzing data to gain insights, typically business insights. As Andrew Ng explains in his Coursera course, AI for Everyone, the output of a data science analysis would typically be a PowerPoint presentation (though this isn’t necessarily the case anymore; more on that in a moment).[2] Such an output would typically serve key stakeholders in an organization or on a project.

Arthur Samuel, one of its pioneers, defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed.” The output of a machine learning project is typically some type of software, for example an algorithm that automatically optimizes the listings you see on a job search site based on a variety of factors. Such an output could serve thousands, millions, or even billions of users.

Artificial intelligence is the field of study involving how to build intelligent machines, typically with at least human-level performance on a given task (narrow AI) or on a diverse set of tasks (artificial general intelligence, or AGI). We don’t know when we will reach AGI, or how we might know when we reach it.[3] But in recent years, researchers and practitioners have achieved human-level or better performance on a variety of tasks using a specific type of machine learning called deep learning. Deep learning leverages an artificial neural network architecture, so you might see deep learning, neural networks, and AI used interchangeably in some settings.

Evolution: Data Science’s and Mine

Advances in deep learning are being increasingly leveraged by data scientists to develop both useful insights and products. Take, for example, the analyst who uses a natural language processing algorithm to analyze customer sentiment regarding a new product and presents the findings to an executive team. Or the data scientist who builds a recommendation engine and delivers this software to an engineering team for back-end integration.

The rapid evolution of these fields, easy access to powerful compute platforms, and the ubiquity of high-quality technical MOOCs (Massive Open Online Courses) contribute to the blurring of lines between data scientists, machine learning engineers, and even deep learning engineers.

Google’s search algorithm is probably the most widely used and under-recognized machine learning technology of the past 20 years. I began my career at Google and spent six years working in a variety of roles including on search and analytics teams. A lot of this work came down to helping customers optimize their usage of Google’s algorithms. Even during these early days (2008–2014), we were actively using machine learning-powered tools to provide both insights for our customers and automated campaign solutions. But the truth was this was only the infancy of the AI revolution.

Deep learning took off in the public sphere after deep convolutional neural networks started smashing performance records.[4] I took notice of the disruption in industry. While working as a consultant, I spoke with folks in the field and embarked on a self-study journey to transition into a machine learning career, absorbing Andrew Ng’s Deeplearning.ai Coursera specialization, among other courses, research papers, and texts. As I started to work with clients in the space through a consulting firm, the experience was extremely rewarding and interesting.

COVID-19 and General Assembly

Enter COVID-19.

Though I was grateful to be in a better position than many folks out there, COVID-19 still led to some non-negligible disruption. But instead of thinking about this thing happening TO me, I wanted to flip the script and do something with the flexibility that came with working from home. As a lifelong learner in the machine learning and analytics space, I had always felt like I was missing the data science piece of the puzzle. Back at Google, I loved using analytics to help clients understand what was going on and what they should do, but I had gotten pretty far away from that work, not to mention the cornucopia of new tools now used to conduct analysis and relay the information in useful ways. After many conversations with colleagues and many late nights searching for the right way to upgrade my data science skills, I settled on General Assembly. Specifically, I enrolled in General Assembly’s 12-week Data Science Immersive.

My goals with this course are:

  1. Become a data wrangling master
  2. Build a solid foundation in statistics
  3. Enhance my machine learning knowledge

I’m excited to bring data science skills to my machine learning work in the future. Depending on the data set and goal, deep learning isn’t always feasible or necessary in a project, and this is where having a robust machine learning toolkit comes in handy. A solid statistics foundation can also be a boon when collecting data and evaluating its quality, or when examining the impact labeling errors have on machine learning algorithm performance.

I’ll be sharing some of my journey on this blog over the coming months. If you’re interested, give me a follow.

[1] https://www.cisco.com/c/en/us/solutions/executive-perspectives/annual-internet-report/air-highlights.html#
[2] https://www.coursera.org/learn/ai-for-everyone
[3] For more on the challenges AGI presents, see Max Tegmark’s book, Life 3.0.
[4] https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Translated from: https://medium.com/@caroline_clark/data-sciences-evolution-and-mine-fb12ce3156ba
