Data Science’s Evolution

There’s perhaps nothing that sets the 21st century apart from others more than the concept of data. Every interaction we have with a connected device creates a data record and beams it back to some data store for tracking and analysis. Internet-connected devices are ubiquitous and growing in number. In 2018, there were approximately 8 connected devices per person in the United States. That number is expected to grow to 13.6 by 2023. [1]

The vast amounts of data that are being collected by organizations and individuals have enabled ever more powerful — and transformational — machine learning algorithms. Machine learning and artificial intelligence (AI) shape our experience when we use a search engine, visit a social media website, or interact with a large company’s customer service. AI enables SpaceX to safely land its rockets back on Earth for reuse. It fuels a growing population of robots in manufacturing, generates novel chemical compositions for drug research, and brings the possibility of fully autonomous vehicles closer every day.

Yes, advances in compute power and better algorithms have also been a critical part of this advancement. But without good data, hardware and mathematical equations can only do so much. “Garbage in, garbage out” as the old adage goes.

Data Science vs. Machine Learning vs. Artificial Intelligence

It’s probably useful at this point to discuss what we mean when we talk about data science, machine learning, and artificial intelligence (AI).

Historically, data science has involved the process of analyzing data to gain insights, typically business insights. As Andrew Ng explains in his Coursera course, AI for Everyone, the output of a data science analysis would typically be a PowerPoint presentation (though this isn’t necessarily the case anymore; more on that in a moment). [2] Such an output would typically serve key stakeholders in an organization or on a project.

Arthur Samuel, one of its pioneers, defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed”. The output of a machine learning project is typically some type of software, for example an algorithm that automatically optimizes the listings you see on a job search site based on a variety of factors. Such an output could serve thousands, millions, or even billions of users.
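
To make “learning without being explicitly programmed” a little more concrete, here is a minimal sketch in Python. Nothing in it comes from a real project: the data, the names fit_line and predict, and the choice of a least-squares line are all assumptions made for illustration. The point is only that the rule (a slope and an intercept) is estimated from examples rather than typed in by hand.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate slope and intercept with ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical training examples (inputs and the outputs observed for them).
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [1.0, 3.0, 5.0, 7.0, 9.0]

# The program is never told the relationship; it estimates it from the data.
slope, intercept = fit_line(inputs, outputs)


def predict(x):
    """Apply the learned rule to an input the program has not seen."""
    return slope * x + intercept


print(slope, intercept, predict(10.0))
```

A real system, such as the job listing example above, would learn from far more factors and far messier data, but the shape is the same: examples go in, and a reusable piece of software comes out.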

Artificial intelligence is the field of study involving how to build intelligent machines, typically with at least human-level performance on a given task (narrow AI) or on a diverse set of tasks (artificial general intelligence, or AGI). We don’t know when we will reach AGI, or how we might know when we reach it. [3] But in recent years, researchers and practitioners have achieved human-level or better performance on a variety of tasks using a specific type of machine learning called deep learning. Deep learning leverages an artificial neural network architecture, so you might see deep learning, neural networks, and AI used interchangeably in some settings.

Evolution: Data Science’s and Mine

Advances in deep learning are being increasingly leveraged by data scientists to develop both useful insights and products. Take for example the analyst who uses a natural language processing algorithm to analyze customer sentiment regarding a new product, and presents the findings to an executive team. Or the data scientist who builds a recommendation engine and delivers this software to an engineering team for back-end integration.

The rapid evolution of these fields, easy access to powerful compute platforms, and the ubiquity of high-quality technical MOOCs (Massive Open Online Courses) contribute to the blurring of lines between data scientists, machine learning engineers, and even deep learning engineers.

Google’s search algorithm is probably the most widely used and under-recognized machine learning technology of the past 20 years. I began my career at Google and spent six years working in a variety of roles including on search and analytics teams. A lot of this work came down to helping customers optimize their usage of Google’s algorithms. Even during these early days (2008–2014), we were actively using machine learning-powered tools to provide both insights for our customers and automated campaign solutions. But the truth was this was only the infancy of the AI revolution.

Deep learning took off in the public sphere after deep convolutional neural networks started smashing performance records. [4] I took notice of the disruption in industry. While working as a consultant, I spoke with folks in the field and embarked on a self-study journey to transition into a machine learning career, absorbing Andrew Ng’s Deeplearning.ai Coursera specialization, among other courses, research papers, and texts. As I started to work with clients in the space through a consulting firm, the experience was extremely rewarding and interesting.

COVID-19 and General Assembly

Enter COVID-19.

Though I was grateful to be in a better position than many folks out there, COVID-19 still led to some non-negligible disruption. But instead of thinking about this as something happening TO me, I wanted to flip the script and do something with the flexibility that came with working from home. As a lifelong learner in the machine learning and analytics space, I had always felt like I was missing the data science piece of the puzzle. Back at Google, I loved using analytics to help clients understand what was going on and what they should do, but I had gotten pretty far away from that, and a cornucopia of new tools is now used to conduct analysis and relay the information in useful ways. After a lot of different conversations with colleagues and many late nights searching for the right solution to upgrade my data science skills, I settled on General Assembly. Specifically, I enrolled in General Assembly’s 12-week Data Science Immersive.

My goals with this course are:

Become a data wrangling master
Build a solid foundation in statistics
Enhance my machine learning knowledge

I’m excited to bring data science skills to my machine learning work in the future. Depending on the data set and goal, deep learning isn’t always feasible or necessary in a project; this is where having a robust machine learning toolkit comes in handy. A solid statistics foundation can also be a boon when collecting data and evaluating its quality, or when examining the impact labeling errors have on machine learning algorithm performance.
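
As a rough illustration of that last point, the sketch below measures how labeling errors in training data show up as lost accuracy. Everything here is an assumption made for the example: the toy one-dimensional data, the helper names, and the use of a 1-nearest-neighbor classifier, which I picked only because it makes the effect easy to see in plain Python.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


def flip_every(train, step):
    """Simulate labeling errors by flipping the label of every step-th example."""
    return [
        (x, 1 - y) if i % step == 0 else (x, y)
        for i, (x, y) in enumerate(train)
    ]


# Toy data: values below 10 belong to class 0, the rest to class 1.
clean_train = [(float(x), 0 if x < 10 else 1) for x in range(20)]
# Test points sit just next to the training points and keep the true labels.
test = [(x + 0.25, y) for x, y in clean_train]

noisy_train = flip_every(clean_train, 5)  # 4 of the 20 training labels are wrong

clean_accuracy = accuracy(clean_train, test)
noisy_accuracy = accuracy(noisy_train, test)
print(clean_accuracy, noisy_accuracy)
```

In this contrived setup every wrong training label costs exactly one test prediction, so accuracy falls from 1.0 to 0.8. Real data sets and models behave less neatly, which is where the statistics come in.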

I’ll be sharing some of my journey on this blog over the coming months. If you’re interested, give me a follow.

[1] https://www.cisco.com/c/en/us/solutions/executive-perspectives/annual-internet-report/air-highlights.html#
[2] https://www.coursera.org/learn/ai-for-everyone
[3] For more on the challenges AGI presents, see Max Tegmark’s book, Life 3.0.
[4] https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Translated from: https://medium.com/@caroline_clark/data-sciences-evolution-and-mine-fb12ce3156ba
