Data Scientists: Adapt or Die!

Harvard Business Review marked the boom in Data Scientists with its famous 2012 article “Data Scientist: The Sexiest Job of the 21st Century”, followed by a decade of demand that supply could not keep up with. [3]

“..demand has raced ahead of supply. Indeed, the shortage of data scientists is becoming a serious constraint in some sectors.”

McKinsey & Co just published an article (Aug 2020) suggesting we rethink how many Data Scientists we really need in light of newer automation technologies (AutoML).[4]

“Over the long term, purely technical data scientists will still be needed, but simply far fewer than most currently predict.”

Image source: https://quanthub.com/data-scientist-shortage-2020/

In every boom cycle you get a shortage of talent and an influx of impostors or simply less-qualified people (e.g., during the dot-com/Y2K era, if you could spell Java you were a software engineer). As domains mature, tools and automation weed out those who aren’t really qualified or aren’t doing high-value work. Data Science is no different.

The Dirty Secret

Unfortunately, Data Science secrets are not as exciting as celebrity sex secrets. Behind this “sexy” job lies the large amount of grunt work that Data Science projects require, including:

  • Data sourcing, validation and cleanup
  • Trying feature combinations and engineered features
  • Testing different models and model parameters

Most agree that data-prep work makes up 80% of any ML/DS project [1], which has given rise to the Data Engineer specialty [2]. The remaining time is spent trying out features and testing models to squeeze out a few more percentage points of accuracy. It simply takes a lot of time. Experience, intuition and luck allow a scientist to narrow down the scenarios, but sometimes the best solution requires trying many extra, atypical (almost random) scenarios. One solution is automation: spending brute-force compute cycles with the new breed of tools named AutoML.

AutoML: Is It Like Skynet?

Automated Machine Learning (AutoML) is software that automates much of the repetitive work for you in an organized way. (Get a demo of H2O or DataRobot and see for yourself.) Feed it the data, set the goal, and take a nap while it grinds through iterations of features, models, and parameters. While it lacks domain expertise and precision, it makes up for that with brute force and superb bookkeeping and reporting (with some logic and heuristics, of course).
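To make the brute-force idea concrete, here is a minimal sketch of the kind of loop such a tool automates. It is not how H2O or DataRobot work internally. The toy dataset, the k-nearest-neighbor model, and every function name below (make_data, knn_predict, brute_force_search) are assumptions invented for illustration. The sketch tries every feature subset with every value of k and logs each run to a leaderboard, which is the "bookkeeping" part.

```python
import itertools
import random


def make_data(n=200, seed=0):
    """Hypothetical toy dataset: 3 features, only the first two carry signal."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        label = 1 if x[0] + x[1] > 0 else 0  # feature 2 is pure noise
        rows.append((x, label))
    return rows


def knn_predict(train, point, features, k):
    """Majority vote among the k nearest training rows, using only `features`."""
    def dist(row):
        return sum((row[0][f] - point[f]) ** 2 for f in features)

    nearest = sorted(train, key=dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, features, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, features, k) == y for x, y in test)
    return hits / len(test)


def brute_force_search(train, test, n_features=3, ks=(1, 3, 5, 7)):
    """Try every non-empty feature subset with every k, and record every run."""
    leaderboard = []
    for size in range(1, n_features + 1):
        for features in itertools.combinations(range(n_features), size):
            for k in ks:
                score = accuracy(train, test, features, k)
                leaderboard.append(
                    {"features": features, "k": k, "accuracy": score}
                )
    # Best runs first, so the report reads like a ranking.
    leaderboard.sort(key=lambda run: run["accuracy"], reverse=True)
    return leaderboard


data = make_data()
train, test = data[:150], data[150:]
leaderboard = brute_force_search(train, test)
for run in leaderboard[:3]:
    print(run)
```

Even this toy search runs 28 experiments (7 feature subsets times 4 values of k). Real projects multiply that by model families, engineered features, and many more parameters, which is why handing the loop to a machine is attractive.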

KDnuggets polled its readers five years ago on when, and if, AutoML would replace Data Scientists [6]. Recent thinking is that, for some of us, that time is very soon.

Image source: https://www.kdnuggets.com/2020/03/poll-automl-replace-data-scientists-results.html

Not everyone agrees of course.

Rachel Thomas of fast.ai: “There are frequent media headlines about both the scarcity of machine learning talent and about the promises of companies claiming their products automate machine learning and eliminate the need for ML expertise altogether.” [7]

Dr. Thomas seems to feel AutoML is misconstrued and carries a fair amount of hype. She makes compelling points that help us understand the full ML cycle, what AutoML is, and what it isn’t. It does not replace the work of experts, but it does greatly augment it. Not yet Skynet, but give it some time...

So Is My Job Going Away?

Google Brain co-founder Andrew Ng often voices concern about imminent job losses caused by AI and ML [5]. However, most analysis has focused on operational and blue-collar work. What about our cushy Data Science jobs? McKinsey describes the possible future awaiting us:

Figure: Rethinking AI talent

The bright side is that Data Scientists are not being fully replaced (the graphic shows 29% …), but let’s focus on McKinsey’s point: rethink the number and skill set of scientists needed. The number of scientists per project may drop as you add AutoML to your team (bots like TARS, R2D2 or HAL), but most research still suggests that aggregate demand for humans (scientists) will continue to increase for at least the next five years.

The bulk of online articles [9] make it clear that Data Scientists are not dead after all. But most agree AutoML has come of age and is already changing the makeup of projects and staffing today. We all need to evolve, and as a Data Scientist you need to learn to leverage AutoML and related tech improvements or risk falling behind.

Automation is a good thing: we can focus on higher-value work and eliminate boring, repetitive tasks (albeit the boring, repetitive work paid pretty well …). I think we know it makes sense: why pay us when they can pay a cheaper robot? So next time you’re on a project, ask yourself: am I doing expert Data Scientist work, am I an impostor, or are my days numbered?

“Will the real data scientist please stand up?”

The net takeaway: the future of DS/ML is bright, but you need to embrace change or you’ll go from Data Scientist to Dead Scientist. “Resistance is futile”, but in this case assimilating will pay off.

References and Inspirations

[1] Ruiz, “The 80/20 data science dilemma”: https://www.infoworld.com/article/3228245/the-80-20-data-science-dilemma.html

[2] Angelov, “Rise of the Data Engineer”: https://towardsdatasciencte.com/the-rise-of-the-data-strategist-2402abd62866?_branch_match_id=764068755630717009

[3] HBR’s “Sexiest Job” article: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

[4] McKinsey on rethinking AI talent: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/rethinking-ai-talent-strategy-as-automated-machine-learning-comes-of-age

[5] Andrew Ng’s thoughts on jobs and AI: https://www.youtube.com/watch?v=aU4RQD--Lec

[6] Looking back at the 2015 poll on AutoML: https://www.kdnuggets.com/2020/03/poll-automl-replace-data-scientists-results.html

[7] fast.ai’s Rachel Thomas on the AutoML hype, what ML scientists do, and what AutoML can do: https://www.fast.ai/2018/07/12/auto-ml-1/

[8] Various references to sci-fi AI/robots: TARS from Interstellar, HAL from 2001: A Space Odyssey, Borg assimilation from Star Trek, and of course Terminator’s Skynet.

[9] Various articles on AutoML vs. humans from KDnuggets, Wired, and Medium.

Original article: https://towardsdatascience.com/data-scientists-adapt-or-die-2f009ebe4935
