We Don't Need This Many Data Scientists

I have held the title of data scientist in two industries. I’ve interviewed for more than 30 additional data science positions. I’ve been the CTO of a data-centric startup. I’ve done many hours of data science consulting.

With that background, you will hopefully realize that I’m not a data denier. I’m a firm believer in the power of statistics, machine learning, and all the tools in a data scientist’s toolbox. I know that data science is a powerhouse field filled with amazing people that are changing the world.

That being said, many companies don’t need a data scientist.

No, that wasn’t strong enough. Let me try again.

The vast majority of companies that are looking for a data scientist don’t need one.

Of all the companies I’ve worked or interviewed with as a data scientist, I’d say 80% of them were looking for the wrong role.

Some of them just needed a data analyst. Others needed a data engineer or a data architect. The rest didn’t have a data need at all.

What problem are you looking to solve?

I always ask this question when someone is looking to hire me. Originally, I asked what they were looking to do with their data, but I’ve since realized that the answer to that latter question doesn’t matter. The focus needs to be on the problem, not the solution. Companies hire to solve problems.

Good companies don’t hire a position because it’s trendy to have around. They hire because — for every dollar that employee costs them — they are getting more than a dollar in return. It’s that simple. It’s all about ROI.
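
The ROI test above is plain arithmetic, and a minimal sketch makes it concrete. The function names and the dollar figures below are hypothetical, chosen only for illustration; real hiring decisions depend on estimates that are much harder to pin down.

```python
def roi(annual_return: float, annual_cost: float) -> float:
    """Return on investment as a fraction of cost: (return - cost) / cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return (annual_return - annual_cost) / annual_cost


def worth_hiring(annual_return: float, annual_cost: float) -> bool:
    """A hire pays off only if each dollar of cost brings back more than a dollar."""
    return roi(annual_return, annual_cost) > 0


# Hypothetical figures: a hire costing 150,000 per year who solves a
# problem worth 180,000 per year.
print(roi(180_000, 150_000))           # fraction of cost earned back on top of the cost
print(worth_hiring(180_000, 150_000))  # True: the return exceeds the cost
print(worth_hiring(100_000, 150_000))  # False: the problem is cheaper than the hire
```

The second input, the value of the problem being solved, is the number a company needs before posting the job. Without a defined problem there is nothing to put in that argument.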

All companies understand that when it comes to positions like accounting and sales because they know how ROI works for accounting or sales. They know what problem needs to be solved and they know who can do it.

But data confuses companies. It especially confuses older companies, but startups are not immune. We’ve all been told that there’s gold in them thar data.

And who doesn’t love a good gold rush?

Just like the gold rush of old, most people don’t know where to look for the gold, many of them have fallen for fool’s gold, and no matter how much a vein has been picked clean, people keep coming back looking for scraps.

The underlying issue is that companies have been told their data is valuable. And it might be. But whether packaged for sale or used internally, data is a part of a solution, and every solution’s value is determined by the cost of the problem it is solving.

Without a problem, a solution is just an idea. And, as I’ve mentioned in multiple previous posts, ideas are worthless.

Data rushes happen because companies have a solution — data — and they are looking for a problem to apply it to. It’s a completely backward approach. You don’t decide to use screws because you have a screwdriver handy. You decide to use a screwdriver because you need to tighten a screw.

Data is a resource. So why is data not treated like any other resource?

Data is inherently different from other resources in one important way.

Let’s look at oil, a pretty standard resource. Unless you are The Beverly Hillbillies, you don’t just find oil lying around in your backyard. If you have thousands of tons of oil, you have it because you planned to have it for a specific purpose. And once you use it for that purpose, it’s gone.

But companies have exabytes of data. Maybe they had it for a purpose. Maybe there was a regulatory requirement for them to keep it. Maybe it was just easier to keep than to throw away.

Whatever the reason, they have it now, and they want to use it. They just don’t know what to use it for. And they often assume data scientists are the answer. After all, data is right there in the title, and scientists are smart.

S-c-i-e-n-t-i-s-t is not how you spell engineer

Let me give these companies the benefit of the doubt and say they actually do have problems that their data could solve. That still doesn’t necessarily make hiring a data scientist the correct next step.

Data scientists solve puzzles. They take billions of pieces of data and turn them into a single, cohesive picture. But they can’t do that if you don’t give them all the pieces.

If your data streams into ten different systems that don’t talk to each other, you are setting your data scientist up for failure. You need someone who can bridge those systems and bring the data into a single place. That’s the job of a data engineer, not a data scientist. Depending on the situation, you may also need data architecture, data modeling, and database administration.

If you really want to, you can find a data scientist who can handle everything from the engineering to the DB admin work. I’ve been that data scientist. But my rate was much higher than what the company would have paid to just hire the correct person for the job.

Why did they overpay? Because they didn’t yet understand the current status of their data or what a data scientist actually does.

Why did I take the job? Because I was too naive to know better.

Everyone would have been better off if the company had hired a data engineer, waited 6–12 months, then brought on a data scientist when they were fully prepared.

Ready? Have an aim? Hire!

Has your company identified problems that you need data science to solve?

Is your data in a state that a data scientist can work with?

If you answered both of these with a definitive ‘yes’, then you may need a data scientist. Congratulations, your company is doing things right. Pat yourselves on the back no more than three times, then go do some amazing things.

If you answered either question with a ‘no’ or a general look of confusion, then save your money and a data scientist’s sanity by taking down that job posting you just put up. Maybe replace it with a posting for a data engineer or data analyst. Or maybe just be happy not to have to go through the hiring process.

Not sure what you need? Talk to a data consultant before you waste your money.

Like this advice? Take 0.001% of the money you just saved and buy me a drink someday.

Translated from: https://medium.com/swlh/do-we-need-data-scientists-8d8e8062688a
