Hacking the Unicorn
Preface
Last week, a LinkedIn note by my friend and colleague Srivastan Srivsan about Mathematics and Data Science opened an excellent discussion. This is nothing new; the tech world has seen many such debates, vim vs. emacs among them. The debate about Math and Data Science has moved into new territory every year since 2013. Above all, the industry notion (or confusion) of the Unicorn Data Scientist remains a catalyst for the debate, and HR is in search of the ‘Purple Squirrel.’ Why are we debating? That is an interesting question to ask ourselves.
Problem of Definition
A definition can suffer from three types of problems: it can be defective (too narrow), over-applied (too broad), or impossible (a mismatch). The classification is borrowed from Indian philosophy. The debate about the Mathematics Specialist versus the Data Scientist is all about definition. The term ‘Data Scientist’ appears in job descriptions for a variety of job roles. The title is always Data Scientist, yet we search for a person who can do everything.
KDD2020 had an exciting session on Training Data Scientists of the Future. Eminent personalities in the field, such as Thomas Davenport, Usama, and Keith, led the discussion. One of Davenport's suggestions was:
“They should circulate a draft list of job types, ask for commentary, and then finalize the list. Then ask those who practice each job what the necessary skills are. Again, send out a draft list, ask for comments, and finalize the skill list too.”
I would say Davenport was spot on. There are thousands of recruitment job descriptions (JDs) out on the internet. Most of them are trying to find the purple squirrel or the unicorn in Data Science and Machine Learning. The lack of uniformity in JDs, both within the industry and within the same organization, is a significant gap in Data Science, Machine Learning, and AI recruitment. What we need is a rule of thumb for writing a JD based on what we want to achieve. Let’s discuss this in detail later.
Changing Industry Patterns
Well, how does Mathematics relate to the JD? The role of the Data Scientist has evolved over time. It is almost ten years since the term Data Science started appearing in JDs. From 2010 to date, many technologies have evolved, died, and been resurrected: from sklearn and ‘R’ to Mahout to Spark to H2O and TensorFlow, and an ocean of frameworks. There was a time (pre-2010) when NLTK was the only Natural Language Processing framework in Python (yes, we also had MontyLingua, RIP). Perl was the Swiss Army knife for many NLP tasks to start with. Above all, the theoretical advances, including Deep Learning and Reinforcement Learning, are commendable. In the early 2000s, when we attended Computer Science faculty development programs, we would mention the theoretical aspects of RL. Now students in the same college will show an RL demo with OpenAI Gym! That is how much technology and learning have changed.
What a data scientist does in an enterprise has changed a lot too. The nature of use cases, the volume of data, the awareness of need, and the ROI of Data Science problems have all increased. Project objectives are now sharply focused on the enterprise. AI/ML and Data Science adoption is reaching maturity in most companies, moving beyond adjusting to the hype cycle.
The missing piece in this game is the categorization of job roles and expectations. The job of a Machine Learning Scientist is different from that of a Data Scientist, which in turn is different from that of a Machine Learning Engineer. Hence a one-size-fits-all JD is no longer relevant. The question is who is a Machine Learning Scientist, who is a Data Scientist, and who is a Machine Learning Engineer (there are more titles to add).
Who is Who?
A Machine Learning Scientist is one who designs new algorithms (maybe based on existing algorithms) to solve a specific problem or a general class of problems. What is expected is the ability to formulate a hypothesis, prove it using a rigorous scientific method, and implement it (the expectation may even go beyond this). Sometimes this person will be responsible for implementing the theory and delivering a new framework or system. To understand what this looks like, imagine that you are going to work for the core TensorFlow, PyTorch, or Watson team. The job is not to perform API mashups of existing libraries. In such a position, knowledge of programming, Mathematics, and Machine Learning is critical. Some companies call this role Algorithm Developer (AI/ML/DL…). When hiring for such a position, the HR concept of the Purple Squirrel may be relevant. Training, skills, and background are essential. Most of the time, lack of experience need not be a blocker for the right candidate in such roles.
The Data Scientist’s role in the enterprise is to solve a given problem with existing algorithms. The typical range of responsibility runs from business understanding to handing the model over to production (AIOps). Everybody searches for unicorns in this space because end-to-end Data Science is the expectation. Successful enterprises focus on hiring people who can build models and be creative with the data. For all practical purposes, such a person should be paired with a Data Engineer and a functional specialist. Such a structure adds the burden of an additional role, a Project Manager. Still, it has long-term benefits. We will discuss the strategy a bit later; let’s get into writing the JD for a Data Scientist.
The rule of thumb is not to expect a Unicorn ;-). If you are looking for short-term staff, think about what your team would like to achieve in the foreseeable horizon: the problem statements in hand, the type of data, what is expected from the data scientist, the current and expected technology stacks, and the level of experience desired. All of these points will help you draft a clear JD. Such a JD makes the life of a recruiting agent much simpler. For long-term staff (probably a full-time hire), one should do some groundwork.
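One way to keep a short-term JD honest is to treat the points above as a checklist and see which sections are still blank before the JD goes out. The sketch below is only an illustration: the class name `ShortTermJD`, its field names, and the sample values are all hypothetical, chosen to mirror the points listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShortTermJD:
    """Hypothetical checklist for a short-term Data Scientist JD."""
    problem_statements: List[str] = field(default_factory=list)
    data_types: List[str] = field(default_factory=list)
    expectations: List[str] = field(default_factory=list)
    current_stack: List[str] = field(default_factory=list)
    expected_stack: List[str] = field(default_factory=list)
    experience_level: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of checklist sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative draft; every value here is made up for the example.
draft = ShortTermJD(
    problem_statements=["Forecast weekly demand per store"],
    data_types=["time series"],
    current_stack=["Python", "Spark"],
    experience_level="3-5 years",
)
print(draft.missing_sections())  # ['expectations', 'expected_stack']
```

A draft that still reports missing sections is probably not ready to hand to a recruiting agent.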
For a long-term hire, there are many things to consider. First of all, ask what the organization’s AI vision is for the next three years. If you don’t see one for the organization, try to create one for the team or business unit first, then review and finalize it. (Sometimes it is better to hire a consultant to assess and recommend a strategy.) When hiring for individual teams, we may expect the candidate to know the domain and to have experience or knowledge in Data Science. Determine the domain knowledge requirements; if you have functional experts in the team, you may be able to relax them. (There are many domains where specific experience is hard to gain without working in the industry.) What problems do we need to solve in the next one, two, or three years? What kind of algorithms may help solve them? Are you interested in swimming in the wave of algorithms and frameworks? Answers to these questions will help you narrow down algorithm-level expectations. Now it is time for technology frameworks; here we decide on R, Python, Spark, etc.
Last but not least, the skill to explore data is essential. One can either generalize or specify technologies in this part. It is better not to expect a candidate to be hands-on in Natural Language Processing and Time Series at the same time. What we are looking for here is a clear and measurable JD with respect to skill, knowledge, and experience.
In a Data Scientist’s role, getting from problem to solution plays an important part. A person who understands statistics and linear algebra and is trained in Machine Learning and Data Science should work. The ability to approach data systematically and drive the desired business outcome is the primary goal in this role. Bringing in a new algorithm is often not an objective at all. Here the mathematician part gets diluted.
One can argue: shouldn’t we still refer to them as Data Miners or Analytics Professionals? That is an excellent question with debatable answers.
AI/ML and Data Science platforms, APIs, and AutoML are hot topics and trends in the industry. The trend has contributed a new set of roles to the AI/ML/Data Science journey: Machine Learning Engineer, AI (….) Developer, AIOps Engineer, etc. I will discuss these roles in detail in a separate note. For now, the question from the hiring manager’s standpoint is: whom are you hiring? Depending on the answer, you may or may not need a strong mathematician. For an aspirant, knowing what role you are fit for is essential to selecting a learning path. It is time for all AI/ML/Data Science course owners to publish the skills their courses attempt to develop, so that the skills developed by a course and the skills expected by a prospective role or employer can be compared.
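If course owners and employers both published skill lists, comparing them could be as simple as a set operation. Here is a minimal sketch of that idea; the function name `skill_gap` and the sample skill lists are hypothetical, and real lists would need more careful matching than lowercasing.

```python
def skill_gap(course_skills, role_skills):
    """Compare the skills a course claims to develop with those a role expects."""
    # Normalize case and whitespace so "Python" and "python " count as one skill.
    course = {s.strip().lower() for s in course_skills}
    role = {s.strip().lower() for s in role_skills}
    return {
        "covered": sorted(course & role),  # expected by the role, taught by the course
        "missing": sorted(role - course),  # expected by the role, not taught
        "extra": sorted(course - role),    # taught, but not asked for by this role
    }


# Illustrative lists only; neither comes from a real course or JD.
course = ["Statistics", "Linear Algebra", "Python", "Deep Learning"]
role = ["statistics", "python", "SQL", "Data Exploration"]
report = skill_gap(course, role)
print(report["missing"])  # ['data exploration', 'sql']
```

The "missing" list is what an aspirant would still need to learn for that role, and what a hiring manager would need to probe in an interview.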
To be continued…

#ai #aiml #datascience #machinelearning
Translated from: https://medium.com/the-innovation/dsunicorns-8fa01b1de79