Becoming a Machine Learning Engineer
by Sudharsan Asaithambi
Become the Rafael Nadal of Machine Learning
A year ago, I was a newbie to the world of Machine Learning. I used to get overwhelmed by small decisions, like choosing a language to code in, choosing the right online courses, or choosing the correct algorithms.
So I want to make it easier for others to get into Machine Learning.
I’ll assume that many of us are starting from scratch on our Machine Learning journey. Let’s find out how current professionals in the field reached their destination, and how we can emulate them on our journey.
I will illustrate this by drawing a parallel between how Rafael Nadal learned to play tennis and how you can learn Machine Learning.
Commit Yourself: Stage 1
Nadal was surrounded by sporting talent in his family. Inspired by them, he began his tennis journey at the age of 3.
For anyone starting out in Machine Learning, it’s important to surround yourself with people who are also learning, teaching, and practicing Machine Learning.
Learning the ropes is not easy if you do it alone. So commit yourself to learning Machine Learning, and find data science communities to help make your entry less painful.
Learn the Ecosystem: Stage 2
Rafael Nadal learned not only the rules of tennis, but also the surrounding ‘ecosystem’.
He learned about the different types of rackets, balls, and court surfaces. He learned how scoring works in tennis. He enrolled in tennis coaching.
Discover the Machine Learning ecosystem
Data Science is a field that has embraced and made full use of open source platforms. While data analysis can be conducted in a number of languages, using the right tools can make or break a project.
Data Science libraries are flourishing in the Python and R ecosystems. See here for an infographic on Python vs R for data analysis.
Whichever language you choose, Jupyter Notebook and RStudio make our lives much easier. They allow us to visualize data while manipulating it. Follow this link to read more on the features of Jupyter Notebook.
Kaggle, Analytics Vidhya, MachineLearningMastery, and KDnuggets are some of the active communities where data scientists all over the world enrich each other’s learning.
Machine Learning has been democratized by online courses, or MOOCs, from Coursera, edX, and others, where we learn from amazing professors at world-class universities. Here’s a list of the top MOOCs on data science available right now.
Cement the Foundation: Stage 3
Rafael Nadal learned the basic shots
Nadal’s coach taught him the forehand and backhand shots. These are the foundation of tennis. With these basic shots, Rafael could play a match competently.
Learn to manipulate data
Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets. - Steve Lohr, The New York Times
‘Data crunching’ is the soul of the whole Machine Learning workflow. To help with this process, the Pandas library in Python and data frames in R let you manipulate and analyze data. They provide data structures for relational or labeled data.
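As a minimal sketch of what this kind of data crunching looks like in Pandas, the snippet below fills a missing value, filters rows, and aggregates by group. The table and its column names are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy data: a few tennis matches with one missing value.
matches = pd.DataFrame({
    "surface": ["clay", "clay", "grass", "hard", "grass"],
    "aces": [3, 5, None, 7, 9],
    "won": [True, True, False, True, False],
})

# Clean: fill the missing ace count with the column median.
matches["aces"] = matches["aces"].fillna(matches["aces"].median())

# Filter: keep only the matches that were won.
wins = matches[matches["won"]]

# Aggregate: average aces and win rate per surface.
summary = matches.groupby("surface").agg(
    mean_aces=("aces", "mean"),
    win_rate=("won", "mean"),
)
print(summary)
```

Most real preparation work is a longer chain of the same three moves: clean, filter, aggregate.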
Data science is more than just building machine learning models. It’s also about explaining the models and using them to drive decisions. In the journey from analysis to data-driven outcomes, data visualization plays a very important role in presenting data in a powerful and credible way.
The Matplotlib library in Python and ggplot2 in R offer complete 2D graphics support, with the flexibility to create high-quality data visualizations.
These are some of the libraries you will spend most of your time with when conducting analysis.
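To give a feel for Matplotlib, here is a small sketch that draws a labeled line chart and saves it to a file. The numbers and the output filename are invented for the example, and the `Agg` backend is chosen only so the script also runs on a machine without a display; in a Jupyter notebook you would normally let the figure render inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical data: practice hours per week over eight weeks.
weeks = list(range(1, 9))
hours = [10, 12, 15, 14, 18, 20, 19, 22]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(weeks, hours, marker="o", label="practice hours")
ax.set_xlabel("Week")
ax.set_ylabel("Hours")
ax.set_title("Weekly practice hours (made-up data)")
ax.legend()

# Write the chart to a PNG file (the filename is an arbitrary choice).
fig.savefig("practice_hours.png", dpi=150)
```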
Practice day in and day out: Stage 4
Rafael Nadal, when asked how much he trained:
“I train four hours a day, 210 days a year. If we add to that I play around 80 matches per year, each one lasting an average of two hours. That is 1000 hours playing tennis per year — and that is without counting the training days during tournaments.”
Learn Machine Learning algorithms and practice them
Once the foundation is set, you get to implement Machine Learning algorithms to make predictions and do all the cool stuff.
The Scikit-learn library in Python and the caret and e1071 packages in R provide a range of supervised and unsupervised learning algorithms via a consistent interface.
These let you apply an algorithm without worrying about its inner workings or nitty-gritty details.
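The consistent interface is easiest to see in code. The sketch below, which assumes Scikit-learn is installed and uses its bundled iris dataset, trains two quite different classifiers through the same `fit` and `score` calls. The choice of models and the train/test split settings are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two different algorithms behind the same estimator interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same call for every estimator
    scores[name] = model.score(X_test, y_test)   # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")
```

Swapping in another classifier usually means changing one line in the `models` dictionary, which is what makes quick experimentation practical.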
Apply these Machine Learning algorithms to the use cases you find all around you. This could be in your work, or you can practice in Kaggle competitions, where data scientists from all around the world compete at building models to solve problems.
At the same time, work through the inner workings of one algorithm after another. Start with Linear Regression, the ‘Hello World!’ of Machine Learning, then move on to Logistic Regression, Decision Trees, and Support Vector Machines. This will require you to brush up on your statistics and linear algebra.
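To see why the linear algebra matters, here is one way to fit a Linear Regression without a Machine Learning library: write the model as a matrix equation and solve it by least squares with NumPy. The data is synthetic, generated from an assumed line y = 2x + 1 with added noise, so the recovered coefficients should land close to those values.

```python
import numpy as np

# Synthetic data from y = 2x + 1 plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix: a column of ones for the intercept, then the feature.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the beta that minimizes ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Mean squared error of the fitted line on the training data.
predictions = X @ beta
mse = np.mean((y - predictions) ** 2)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, mse={mse:.3f}")
```

Library implementations add conveniences on top, but the core of an ordinary least-squares fit is this same minimization.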
Coursera co-founder Andrew Ng, a pioneer in AI, has developed a Machine Learning course that gives you a good starting point for understanding the inner workings of Machine Learning algorithms.
Learn the advanced skills: Stage 5
Rafael Nadal learned to play advanced shots
While concentrating on the fundamentals, Nadal was also introduced to the advanced shots, the ones that only professionals who play tennis day in and day out are able to pull off.
Learn complex Machine Learning Algorithms and Deep Learning architectures
While Machine Learning was established as a field long ago, the recent hype and media attention are primarily due to Machine Learning applications in AI fields like Computer Vision, Speech Recognition, and Language Processing. Many of these have been pioneered by tech giants like Google, Facebook, and Microsoft.
These recent advances can be credited to progress in cheap computation, the availability of large-scale data, and the development of novel Deep Learning architectures.
To work in Deep Learning, you will need to learn how to process unstructured data, be it free text, images, or sounds.
You will learn to use platforms like TensorFlow or Torch, which let us apply Deep Learning without worrying about low-level hardware requirements. You will also learn Reinforcement Learning, which has made possible modern AI wonders like AlphaGo Zero.
Take your first step towards learning Machine Learning now!
1. Install Anaconda and use Jupyter to write Python.
Go through some Python tutorials and learn its fundamental data structures and syntax.
2. Surround yourself with Data Science. Create an account at:
● Kaggle, and check out the kernels written by top data scientists. Kaggle helps you establish a standard workflow that you can apply to any data science problem.
● Analytics Vidhya: This website is a go-to place for many data scientists. The site boasts 4 million unique visitors per month and has a very active community.
● Check out the PyData channel on YouTube. PyData is a conference arranged by the open source community to educate analysts about the latest developments in Data Science. This gives you
● Use podcasts to learn about the latest tools and technology in AI. Podcasts are a great way to make use of the time you spend on daily chores, whether you are jogging, tidying your closet, or commuting. If you are new to podcasts, download the Podcast Addict app onto your phone.
Machine Learning (Software Engineering Daily) | Every week Jeff interviews people from the heart of Data Science. It gives you a rare early peek into what’s going on in Silicon Valley, helping you pick up new techniques and technologies. It gives you many new ideas to apply in your work. I can’t recommend it enough.
● Medium
Follow some of the Machine Learning publications here on Medium:
Towards Data Science
Artificial Intelligence
● Go to Coursera and edX, and check out the various Machine Learning courses available.
I will end this post with this quote by Robin Sharma:
Every Pro was Once an Amateur.
Every Expert was Once a Beginner.
So Dream Big.
And Start Now.
Please comment below to tell us why you are planning to start your Machine Learning journey, and how you plan to do so.
And for all you Machine Learning pros, tell us what works and what doesn’t. Please comment below on how you started your Machine Learning journey and what helped or hindered your learning process.
Translated from: https://www.freecodecamp.org/news/baby-steps-to-learn-machine-learning-from-a-tennis-fan-d4171f51c23f/