How to Cut Through the AI Hype and Become a Machine Learning Engineer

I’m sure you’ve heard of the incredible artificial intelligence applications out there — from programs that can beat the world’s best Go players to self-driving cars.

The problem is that most people get caught up on the AI hype, mixing technical discussions with philosophical ones.

If you want to cut through the AI hype and work with data models that are actually deployed in practice, train toward a data engineering or machine learning engineering role.

Don’t look for interesting AI applications within AI articles. Look for them in data engineering or machine learning tutorials.

These are the steps I took to build a fun little scraper that analyzes gender diversity across coding bootcamps. It’s also the path I followed while researching Springboard’s new AI/ML online bootcamp, which comes with a job guarantee.

Here’s a step-by-step guide to getting into machine learning, with one thing to read and one thing to do for each step.

1. Start brushing up on your Python and software development practices

You’ll want to start off by embracing Python, the language of choice for most machine learning engineers.

This handy scripting language is the tool of choice for most data engineers and data scientists. Most data tools are either written in Python or offer Python APIs, so you can reach them easily from Python code.

Thankfully, Python’s syntax is relatively easy to pick up. The language has tons of documentation and training resources. It also includes support for all sorts of programming paradigms from functional programming to object-oriented programming.

The one thing that might be a bit hard to pick up is indentation. In Python, whitespace really matters: a line’s indentation, rather than braces or keywords, determines which block of code it belongs to.
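
Here is a small sketch of what significant whitespace means in practice. The function below is a made-up example: it has no braces or end keywords, so indentation alone decides which lines belong to the loop, to the if statement, and to the function itself.

```python
def classify_scores(scores):
    """Label each score; indentation alone defines every block below."""
    labels = []
    for score in scores:          # loop body: indented one level
        if score >= 50:           # if body: indented two levels
            labels.append("pass")
        else:
            labels.append("fail")
    return labels                 # back at function level: runs after the loop


print(classify_scores([72, 38, 50]))  # ['pass', 'fail', 'pass']
```

If return labels were indented one more level, it would sit inside the loop and the function would return after checking only the first score.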

As a machine learning engineer, you’d be working in a team to build complex, often mission-critical applications, so now is a good time to brush up on software engineering best practices as well.

Learn to use collaboration tools such as GitHub. Get into the habit of writing thorough unit tests for your code with testing frameworks such as nose or pytest. Test your APIs with tools such as Postman. Use CI systems such as Jenkins to catch changes that break your code. Develop good code review skills so you can work well with your future technical colleagues.
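
To illustrate the unit-testing habit, here is a minimal test written with Python’s built-in unittest module; nose and pytest can both discover and run unittest.TestCase classes like this one. The normalize function is a hypothetical helper made up for the example.

```python
import unittest


def normalize(values):
    """Scale a list of numbers to the range [0, 1] (hypothetical helper under test)."""
    if not values:
        return []
    low, high = min(values), max(values)
    if low == high:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_does_not_divide_by_zero(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])

    def test_empty_input(self):
        self.assertEqual(normalize([]), [])
```

Saved in a file such as test_normalize.py (a hypothetical name), these tests run with python -m unittest, nosetests, or pytest, which also makes them easy to hook into a CI job.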

One thing to read: What is the best Python IDE for data science? Take a quick read-through to decide which tools you want to use when working with datasets in Python.

I use Jupyter Notebook myself. Installed through a distribution such as Anaconda, it comes bundled with most of the important data science libraries you’ll use, and its clean, interactive interface lets you edit and rerun your code on the fly.

Jupyter Notebook also supports extensions and export options that make it easy to share your results with the world at large, and GitHub renders notebook files directly, so they’re easy to share there too.

One thing to do: Pandas Cookbook lets you fork live examples built on pandas, one of the most powerful data manipulation libraries, and quickly work through how to explore a dataset with it.
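
To give a feel for the kind of workflow those examples cover, here is a minimal pandas sketch on a tiny made-up dataset (the column names and values are invented for illustration). It inspects the data, counts missing values, drops incomplete rows, and aggregates by group.

```python
import pandas as pd


def mean_by_group(df, group_col, value_col):
    """Drop rows missing value_col, then average value_col within each group."""
    clean = df.dropna(subset=[value_col])
    return clean.groupby(group_col)[value_col].mean()


# A tiny made-up dataset; with a real file you would start from pd.read_csv(...).
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Denver", "Boston"],
        "temp_c": [31.0, 22.0, 29.0, None, 24.0],
    }
)

print(df.head())                  # peek at the first rows
print(df["temp_c"].isna().sum())  # count missing temperatures
print(mean_by_group(df, "city", "temp_c"))
```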

2. Look into machine learning frameworks and theory

Once you’re playing around with Python and practicing with it, it’s time to start looking at machine learning theory.

You’ll learn which algorithms to use for which problems. A baseline knowledge of the theory behind machine learning will make implementing models much easier.

One thing to read: A Tour of The Top Ten Algorithms For Machine Learning Newbies will help you get started with the basics. You’ll also learn that there is no “free lunch”: no single algorithm gives the best result in every setting, so you’ll need to understand several algorithms and try them on your problem.
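
Here is a toy illustration of that idea. It is not a proof of the No Free Lunch theorem, which concerns performance averaged over all possible problems. The sketch fits two simple regressors, a least-squares line and 1-nearest-neighbor, on two made-up datasets. The line wins when the data really is linear, and nearest neighbor wins on a step-shaped pattern that a straight line cannot represent.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor regression: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


def linear_target(x):
    return 2 * x + 1


def step_target(x):
    return 10 if x > 5 else 0


def compare(target, train_x=(0, 2, 4, 6, 8), test_x=(1, 3, 5, 7, 9)):
    """Train both models on train_x and report their held-out error on test_x."""
    train_y = [target(x) for x in train_x]
    test_y = [target(x) for x in test_x]
    line = fit_line(list(train_x), train_y)
    knn = fit_nearest_neighbor(list(train_x), train_y)
    return {
        "line": mean_abs_error(line, test_x, test_y),
        "1-nn": mean_abs_error(knn, test_x, test_y),
    }


print(compare(linear_target))  # the straight line wins
print(compare(step_target))    # nearest neighbor wins
```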

One thing to do: Play around with the interactive Free Machine Learning in Python Course to develop your Python skills and start implementing algorithms.
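
As a first algorithm to implement yourself, here is a sketch of batch gradient descent that fits a straight line by minimizing mean squared error, in plain Python with no libraries. The data, learning rate, and epoch count are arbitrary choices for the example.

```python
def gradient_descent_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The learning rate is the main knob: too small and convergence is slow, too large and the updates overshoot and diverge.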

3. Start working with datasets and experimenting

You’ve got the tools and theory under your belt. Now think about doing small projects that help you refine your skills.

One thing to read: Take a look at 19 Free Public Data Sets for Your First Data Science Project and start looking at where you can find different datasets on the web to play around with.

One thing to do: Kaggle Datasets will let you work with lots of publicly available datasets. What’s cool about this collection is you can see how popular certain datasets are. You can also see what other projects have been built with the same dataset.
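
Whatever dataset you pick, a useful first step is a quick profile of its size and missing values. The sketch below does that with Python’s standard csv module. The inline sample stands in for a downloaded file, and profile_csv is a hypothetical helper name.

```python
import csv
import io


def profile_csv(file_obj):
    """Return the row count and the number of empty cells per column."""
    reader = csv.DictReader(file_obj)
    missing = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            # Short rows yield None; treat None and blank strings as missing.
            if not (row.get(name) or "").strip():
                missing[name] += 1
    return rows, missing


# Inline sample; for a downloaded file, pass open("your_dataset.csv", newline="") instead.
sample = io.StringIO("id,age,city\n1,34,Lima\n2,,Oslo\n3,29,\n")
print(profile_csv(sample))  # (3, {'id': 0, 'age': 1, 'city': 1})
```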

4. Scale your data skills with Hadoop or Spark

Once you’re comfortable with smaller datasets, you’ll want to learn how to work with Hadoop or Spark. Data engineers work with streaming, real-time, production-level data at the terabyte and sometimes petabyte scale. Build that skill by working your way through a big data framework.
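
Hadoop MapReduce and Spark both scale out the same basic idea: a map step applied independently to pieces of the data, a shuffle that groups intermediate results by key, and a reduce step per key. The single-process word count below only sketches that programming model. It is not either framework’s API, and on a cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["Spark and Hadoop", "Hadoop stores data", "Spark processes data"]))
```

In PySpark, the same job is usually written with RDD methods such as flatMap, map, and reduceByKey.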

One thing to read: The short article How do Hadoop and Spark Stack Up? walks you through both Hadoop and Spark and how they compare.

One thing to do: If you want to start working with a big data framework right away, the Spark Jupyter notebooks hosted on Databricks offer a tutorial-level introduction to the framework and let you practice with production-level code examples.

5. Work with a deep learning framework like TensorFlow

By now you’ve explored machine learning algorithms and worked with several big data tools.

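Now it’s time to take on the kind of powerful deep learning that has driven many recent advances, including the Go-playing programs mentioned at the start, which combine deep neural networks with reinforcement learning. Learn the TensorFlow framework and you’ll be on the cutting edge of machine learning work.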

One thing to read: Read What is TensorFlow? to understand what’s going on under the hood of this powerful deep learning framework.
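
One of the key things TensorFlow does under the hood is automatic differentiation. It records the operations that produce a result, then walks them backward and applies the chain rule to get the gradients used in training. The toy scalar version below illustrates that idea in plain Python. The Value class is hypothetical and is not how TensorFlow is implemented or called.

```python
class Value:
    """A scalar that remembers how it was computed, for reverse-mode differentiation."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs that produced this value
        self._local_grads = local_grads  # d(this)/d(input) for each input

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        """Fill in .grad on every input by applying the chain rule in reverse order."""
        order, seen = [], set()

        def visit(node):
            # Depth-first topological sort: inputs are listed before the values using them.
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# f(x, y) = x * y + x, so df/dx = y + 1 and df/dy = x.
x, y = Value(3.0), Value(4.0)
f = x * y + x
f.backward()
print(f.data, x.grad, y.grad)  # 15.0 5.0 3.0
```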

One thing to do: TensorFlow and Deep Learning without a PhD is an interactive course built by Google that combines theory presented in slides with hands-on coding labs.

6. Start working with big production-level datasets

Now that you’ve worked with deep learning frameworks, you can start working with large production-level datasets.

As a machine learning engineer, you’ll be making complex engineering decisions on managing large amounts of data and deploying your systems.

That includes collecting data from APIs and through web scraping, knowing when to use SQL versus NoSQL databases, and using pipeline frameworks such as Luigi or Airflow.
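
Pipeline frameworks such as Luigi and Airflow let you declare tasks and their dependencies, then run the tasks in dependency order. The sketch below imitates that idea with the standard library only: graphlib orders three hypothetical tasks, and an in-memory SQLite database stands in for the SQL store. The task names, table, and records are made up, and none of this is Luigi’s or Airflow’s API.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical tasks; each one reads from and writes to a shared context dict.
def extract(ctx):
    # Placeholder records standing in for data pulled from an API or a scraper.
    ctx["records"] = [("item_a", 10.0), ("item_b", 20.0), ("item_c", 30.0)]


def load(ctx):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", ctx["records"])
    ctx["conn"] = conn


def report(ctx):
    (avg,) = ctx["conn"].execute("SELECT AVG(value) FROM items").fetchone()
    ctx["avg_value"] = avg


TASKS = {"extract": extract, "load": load, "report": report}
# Maps each task to the set of tasks that must finish before it runs.
DEPENDENCIES = {"load": {"extract"}, "report": {"load"}}


def run_pipeline():
    ctx = {"order": []}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        TASKS[name](ctx)
        ctx["order"].append(name)
    ctx["conn"].close()
    return ctx


print(run_pipeline()["avg_value"])  # 20.0
```

Real pipeline frameworks add what this sketch leaves out, such as scheduling, retries, and tracking which tasks have already completed.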

When you deploy your applications, you might use container-based systems such as Docker for scalability and reliability, and tools such as Flask to create APIs for your application.
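
Flask apps are WSGI applications, so the shape of a small prediction API can be sketched with the standard library’s WSGI support alone. Everything here is hypothetical: the /predict route, the JSON format, and the predict function with its made-up weights stand in for a real trained model.

```python
import json
from wsgiref.simple_server import make_server


def predict(features):
    # Hypothetical stand-in for a trained model: a weighted sum with made-up weights.
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def _json_response(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """Minimal WSGI app exposing POST /predict, which accepts {"features": [...]}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        prediction = predict(payload["features"])
    except (ValueError, KeyError, TypeError):
        return _json_response(start_response, "400 Bad Request", {"error": "bad input"})
    return _json_response(start_response, "200 OK", {"prediction": prediction})


def serve(port=8000):
    # Development server only; not meant for production traffic.
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling serve() starts a development server on port 8000, and a POST to /predict with {"features": [4.0, 2.0]} returns {"prediction": 1.5}. Flask would replace the manual routing and JSON handling with decorators and helpers, and a Docker image could package the service for deployment.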

One thing to read: 7 Ways to Handle Large Data Files for Machine Learning walks through how you would handle big datasets and doubles as a handy checklist of tactics.
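
One common tactic for files too large to fit in memory is to process them in chunks and keep only running totals. The sketch below computes a column mean that way using only the standard library; streaming_mean and the sample data are hypothetical. pandas offers a similar pattern through the chunksize parameter of read_csv.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows, so only one chunk is in memory at a time."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(file_obj, column, chunk_size=1000):
    """Compute a column mean without loading the whole file into memory."""
    reader = csv.DictReader(file_obj)
    total, count = 0.0, 0
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            total += float(row[column])
            count += 1
    return total / count if count else float("nan")


sample = io.StringIO("id,value\n1,2.0\n2,4.0\n3,9.0\n")
print(streaming_mean(sample, "value", chunk_size=2))  # 5.0
```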

One thing to do: Publicly Available Big Data Sets is a list of places where you can get very large datasets, ready for practicing your new data engineering skills.

7. Practice, practice, practice: build a portfolio, then land a job

Finally, you’ve gotten to a point where you can build working machine learning models. The next step to advance your machine learning career is to find a job with a company that holds those large datasets so you can apply your skills every day to a cutting-edge machine learning problem.

One thing to read: 41 Essential Machine Learning Interview Questions (with answers) will help you practice the knowledge you need to ace a machine learning interview.

One thing to do: Find meetups dedicated to machine learning or data engineering on Meetup and go to them. They’re a great way to meet peers in the field and potential hiring managers.

Hopefully, this tutorial has helped you cut through the AI hype to something practical you can use. If you feel you need a bit more support, the company I work with, Springboard, offers a career track bootcamp dedicated to AI and machine learning, with a job guarantee and 1:1 mentorship from machine learning experts.
