Data Science Summer Internship Interviews
Unfortunately, on this occasion, your application was not successful, and we have appointed an applicant who…
Sounds familiar, right? After all the gruelling hours I spent preparing for interviews, rejection followed rejection. Although I was getting through the first few interview stages, the face-to-face stages didn’t go well for me. “What a spectacular failure I am,” I thought.
I started looking for ways to improve and identified a few areas that are often overlooked but can have a huge impact on the outcome of an interview. Working on them helped me improve and, eventually, get the job I wanted!
Get The Basics Right

Data science (DS) internships are usually very competitive, and a single red flag can be enough for a recruiter to reject you straight away. One of these red flags is weak foundations: data science is a field that requires solid mathematical and programming knowledge.
How can you improve? For data science theory, I recommend building a good mathematical understanding of the most common algorithms. There are two books that I usually recommend: Pattern Recognition and Machine Learning, and A First Course in Machine Learning. Both contain in-depth mathematical explanations of machine learning algorithms, which will help you smash DS interview questions to pieces!
Depending on the company, you might also be asked programming questions. They are often not that hard, but under stress and time pressure you really need to have mastered them. Expect anything from sorting and recursion to data structures, and start practising these questions as early as possible. To get a good understanding of how to approach coding questions, I recommend working through the book Cracking the Coding Interview. For more hands-on practice, try HackerRank or LeetCode.
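For illustration only (this is not a question reported by any particular company), here is the kind of warm-up that combines sorting and recursion: a recursive merge sort in Python. The function name `merge_sort` is our own choice.

```python
from __future__ import annotations


def merge_sort(items: list[int]) -> list[int]:
    """Return a new sorted list using recursive merge sort (O(n log n))."""
    # Base case: a list of length 0 or 1 is already sorted.
    if len(items) <= 1:
        return list(items)
    # Divide: split in half and sort each half recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Conquer: merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Whatever remains in either half is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

In an interview, be ready to explain why it runs in O(n log n) time, why merging needs O(n) extra memory, and why the `<=` comparison makes the sort stable.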
Glassdoor is Your Best Friend
From Glassdoor reviews, you can get a good feel for a company’s culture and atmosphere, which is a useful indication of whether that company is a good fit for you. If, for example, a company seems to have a really toxic atmosphere, maybe it would be better to withdraw your application and spend more time preparing for interviews at other companies. What’s the point in interviewing with companies you don’t really want to work for?
You can also find really useful information about the interview structure and the types of questions a company asks. Some companies literally ask the same set of questions every time! I am not sure why they do that, but if they do, you will notice the same questions recurring across Glassdoor reviews. Use that to your advantage and learn the answers by heart.
Easy Interview Questions are NOT Easy
Imagine the interviewer asks: “What is linear regression?”
You can answer either:
It is a linear approach that models the relationship between dependent and independent variables in the data.
Or:
It is a linear approach that models the relationship between dependent and independent variables in the data. The model’s parameters can be estimated with ordinary least squares (OLS), and the general (matrix) form of the equation works on multi-dimensional data. It is an algorithm that is simple, fast, and interpretable. However, it has certain caveats, such as …
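To make the second answer concrete: for a design matrix X whose first column is all ones (for the intercept) and a target vector y, OLS picks the coefficients β̂ that minimise the sum of squared residuals. Setting the gradient to zero gives the normal equations (XᵀX)β̂ = Xᵀy, i.e. β̂ = (XᵀX)⁻¹Xᵀy when XᵀX is invertible. The sketch below solves those equations in plain Python with Gaussian elimination so it runs without extra libraries; the helper names `fit_ols` and `predict` are hypothetical, and in practice you would reach for `numpy.linalg.lstsq` or scikit-learn’s `LinearRegression` instead.

```python
from __future__ import annotations


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Estimate linear-regression coefficients by solving the normal
    equations (X^T X) beta = X^T y. A column of ones is prepended to X,
    so beta[0] is the intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    p = len(rows[0])
    # Build X^T X (p x p) and X^T y (length p).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Solve the p x p system with Gaussian elimination and partial pivoting.
    a = [row[:] + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: features are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (a[i][p] - s) / a[i][i]
    return beta


def predict(beta: list[float], x: list[float]) -> float:
    """Prediction for one observation: intercept + dot(weights, x)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, -2, 0, -4]  # generated exactly from y = 1 + 2*x1 - 3*x2
print([round(b, 6) for b in fit_ols(X, y)])  # [1.0, 2.0, -3.0]
```

The explicit `ValueError` is one example of the kind of follow-up a deeper answer can touch on: when features are perfectly collinear, XᵀX cannot be inverted and the OLS solution is not unique.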
See what I mean? With one simple-looking question, the interviewer can test two things. First, whether you have the basic knowledge (obviously). Second, how deep your understanding goes and how inquisitive you are when studying a topic. This ability is crucial for a data scientist, because you will often have to work with new tools and read research papers. If you don’t analyse a subject thoroughly and fail to understand its capabilities and limitations, you are on a straight path to an unsuccessful project.
Showcase Projects. Quality or Quantity?
TL;DR: Quality!

The painful truth is that nobody cares about the endless Jupyter notebooks you created for your 100+ mini-projects. Don’t get me wrong: they are still a great way to experiment with new models and data. But, most likely, they won’t impress the interviewer.
There is much more to data science than creating dozens of untested machine learning models in a single file. In real life, the code needs to be tested, packaged, documented, and deployed to internal servers or cloud services.
My advice? Go for quality and aim to build around three bigger projects that will impress interviewers. Here are some tips you can follow:
- Find a real-world dataset that requires a lot of preprocessing and exploratory data analysis (EDA)
- Make your code modular: create separate classes for models, data preprocessing, and end-to-end pipelines
- Use test-driven development (TDD) when developing packaged code
- Work with Git and continuous integration services such as CircleCI
- Expose the model to users through an API, e.g. with Flask for Python
- Document the code using Sphinx and follow code style guidelines (e.g. PEP 8 for Python)
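A minimal sketch of what “modular” could look like, assuming a toy one-feature regression problem: preprocessing, model, and end-to-end pipeline live in separate classes, and the docstrings use the reStructuredText field lists (`:param:`, `:returns:`, `:ivar:`) that Sphinx autodoc renders. The class names `Standardizer`, `LinearModel`, and `Pipeline` are hypothetical; they loosely follow scikit-learn’s fit/transform/predict convention but are not scikit-learn’s own classes.

```python
"""Hypothetical project sketch: preprocessing, model and pipeline are kept
in separate classes so each can be tested and documented on its own."""
from __future__ import annotations

from statistics import mean, pstdev


class Standardizer:
    """Scale a feature to zero mean and unit variance.

    :ivar mean_: mean of the training data, set by :meth:`fit`.
    :ivar std_: population standard deviation of the training data.
    """

    def fit(self, xs: list[float]) -> Standardizer:
        """Learn the mean and standard deviation from ``xs``.

        :param xs: training values.
        :returns: the fitted instance, so calls can be chained.
        """
        self.mean_ = mean(xs)
        self.std_ = pstdev(xs) or 1.0  # avoid dividing by zero on constant input
        return self

    def transform(self, xs: list[float]) -> list[float]:
        """Apply the scaling learned in :meth:`fit`."""
        return [(x - self.mean_) / self.std_ for x in xs]


class LinearModel:
    """One-feature least-squares line ``y = intercept + slope * x``."""

    def fit(self, xs: list[float], ys: list[float]) -> LinearModel:
        """Fit slope and intercept by ordinary least squares.

        :param xs: feature values.
        :param ys: target values, same length as ``xs``.
        :returns: the fitted instance.
        """
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, xs: list[float]) -> list[float]:
        """Return one prediction per input value."""
        return [self.intercept_ + self.slope_ * x for x in xs]


class Pipeline:
    """End-to-end pipeline: preprocessing followed by the model."""

    def __init__(self, scaler: Standardizer, model: LinearModel) -> None:
        self.scaler = scaler
        self.model = model

    def fit(self, xs: list[float], ys: list[float]) -> Pipeline:
        """Fit the scaler, then fit the model on the scaled features."""
        self.model.fit(self.scaler.fit(xs).transform(xs), ys)
        return self

    def predict(self, xs: list[float]) -> list[float]:
        """Scale new inputs with the fitted scaler, then predict."""
        return self.model.predict(self.scaler.transform(xs))


pipe = Pipeline(Standardizer(), LinearModel()).fit([1, 2, 3, 4], [3, 5, 7, 9])
print(pipe.predict([5]))  # approximately [11.0], up to floating-point rounding
```

Because each class has one job, you can unit-test the scaler without a model, swap in a different model without touching preprocessing, and let Sphinx generate API docs straight from the docstrings.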
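A minimal test-driven development sketch using only the standard library’s `unittest`. Following the red-green cycle, the tests are written first and pin down the behaviour we want from a hypothetical preprocessing helper, `fill_missing`, which replaces `None` with the mean of the observed values; only then is the simplest implementation written to make them pass. In a real package the tests would sit in a separate `tests/` module and import the function; they share one file here only for brevity.

```python
import unittest
from statistics import mean


# Step 1 (red): write the tests before the implementation exists.
class TestFillMissing(unittest.TestCase):
    def test_replaces_none_with_mean_of_observed_values(self):
        self.assertEqual(fill_missing([1.0, None, 3.0]), [1.0, 2.0, 3.0])

    def test_leaves_complete_data_unchanged(self):
        self.assertEqual(fill_missing([4.0, 5.0]), [4.0, 5.0])

    def test_raises_if_everything_is_missing(self):
        with self.assertRaises(ValueError):
            fill_missing([None, None])


# Step 2 (green): the simplest implementation that makes the tests pass.
def fill_missing(values):
    """Replace ``None`` entries with the mean of the non-missing values.

    :param values: list of floats, possibly containing ``None``.
    :returns: a new list with no ``None`` entries.
    :raises ValueError: if every value is missing.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: all values are missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```

Saved as, say, `test_fill_missing.py`, the suite runs with `python -m unittest` (pytest also collects `unittest.TestCase` classes), and a CI service such as CircleCI would typically run the same command on every push.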
Data scientists from Babylon Health and Train In Data created a really good course on ML model deployment, available on Udemy. You can find it here.
Bonus: CV Template
I am a big fan of one-page CVs for data science internships: the format helps me keep the CV simple and clear, without redundant information. I used to have a Word template, but I was losing a lot of time editing it. Whenever I removed or added some information, the formatting instantly fell apart, making my CV look like the Enigma code 😆
Anyway, I found a nice-looking Overleaf CV template that I’ve been using ever since. It is simple and clear and, most importantly, it is built from modular LaTeX code, which makes formatting painless. The link to the CV template is here.
About Me
I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)
Here are my social media profiles if you want to keep up with my latest articles and other useful content:
LinkedIn
GitHub
Personal Website
Translated from: https://towardsdatascience.com/interviewing-for-data-science-internship-how-to-prepare-f6b9c2c7fa97