Interviewing for a Data Science Internship: How to Prepare

Unfortunately, on this occasion, your application was not successful, and we have appointed an applicant who…

Sounds familiar, right? After all the gruelling hours I spent on interview preparation, rejection followed rejection. Although I was passing the first few interview stages, the face-to-face stages didn't go well for me. "What a spectacular failure I am", I thought.

I started looking for ways to improve. I identified a few areas that are usually overlooked but can have a huge impact on the interview outcome. This, in turn, helped me improve and get the job I wanted!

Get The Basics Right

Data science (DS) internships are usually quite competitive, and any red flag for the recruiter can get you rejected straight away. One such red flag is weak foundations: data science is a field that requires good mathematical and programming knowledge.

How can you improve? For data science theory, I recommend getting a good mathematical understanding of the most common algorithms. There are two books that I usually recommend: Pattern Recognition and Machine Learning, and A First Course in Machine Learning. Both contain in-depth mathematical explanations of machine learning algorithms, which will help you smash DS interview questions to pieces!

Depending on the company, you might also be asked programming questions. They are often not that hard, but given the stress and time constraints, you really need to master them as well. Expect anything from sorting and recursion to data structures. It's good to start practicing these questions as early as possible. To get a good understanding of how to approach coding questions, I recommend going through the Cracking the Coding Interview book. To get more practical experience, try HackerRank or LeetCode.
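
To give a flavour of the kind of question this covers, here is an illustrative practice exercise (not taken from any particular company's interview): implement merge sort, which touches both sorting and recursion. The function name `merge_sort` and the sample input are placeholders chosen for this sketch.

```python
def merge_sort(items):
    """Return a new sorted list using recursive merge sort."""
    # Base case: zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)

    # Recursive case: sort each half independently.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves, always taking the smaller front element.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # One half is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

In an interview it usually helps to say the base case and the merge step out loud before typing, and to mention the O(n log n) running time afterwards.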

Glassdoor is Your Best Friend

You can get a good feel for a company's culture and atmosphere from its Glassdoor reviews. This can give you a good indication of whether the company is a good fit for you. If, for example, a company seems to have a really toxic atmosphere, it might be better to withdraw the application and spend more time preparing for interviews at other companies. What's the point in interviewing with companies that you don't really want to work for?

You can also find some really useful information about the interview structure, or about the type of questions they ask. Some companies literally ask the same set of questions every time! I am not sure why they do that, but when it happens you will notice the same questions repeated across the Glassdoor reviews. You can use that to your advantage and learn them by heart.

Easy Interview Questions are NOT Easy

Imagine the interviewer asks: what is linear regression?

You can answer either:

It is a linear approach that models the relationship in data between dependent and independent variables.

Or:

It is a linear approach that models the relationship in data between dependent and independent variables. The model's parameters can be derived using the ordinary least squares approach, and the general equation works on multi-dimensional data. It is an algorithm that is simple, fast, and interpretable. However, it has certain caveats such as …

Do you see what I mean? By asking a simple-looking question, the interviewer can test two things. Firstly, whether you have the basic knowledge (obvious). Secondly, how deep your understanding is and how inquisitive you are when studying a topic. This ability is crucial in the data scientist skillset, as you will often have to work with new tools and read research papers. If you don't analyze a subject thoroughly and fail to understand its limitations and capabilities, you are on a straight path to an unsuccessful project.
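
To make the ordinary least squares part of the longer answer concrete, here is a minimal sketch for the one-feature case in plain Python. It is only an illustration under simplifying assumptions: the function name `fit_simple_ols` and the toy data are placeholders, and the multi-dimensional case mentioned in the answer would normally be handled with the matrix form in a numerical library rather than by hand.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Closed-form minimiser of the sum of squared residuals.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated from y = 1 + 2x, so the fit should recover those values.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

Being able to write something like this from memory, and explain where the slope formula comes from, is the kind of depth the second answer signals.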

Showcase Projects. Quality or Quantity?

TL;DR: Quality!

The painful truth is that nobody cares about the endless Jupyter notebooks that you created for your 100+ mini-projects. Don't get me wrong: it's still a great way to experiment with new models and data. But, most likely, it won't impress the interviewer.

There is much more to data science than just creating dozens of untested machine learning models in a single file. In a real-life scenario, the code needs to be tested, packaged, documented, and deployed on internal servers or cloud services.

My advice? Go for quality and aim to create around three bigger projects that will impress the interviewers. Here are some tips that you can follow:

  • Find a real-world dataset that requires a lot of preprocessing and EDA
  • Make your code modular: create separate classes for models, data preprocessing, and end-to-end pipelines
  • Use test-driven development (TDD) while developing packaged code
  • Work with Git and continuous integration services such as CircleCI
  • Expose the model to users through an API, e.g. with Flask for Python
  • Document the code using Sphinx and adhere to code styling guidelines (e.g. PEP 8 for Python)
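
As a rough illustration of the modular-code and testing tips above, here is a minimal sketch in plain Python. The class names (`Standardizer`, `MeanBaselineModel`, `Pipeline`), the test functions, and the toy data are hypothetical placeholders, not part of any library; a real project would swap in a proper model and run the tests with a test runner in CI.

```python
class Standardizer:
    """Preprocessing step: rescale a feature to zero mean and unit variance."""

    def fit(self, values):
        n = len(values)
        self.mean_ = sum(values) / n
        variance = sum((v - self.mean_) ** 2 for v in values) / n
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        self.std_ = variance ** 0.5 or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


class MeanBaselineModel:
    """Placeholder model: always predicts the mean of the training targets."""

    def fit(self, features, targets):
        self.prediction_ = sum(targets) / len(targets)
        return self

    def predict(self, features):
        return [self.prediction_ for _ in features]


class Pipeline:
    """End-to-end pipeline: preprocessing followed by the model."""

    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def fit(self, features, targets):
        transformed = self.preprocessor.fit(features).transform(features)
        self.model.fit(transformed, targets)
        return self

    def predict(self, features):
        return self.model.predict(self.preprocessor.transform(features))


# Tests written against the public interface, in the spirit of TDD.
def test_standardizer_centres_data():
    scaled = Standardizer().fit([1.0, 2.0, 3.0]).transform([1.0, 2.0, 3.0])
    assert abs(sum(scaled)) < 1e-9


def test_pipeline_predicts_training_mean():
    pipeline = Pipeline(Standardizer(), MeanBaselineModel())
    pipeline.fit([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
    assert pipeline.predict([5.0]) == [20.0]


if __name__ == "__main__":
    test_standardizer_centres_data()
    test_pipeline_predicts_training_mean()
    print("all tests passed")
```

Keeping each responsibility in its own class means every piece can be tested and replaced separately, which is much harder to do with a single long notebook.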

A really good course on ML model deployment was created on Udemy by data scientists from Babylon Health and Train In Data. You can find it here.

Bonus: CV Template

I am a big fan of 1-page CVs for data science internships. It helps me keep things simple and clear, without redundant information. I used to have a Word template, but I was losing a lot of time modifying it. Whenever I removed or added some information, the formatting instantly blew up, making my CV look like the Enigma code 😆

Anyway, I found a nice-looking Overleaf CV template that I've been using ever since. It is simple, clear, and most importantly, it's rendered with modular LaTeX code that makes formatting a painless task. The link to the CV template is here.

About Me

I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)

Here are my social media profiles, if you want to stay in touch with my latest articles and other useful content:

  • LinkedIn
  • GitHub
  • Personal Website

Translated from: https://towardsdatascience.com/interviewing-for-data-science-internship-how-to-prepare-f6b9c2c7fa97
