Microsoft Big Data

Microsoft was one of the software companies that came to my university to hire interns for summer 2021. This was the first year that Microsoft offered a Data Science internship to pre-final-year undergraduate students.

Microsoft set the following requirements:

The student must have a minimum CGPA of 8.
The student should be pursuing a Computer Science or Mathematics major.

All eligible students had to fill out the internship application form on the Microsoft Career website and attach a resume. Students who submitted the form received the test link within 1–2 days.

Online Test:

About 60–70 students took the test for the internship, which was conducted on the mettl platform. The test lasted 1 hour and consisted of 62 multiple-choice questions touching almost every aspect of machine learning. No information was given about the marking scheme.

The key takeaways from the online test were:

Questions covered topics such as Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Underfitting, Overfitting, Bias, Variance, Bagging, Boosting, Clustering, Recommender Systems, PCA, LDA, and Neural Networks. There were also some basic questions on Probability and Statistics.
Most of the questions were conceptual, for example about the kernel function in SVMs or the central limit theorem.
There were fewer questions on Neural Networks, so students were expected to be well-versed in traditional Machine Learning algorithms.
There were no coding questions, and no questions asking for the correct sklearn code for a given algorithm.

I was able to complete about 50 of the 62 questions within the hour.

Since I didn’t know much about Recommender Systems and LDA, I wasn’t able to answer those questions, nor the 2–3 questions on convex optimization.

Microsoft didn’t release the exact test results, but it did release a list of 6 students shortlisted for the interviews, including me!

I had about a day to prepare for the interview and had no idea what a Data Science interview would look like. I got some help from seniors and revised the concepts asked in the online test (mostly traditional machine learning algorithms) from the Stanford CS229 notes. I also reviewed everything about the projects on my resume.

Interviews were conducted online on the Microsoft Teams platform due to COVID-19, and each candidate had a total of 3 rounds of technical interviews.

Round 1:

First, the interviewer asked me to introduce myself and talk about my interests, so I talked about my interest in computer vision.

I was asked the following questions:

Explain the working of a convolutional layer and design a CNN for image classification. Explain the loss function, regularization, and activation function used for it.
Explain the Decision Tree algorithm. Also explain bagging and boosting with Decision Trees, and the weighting function used in the boosting algorithm.
Design a spam classification system. Also explain the feature extraction, algorithm, and metrics used for evaluation.
Explain the in-depth working of Support Vector Machines (SVMs). Also explain the convex optimization, kernel functions, and what support vectors are.

I was able to answer all the questions except the one on the working of SVMs: I could explain margins and kernel functions but was not able to explain the convex optimization part. I explained my answers by illustrating the algorithms on a shared screen.
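
For reference, the convex optimization part of an SVM question usually refers to the soft-margin primal problem. The block below is the standard textbook form, not necessarily the derivation the interviewer expected; the symbols $w$, $b$, $\xi_i$, and $C$ are the usual textbook notation and do not come from the interview.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\xi_i \\
\text{subject to}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i, \qquad i = 1,\dots,m, \\
& \xi_i \ge 0, \qquad i = 1,\dots,m.
\end{aligned}
```

The objective is a convex quadratic and the constraints are linear, which is why the problem is a convex quadratic program. The training points whose margin constraint is active at the optimum are the support vectors.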

He then asked me if I had any questions, and I asked about some data science use cases at Microsoft. With that, the interview was over. The entire interview took about 45 minutes.

Three students made it to the second round, which took place a couple of hours later.

I revised SVMs in the time between the 1st and 2nd rounds.

Round 2:

This round was similar to round 1, but the interviewer asked a significant number of NLP (Natural Language Processing) questions.

The round started the same way, with me introducing myself and my interests.

I was asked the following questions:

What is the difference between bias and variance?
Explain multiclass classification using Logistic Regression. Also explain the softmax activation and cross-entropy loss, and write the equations for them.
Explain the working of RNNs, GRUs, and LSTMs. Also explain the pros and cons of each type of network, and why transformer-based models are better than these.
Explain the training procedure used to obtain GloVe embeddings.
Design a spam classification system. Also explain the feature extraction, algorithm, and metrics used for evaluation.
Explain the in-depth working of Support Vector Machines (SVMs). Also explain the kernel functions, and how an SVM classifies when the classes are not linearly separable.
Which algorithm should be used to extract nouns from search engine queries, and why?
Derive the equations for the forward and backward pass in Linear Regression.
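
As a reference for the softmax and cross-entropy question above, here are the standard equations for multiclass logistic regression. This is the usual textbook formulation rather than a record of what I wrote in the interview; the notation ($K$ classes, weights $w_k$, biases $b_k$, one-hot label $y$) is assumed for illustration.

```latex
\begin{aligned}
z_k &= w_k^{\top}x + b_k, \qquad k = 1,\dots,K \\
p_k &= \operatorname{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \\
L &= -\sum_{k=1}^{K} y_k \log p_k \\
\frac{\partial L}{\partial z_k} &= p_k - y_k
\end{aligned}
```

The last line is the gradient used in the backward pass: for a one-hot label, the error signal is simply the predicted probability minus the target.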

I was able to answer most of the questions in the interview, except for the mathematical equations involved in SVMs. The interviewer seemed satisfied with most of my answers. I explained my answers by illustrating the algorithms on a shared screen.

She then asked me if I had any questions, and I asked the same question as in round 1. The entire interview took about 45 minutes.

Round 3:

The interviewer didn’t have a Data Science background, so he asked me questions on Data Structures & Algorithms. He mentioned that they wouldn’t be hard, since the interview was for a data science role.

The interview started with a formal introduction, and he asked me to introduce myself as usual.

I was asked the following questions:

Given an array A = [a1, a2, a3, …, an, b1, b2, b3, …, bn], convert it into the array B = [a1, b1, a2, b2, …, an, bn] using only O(1) space.
In the previous question, given an index in array A, return the index that element would have in array B.
You have an array of 2N elements consisting of N even and N odd elements. Using the minimum number of swaps, make sure that even elements are at odd indexes and odd elements are at even indexes.
In the previous question, assume you are not told that the number of even elements equals the number of odd elements; verify this while still using the minimum number of swaps and only a single pass over the array.

I was not able to answer the first question correctly, so the interviewer changed it to the 2nd question, which I answered correctly and coded on a shared screen. He seemed satisfied with my answer to the 2nd question.
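
A minimal sketch of the index mapping from the 2nd question, assuming 0-based indexes (the question itself does not say which base to use). The function names `index_in_b` and `interleave` are illustrative, and this is not necessarily the code I wrote in the interview.

```python
def index_in_b(i, n):
    """Map a 0-based index in A = [a1..an, b1..bn] to its 0-based index in B."""
    if not 0 <= i < 2 * n:
        raise IndexError("index out of range")
    if i < n:
        # a-elements keep their order and land on the even positions of B.
        return 2 * i
    # b-elements land on the odd positions of B.
    return 2 * (i - n) + 1


def interleave(a):
    """Build B from A using the mapping above (uses O(n) extra space)."""
    n = len(a) // 2
    b = [None] * len(a)
    for i, value in enumerate(a):
        b[index_in_b(i, n)] = value
    return b


print(interleave(["a1", "a2", "a3", "b1", "b2", "b3"]))
```

Note that `interleave` allocates a second list, so it only checks the mapping; it does not meet the O(1) space requirement of the 1st question.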

He then asked me the 3rd question, which I answered using the two-pointer technique; I coded the solution after explaining it to him. He seemed satisfied with the answer.
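
One way the two-pointer idea for the 3rd question can be sketched is shown below. It assumes 0-based indexes and that the input really has N even and N odd elements; the function name `rearrange_by_parity` is hypothetical, and this is a reconstruction of the technique, not the exact code from the interview.

```python
def rearrange_by_parity(arr):
    """Put odd values at even indexes and even values at odd indexes, in place.

    Returns the number of swaps performed. Assumes len(arr) == 2N with
    exactly N even and N odd values.
    """
    n = len(arr)
    i, j = 0, 1  # i scans even indexes, j scans odd indexes
    swaps = 0
    while True:
        # Skip even indexes that already hold an odd value.
        while i < n and arr[i] % 2 != 0:
            i += 2
        # Skip odd indexes that already hold an even value.
        while j < n and arr[j] % 2 == 0:
            j += 2
        if i >= n or j >= n:
            break
        # arr[i] is a misplaced even value and arr[j] a misplaced odd value.
        arr[i], arr[j] = arr[j], arr[i]
        swaps += 1
    return swaps


data = [2, 1, 4, 3]
print(rearrange_by_parity(data), data)
```

Each swap fixes two misplaced elements at once, and every misplaced element has to take part in at least one swap, so the count returned is the minimum possible.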

The interviewer then modified the question into the 4th question, for which I changed the loop and added some if-else statements inside it. The interviewer then discussed some edge cases in which the solution would fail, and I modified the code to handle them. The interviewer seemed satisfied with the answer.
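
A hedged sketch of how the 4th question's check could be folded into the same two-pointer pass, assuming 0-based indexes. The function name `rearrange_checked` and the choice to return `None` for invalid input are my own illustration; the edge cases discussed in the interview are not recorded here.

```python
def rearrange_checked(arr):
    """Two-pointer parity rearrangement that also validates the input.

    Returns the number of swaps, or None if the array does not contain
    equally many even and odd values. On failure the array may already be
    partially rearranged.
    """
    n = len(arr)
    if n % 2 != 0:
        return None  # an odd length can never split into N even and N odd
    i, j = 0, 1  # i scans even indexes, j scans odd indexes
    swaps = 0
    while True:
        while i < n and arr[i] % 2 != 0:
            i += 2
        while j < n and arr[j] % 2 == 0:
            j += 2
        if i >= n or j >= n:
            break
        arr[i], arr[j] = arr[j], arr[i]
        swaps += 1
    # If one pointer still sits on a misplaced value, it has no partner,
    # so the counts of even and odd values cannot be equal.
    if i < n or j < n:
        return None
    return swaps


print(rearrange_checked([2, 4, 1, 3]))
print(rearrange_checked([2, 4, 6, 1]))
```

Each pointer only moves forward, so every element is examined once, which matches the single-pass requirement.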

He then asked if I had any questions, and I asked him about the work culture at Microsoft and the work he does at the company. After this, the interview was over. The whole interview took 45 minutes.

Key takeaways:

It is crucial to understand the mathematical concepts behind the algorithms rather than treating them as black boxes.
Having machine learning projects on your resume is a huge plus, since all the other candidates also had to explain their projects. Review your projects thoroughly.
Get some decent practice with DSA questions. There might be some DSA rounds involved in the process. I was the only one of the six candidates to go through a DSA round.
Read about some use cases of machine learning in industry, since most data science interviews include this type of question.

Conclusions:

I was very confident about my performance in the first two rounds but a little unsure about the 3rd round, since I was pretty weak in Data Structures and Algorithms.

After three days, Microsoft announced the results for the internship position: three students received offers, and I was one of them!

I will now intern at one of the Microsoft India offices from May 2021 to July 2021.

Translated from: https://towardsdatascience.com/my-data-science-interview-with-microsoft-6b7ec840b80e
