My Data Science Interview with Microsoft

Microsoft was one of the software companies that came to my university to hire interns for summer 2021. This was the first year that Microsoft offered a Data Science internship to pre-final-year undergraduate students.

Microsoft set the following requirements:

  1. The student must have a minimum CGPA of 8.
  2. The student should be pursuing a Computer Science or Mathematics major.

All eligible students had to fill out the internship application form on the Microsoft Career website and attach a resume. Students who submitted the form received the test link within 1–2 days.

Online Test:

About 60–70 students took the test for the internship, which was conducted on the mettl platform. The test lasted 1 hour and consisted of 62 multiple-choice questions touching almost every aspect of machine learning. No information was given about the marking scheme.

The key takeaways from the online test were:

  1. Questions covered topics such as Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Underfitting, Overfitting, Bias, Variance, Bagging, Boosting, Clustering, Recommender Systems, PCA, LDA, and Neural Networks. There were also some basic questions on Probability and Statistics.
  2. Most of the questions were conceptual, for example about kernel functions in SVMs or the central limit theorem.
  3. There were fewer questions on Neural Networks, so students were expected to be well-versed in traditional Machine Learning algorithms.
  4. There were no coding questions, and none of the form "what is the correct sklearn code for this algorithm?".

I was able to complete about 50 of the 62 questions in the hour.

Since I didn’t know much about Recommender Systems or LDA, I wasn’t able to answer those questions, nor the 2–3 questions on convex optimization.

Microsoft didn’t release the exact results for the test but released a list of 6 students shortlisted for the interviews, including me!

I had about a day to prepare for the interview and had no idea what a Data Science interview would look like. I got some help from seniors and revised the concepts asked in the online test (mostly traditional machine learning algorithms) using the Stanford CS229 notes. I also reviewed everything about the projects on my resume.

Interviews were conducted online on the Microsoft Teams platform due to COVID-19, and there were three rounds of technical interviews for each candidate.

Round 1:

The interviewer first asked me to introduce myself and speak about my interests, and I talked about my interest in computer vision.

I was asked the following questions:

  1. Explain how a convolutional layer works and design a CNN for image classification. Explain the loss function, regularization, and activation function you would use.
  2. Explain the Decision Tree algorithm. Also explain bagging and boosting with Decision Trees, and the weighting function used in the boosting algorithm.
  3. Design a spam classification system. Also explain the feature extraction, algorithm, and metrics used for evaluation.
  4. Explain in depth how Support Vector Machines (SVMs) work. Also explain the convex optimization, kernel functions, and what support vectors are.

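For the spam classification question, the evaluation part usually comes down to precision, recall, and F1, since wrongly flagging a legitimate mail and missing a spam mail are different kinds of error. A minimal sketch, assuming binary labels where 1 means spam; the function name `precision_recall_f1` and the toy labels are illustrative and not from the interview:

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 4 spam mails, 4 legitimate mails.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Which of the three numbers matters most depends on the product: a mail filter that hides legitimate mail is often judged more harshly than one that lets some spam through, which would argue for favouring precision.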

I was able to answer all the questions except the one on SVMs, where I could explain margins and kernel functions but not the convex optimization part. I explained my answers by illustrating the algorithms on a shared screen.

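For reference, the convex optimization part of the SVM question is usually stated as the soft-margin problem below. This is the standard textbook formulation, not something taken from the interview; the notation is an assumption: $m$ training examples $x_i$ with labels $y_i \in \{-1, +1\}$, slack variables $\xi_i$, penalty $C > 0$, and kernel $K$.

```latex
% Soft-margin SVM, primal form (a convex quadratic program)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0

% Dual form, where the kernel K replaces the inner product x_i^T x_j
\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{m}\alpha_i y_i = 0
```

The training points with $\alpha_i > 0$ are the support vectors. Because the data enter the dual only through $K(x_i, x_j)$, swapping in a nonlinear kernel such as the RBF kernel $K(x, z) = \exp(-\gamma\lVert x - z\rVert^2)$ lets the same program separate classes that are not linearly separable in the input space.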

He then asked whether I had any questions, and I asked about some data science use cases at Microsoft. With that, the interview was over. The entire interview took about 45 minutes.

Three students made it to the second round, which took place after a couple of hours.

I revised SVM during the time between the 1st and 2nd rounds.

Round 2:

This round was similar to round 1, but the interviewer asked a significant number of NLP (Natural Language Processing) questions.

The round started the same way, with me introducing myself and my interests.

I was asked the following questions:

  1. What is the difference between bias and variance?
  2. Explain multiclass classification using Logistic Regression. Also explain the softmax activation and cross-entropy loss, and write the equations for them.
  3. Explain how RNNs, GRUs, and LSTMs work, and the pros and cons of each type of network. Also explain why transformer-based models are better than these.
  4. Explain the training procedure used to obtain GloVe embeddings.
  5. Design a spam classification system. Also explain the feature extraction, algorithm, and metrics used for evaluation.
  6. Explain in depth how Support Vector Machines (SVMs) work. Also explain kernel functions, and how an SVM classifies when the classes are not linearly separable.
  7. Which algorithm should be used to extract nouns from search engine queries, and why?
  8. Derive the equations for the forward and backward pass in Linear Regression.

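Question 2 asks for the softmax and cross-entropy equations. A minimal sketch in plain Python follows, assuming the logits of multiclass logistic regression have already been computed as one score per class; the function names and the example logits are illustrative only.

```python
import math


def softmax(logits):
    """p_k = exp(z_k) / sum_j exp(z_j), computed in a numerically stable way."""
    shift = max(logits)  # subtracting the max does not change the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """L = -log p_y for a single example whose true class index is y."""
    return -math.log(probs[true_class])


# Hypothetical logits for a 3-class problem; class 0 is the true class.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)
```

With a one-hot target, the gradient of this loss with respect to the logits is the predicted probabilities minus the one-hot vector, which is why softmax and cross-entropy are normally paired.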

I was able to answer most of the questions in the interview, except the mathematical equations involved in SVMs. The interviewer seemed satisfied with most of my answers. I explained the answers by illustrating the algorithms on a shared screen.

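Question 8 in the list above (forward and backward pass for Linear Regression) can be sketched as follows. This assumes a single input feature, a mean-squared-error loss, and plain gradient descent; the function names, learning rate, and toy data are assumptions made for illustration, not the answer given in the interview.

```python
def forward(w, b, xs):
    """Forward pass: y_hat_i = w * x_i + b."""
    return [w * x + b for x in xs]


def backward(w, b, xs, ys):
    """Backward pass for L = (1/n) * sum_i (y_hat_i - y_i)^2.

    dL/dw = (2/n) * sum_i (y_hat_i - y_i) * x_i
    dL/db = (2/n) * sum_i (y_hat_i - y_i)
    """
    n = len(xs)
    preds = forward(w, b, xs)
    dw = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
    db = (2 / n) * sum(p - y for p, y in zip(preds, ys))
    return dw, db


def fit(xs, ys, lr=0.05, steps=2000):
    """Gradient descent starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = backward(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit(xs, ys)
```

On this noise-free data the fitted parameters approach w = 2 and b = 1, and the gradients vanish at that point.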

She then asked whether I had any questions, and I asked the same question as in round 1. The entire interview took about 45 minutes.

Round 3:

The interviewer didn’t have a Data Science background, so he asked me questions on Data Structures & Algorithms. But he mentioned that it wouldn’t be hard since the interview was for a data science role.

The interview started with formal introductions, and he asked me to introduce myself as usual.

I was asked the following questions:

  1. Given an array A = [a1, a2, a3, …, an, b1, b2, b3, …, bn], convert it into the array B = [a1, b1, a2, b2, …, an, bn] using only O(1) space.
  2. For the previous question, given an index in array A, return the index that element would have in array B.
  3. You have an array of 2N elements consisting of N even and N odd elements. Using the minimum number of swaps, make sure that even elements are at odd indexes and odd elements are at even indexes.
  4. In the previous question, assume you are not told that the number of even elements equals the number of odd elements. Verify this while still using the minimum number of swaps and only a single pass over the array.

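The article does not give a solution to question 1. One simple way to meet the O(1) space requirement is to use adjacent swaps only, as sketched below; this runs in O(n²) time, so it is offered as an illustration rather than as the answer the interviewer had in mind (faster in-place methods exist but are considerably more involved). The function name `interleave_in_place` is hypothetical.

```python
def interleave_in_place(arr):
    """Turn [a1..an, b1..bn] into [a1, b1, a2, b2, ..., an, bn] in place.

    Round i swaps i adjacent pairs centred on the middle of the array, so
    each b element moves one step left per round until it reaches its slot.
    Extra space: O(1). Time: O(n^2).
    """
    if len(arr) % 2 != 0:
        raise ValueError("array length must be even")
    n = len(arr) // 2
    for i in range(1, n):
        for j in range(n - i, n + i, 2):
            arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


example = interleave_in_place(["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"])
```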

I was not able to answer the first question correctly, so the interviewer changed it to the 2nd question, which I answered correctly and coded on a shared screen. He seemed satisfied with my answer to the 2nd question.

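The index mapping asked for in the 2nd question has a closed form. A small sketch, assuming 0-based indexes and that `n` is the length of each half; the function name `index_in_b` is illustrative, and this is not necessarily the code written in the interview.

```python
def index_in_b(i, n):
    """Map an index in A = [a1..an, b1..bn] to its index in B = [a1, b1, ...].

    Elements of the first half land on even positions (2 * i); elements of
    the second half land on the odd positions in between (2 * (i - n) + 1).
    """
    if not 0 <= i < 2 * n:
        raise IndexError("index out of range")
    return 2 * i if i < n else 2 * (i - n) + 1


# For n = 3: a1, a2, a3 go to 0, 2, 4 and b1, b2, b3 go to 1, 3, 5.
mapping = [index_in_b(i, 3) for i in range(6)]
```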

He then asked me the 3rd question, which I answered using the two-pointer technique, and I coded the solution after explaining it to him. He seemed satisfied with the answer.

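A two-pointer solution to the 3rd question might look like the sketch below. It assumes 0-based indexes and an input that really contains as many even as odd elements; the function name `place_by_parity` is made up here, and the actual interview code is not shown in the article.

```python
def place_by_parity(arr):
    """Put odd values on even indexes and even values on odd indexes.

    Pointer i walks the even indexes looking for a misplaced even value,
    pointer j walks the odd indexes looking for a misplaced odd value, and
    the two are swapped. Every swap fixes two positions at once, so no
    solution can use fewer swaps. Returns the number of swaps performed.
    """
    n = len(arr)
    i, j = 0, 1
    swaps = 0
    while True:
        while i < n and arr[i] % 2 == 1:  # odd value on even index: fine
            i += 2
        while j < n and arr[j] % 2 == 0:  # even value on odd index: fine
            j += 2
        if i >= n or j >= n:
            break
        arr[i], arr[j] = arr[j], arr[i]
        swaps += 1
    return swaps


data = [2, 1, 4, 3, 5, 6]
swap_count = place_by_parity(data)
```

Each pointer only moves forward, so the whole array is scanned once and the running time is linear.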

The interviewer then changed the question to the 4th one, for which I changed the loop and added some if-else statements inside it. The interviewer then discussed some edge cases in which the solution would fail, and I modified the code to accommodate them. He seemed satisfied with the answer.

He then asked whether I had any questions, and I asked him about the work culture at Microsoft and the work he does at the company. After this, the interview was over. The whole interview took 45 minutes.

Key takeaways:

  1. It is crucial to understand the mathematical concepts behind the algorithms rather than treating them as black boxes.
  2. Having machine learning projects on your resume is a huge plus point since every other candidate had to explain their projects. Review your projects thoroughly.
  3. Get some decent practice with DSA questions. The process might involve a DSA round; I was the only one of the six candidates to go through one.
  4. Read about some use cases of machine learning in industry, since most data science interviews include this type of question.

Conclusions:

I was very confident about my performance in the first two rounds but was a little unsure of my performance in the 3rd round since I was pretty weak in Data Structures and Algorithms.

After three days, Microsoft announced the results for the internship position: three students received offers, and I was one of them!

I will now intern at one of Microsoft India's offices from May to July 2021.

Translated from: https://towardsdatascience.com/my-data-science-interview-with-microsoft-6b7ec840b80e
