A quick search on Medium for the keywords “Data Science Interview” turns up hundreds of articles. They guide the reader through everything from which concepts are covered to interviews at specific companies such as Tesla, Walmart, Twitter, Apple, and AWS.
Not everyone is going to land a data science job at a major company like Amazon. You might end up at a company that is just starting out in data science. Even so, you should have a good idea of what to expect as an employee there, so you can avoid any nasty surprises.
With ONLY 3 THEMES, you should be able to tell whether this will be a nightmare job or one where you will be supported in your career as a data scientist. The last thing anyone wants is to start a job and want to leave a few months in. This does not mean every job will be easy. Your job should challenge you, but in a good way. You might like really hard challenges, such as shaking up legacy systems, processes, and bureaucracy. That is fine, but at least you will know what you are getting yourself into.
Theme Number 1: The Data
Some of the first questions I ask are about the data. As a data scientist, you will obviously be expected to wrangle, explore, and transform data and to feed it to your models.
How is your data stored? How large is your data? If all of your data sits in a data lake and you are expected to create the tables, know now that you will also be the database person, not just the data scientist. Make sure you are at least coming into a database that is already set up. Sometimes a company is in the middle of migrating its data from one database to another, and at some companies that takes a long time. A startup I worked at migrated all of its data from Redshift to Snowflake in under 3 months. At a large international company I worked at, the same kind of migration took over a year.
How often is the data updated, and how does that work at a high level? Will I be expected to manage the data pipelines? If they work with large amounts of data, they should have a Data Engineer. If not, you will be expected to wear both hats. There should be someone other than you making sure the database is maintained.
What are the sources of the data? You would be surprised at how some data gets updated. It may even come in through Excel uploads. Ideally, if the company has a product, data would come in through an API or from web application logs. I have worked at a company where data came from MANY sources: survey forms, SMS data, email data, a CRM, APIs, and Excel uploads. Make sure you have an idea of the depth and diversity of the data sources.
Theme Number 2: Your Manager
This theme is SO important, which is why it gets the longest paragraphs. Managers are really your gateway to success in a company. If you have a bad manager but love the company, the company will be much harder to navigate. I read a book called Positioning: The Battle for Your Mind. It is mostly about marketing and brand image, but one passage discussed how managers help cultivate your brand image at a company. They are the ones who will give you projects aligned with your interests. They may also give you a project you didn't expect to like but end up enjoying anyway, because they understand you. A good manager wants you to succeed. I have had four managers, and let me tell you: some managers see the value in you and want you to excel, even if that means you sometimes stand out more than they do. Other managers see the value in you, feel intimidated, and try to keep you from getting any projects. Those are very different experiences. Also be careful with managers who buy into “hire someone smarter than yourself” but use it so their direct reports do all the hard work while they sit back. Make sure your manager has skin in the game and wants you to excel. There is nothing worse than a manager who is not passionate about their own role at the company.
What is your supervisor's background and title? Ideally, your supervisor should speak your language. However, the higher up your own title is (say, Manager, Director, VP, or Chief Data Officer), the more likely you are to report to someone more business-savvy. At the startup I worked at, the Chief Data Officer would query the database themselves and also use Python. Management understands data and strategy, but they might not know why your Support Vector Machine is doing worse than your Logistic Regression model. Whether you are an entry-level or an experienced Data Scientist, you might prefer a supervisor who can guide you. It also helps when a project is blocked. A supervisor who speaks your language will quickly and thoroughly understand what is causing the delay and can relay that to upper management.
Have they ever managed anyone before? This is not something to be inherently biased about. I once had a manager who had never managed anyone before. They took the time to read books about management, really cared about my growth and development, and did not micromanage. I have also reported to a manager with TONS of management experience who had never managed an analytics person and did not know how best to leverage my skill set. Neither of them had managed a data scientist before. That brings me back to my earlier point: it is so much easier when you speak the same language as your supervisor, especially in an entry-level position.
How many direct reports do they have, and who are they? I don't know if there is a rule of thumb for how many direct reports a manager should have. At a minimum, the manager should be able to hold weekly check-ins with each one. It also helps to know who the other reports are, so you can see how much your manager has to juggle. Directing a group of data scientists or analysts is very different from managing a broader team of subject matter experts covering communications, operations, finance, and so on. The second manager has to do far more context switching than a manager running a team of analysts. The first manager might have an analyst embedded in each department and will need to understand the nuances of each department's data. Still, they won't need to know as much as the second manager. The analyst acts as the middleman to each department, and everyone shares similar tools, e.g., SQL and Python or R.
What direct impact has the manager had on the team? Any upper-management title instantly makes you think of meetings, so the direct impact can be hard to see unless you are in those meetings with that person. Still, a manager of a technical team should be producing or facilitating tangible results for the team: processes, projects, data products, models, case studies, webinars, and so on. As the book I mentioned earlier suggests, the manager plays a huge part not only in building your brand image but also in representing your team's. They may not do the analysis themselves. They should, however, create an environment where all of the above can happen by investing in the right tools and people and by setting up those processes.
What road map do they have in mind for their team and for you? This ties into the previous question about how the manager has contributed to the team's ecosystem. Sometimes the manager is building the team from scratch and is still defining some of those processes. Even so, they should at least have a road map of where they eventually want to be. It is a huge red flag if the manager cannot articulate this road map, even when there is a lot to maintain. Sometimes a manager inherits legacy systems and tries to optimize them or migrate them to something new. Even then, there should be a clear road map for why the team is focusing on X and how your role fits into it. This question will also surface the current roadblocks and show how the manager thinks the team adds value to the company.
What is the road map for your career? Does the company have annual or six-month reviews? At one company I worked at, your bonus was tied to your annual review. I liked that because it forced your manager to recognize your strengths and weaknesses and to reward you for them. It also kept a record, which made promotions a bit more seamless. Another company I worked at put you on a six-month probation period, and after that there were no more annual reviews. That felt like a company that cared more about itself than about its employees. Not every company has a clearly laid-out promotion plan with dates and expectations (which would be ideal). There should still be some emphasis on career growth, whether through tuition reimbursement, professional development classes, bonuses, raises, or something similar.
Theme Number 3: Collaboration / Team Environment
Is code review done at the company, and what is the process like? If you work with code, pipelines, models, analysis, and the like, there should ABSOLUTELY be code review. I have experienced both situations. At one company, code review was already firmly in place. At another, I had to implement the code review process myself. Both companies had a Slack channel that alerted us when code was pushed to production. If a company promotes code from development → staging → production, that is a good sign they know what they are doing.
What types of tools would I use? You want a balance between too many tools and too few. At a minimum, a team analyzing data should have:

- a database (Redshift, Postgres, Snowflake, etc.);
- a business intelligence tool (Sisense, Looker, Domo, etc.) so non-analytics stakeholders can see the data. This is different from the open-source tools data people use to visualize data during exploratory data analysis;
- version control (GitHub, GitLab);
- containers for building software (e.g., Docker);
- data pipeline management (Airflow or Luigi);
- communication tools (email, Slack, Zoom, JIRA, etc.).
Will I work with more than one data scientist or analyst, or am I the only one? This question is VERY important, especially at entry level. You want a supervisor who speaks your language, and you also want other people you can learn from. You might be assigned to different departments, stakeholders, or projects. Even so, having someone else with a similar skill set is comforting. That is especially true if your team believes in cross-training and sets aside time to collaborate, share project results, and hold retrospectives together.
Is there a subject matter expert? This is so, so important. I can't stress it enough. Yes, as a Data Scientist you can build a model with high accuracy or a high AUC or F1 score, but will anyone actually use it? Data scientists need as much context about the data as possible, whether from their own background or from access to a subject matter expert. At the end of the day, we build models for the company or a stakeholder to use. Also, a simpler, more explainable model like logistic regression can beat a complex model on training time and cost, just because some great feature engineering went into it.
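AUC and F1 scores are easy to quote without context. Here is a minimal sketch of what the two scores actually measure. It is written in plain Python. The function names `f1_from_labels` and `auc_from_scores` are hypothetical, and the toy labels and scores are made up for illustration. In practice you would more likely call scikit-learn's `sklearn.metrics.f1_score` and `roc_auc_score`.

```python
from typing import Sequence


def f1_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # With no true positives, precision or recall is 0 (or undefined); report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_from_scores(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(n_pos * n_neg) pairwise version is for
    illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and model scores).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5

print(auc_from_scores(y_true, scores))           # 0.75
print(round(f1_from_labels(y_true, y_pred), 3))  # 0.667
```

AUC only measures how well the model ranks examples, and F1 depends on the threshold you pick. A high value on either still needs domain context before it becomes something a stakeholder can use.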
What are the culture and work-life balance like? This question is not specific to data science teams. Still, it is important to know how people communicate, both within the team that reports to your manager and with the other departments you will work with. At one startup I worked at, we had bi-weekly coffee dates with people from other teams. At another company, people did not know each other unless they had been there for almost ten years. There was no socializing across departments, and that can really hold a company back, especially when everyone is working toward a specific common goal. I believe a good data science team does some socializing, because knowing the processes the data comes from helps immensely.
Thank you for your time! I hope this was helpful. If you have any thoughts, please share them below, or reach out to me at www.monicapuerto.com.
Source: https://towardsdatascience.com/questions-you-should-ask-them-in-a-data-science-interview-1288c754ca51