Charlene Chambliss: From Psychology to Natural Language Processing and Applied Research

WOMEN IN TECHNOLOGY SERIES

Interest in data science has been exponentially increasing over the past decade, and more and more people are working towards making a career switch into the field. In 2020, articles and YouTube videos about transitioning into a career in data science abound. Yet, for a lot of people, many key questions about this switch still remain: How do you break into data science from a social science background? And what are some of the most important skills in fields like psychology that can be applied to data science?

Charlene Chambliss has an inspiring and non-traditional career path. Currently, she leverages state-of-the-art natural language processing to “build smarter tools for analyzing massive amounts of information.”[1] In the past two years, she has written about NLP topics including BERT for named entity recognition and word2vec for news headline analysis, to name a few. However, before her current role as a machine learning engineer, she held roles in marketing and psychology research, and interned as a data scientist in the skincare industry.

Amber: Could you tell us a bit about your background?

Charlene: Sure! I’ve had kind of an unusual path into data science, so I’ll start from the beginning and go into some detail to help illuminate what it took for me.

I grew up in a smallish agricultural town (Modesto, CA), where my dad worked at Safeway (still does!) and my mom was a stay-at-home mom. They really impressed upon me the importance of taking my education seriously, which was fine with me because I loved learning and I enjoyed making them proud of me.

Ever since I was little, I wanted to be a scientist. I loved tinkering and learning how things worked. My mom indulged my curiosity by taking me to the library (I would come home with a stack of like 12 books), having me help her in the kitchen (cooking = chemistry!), and getting me the occasional toy science kit.

That interest carried on through high school and into freshman year of college, where I had decided I wanted to study chemical engineering and become a flavor scientist, because chemistry was my favorite subject. I (adorably) thought that I would simply invent new flavors to make healthy food taste better, so people would have an easier time eating salads and vegetables, and thus be healthier overall. I hated eating salads and vegetables, so 17-year-old me thought I was brilliant and that this was an amazing solution.

“Studying a social science is, in general, a great way to get used to dealing with muddy, hard-to-define questions, a skill that’s key to delivering data science work that decision-makers will actually feel comfortable using.”

I kept up my education focus and work ethic throughout high school, and made it into Stanford for undergrad. Frankly, that was pretty unexpected for me — I thought I would be going to UC Davis and maaaybe Berkeley if I was really lucky. Around half of folks who graduate from my high school don’t end up going to college at all, so even these felt like pretty high ambitions. Of my graduating class of 500 that year, I think only around 5 of us made it into “top schools” (Berkeley, Stanford, Harvard).

What I was really not expecting when I went to Stanford was the culture shock I was in for. The vast majority of students at Stanford come from upper-income backgrounds, with a median family income of $167,500. They are, by and large, the kinds of kids who have college-educated, professional parents, go to the best, most well-funded high school in town, and have paid tutors to help them out in any area they’re struggling with. Meanwhile, I grew up with a household income around a quarter of that, and the level of preparation I received in some areas relative to my peers reflected that difference. (My parents and teachers were wonderful and had done their best, but there’s only so much one can do with limited resources.)

Suddenly, I found myself feeling very insecure about my abilities (particularly my aptitude for math and computer science) and was really questioning whether I measured up to the other students. I didn’t realize that our backgrounds had been so different, since no one goes around talking about that sort of thing, so I attributed differences in performance to my own lack of ability. I was also the only one from my high school who went to Stanford that year, so I didn’t know anyone when I got there and had no one to talk to about what I was experiencing. The feeling of being an impostor never really went away during my time at Stanford, but I did at least get better at faking-it-’til-I-made-it.

I did make it through Stanford, although I ended up not pursuing chemical engineering and also needed to take a year off after junior year to help with my parents’ divorce (my mom is disabled and needed help selling our family home and moving out). I graduated with a B.A. in Psychology in 2017 — first in my immediate family to get a 4-year degree — but I felt like I had made a lot of mistakes along the way due to a lack of guidance and role models. Even just searching for my first job proved difficult, because I could really only turn to the career center for advice on how to navigate the job market for “educated professionals.” The pamphlets and 30-minute consultations they could offer couldn’t really fill in all the gaps, but after a lot of research and attending career fairs, I was able to land a job doing social media marketing for a small agency.

Without going into too much detail about one’s financial and overall career prospects as a psychology major with only a Bachelor’s, it became clear to me over the course of my time in that job that I wasn’t going to get where I wanted to go career-wise unless I made a big change. So near the end of 2017, I decided I wanted to go into data science, specifically focusing on machine learning, and threw myself into GRE studies so I could get my applications in in time for Fall 2018 admissions. (I’ll go into more detail about why I chose data science, and NLP in particular, in the next section.)

I enrolled in my M.S. as planned, doing my coursework and studying as much as I could outside class, focusing especially on stats, linear algebra, Python, and machine learning. The degree coursework was all in R, so I learned Python entirely on my own using a combination of online classes and a massive 1500+ page textbook (Learning Python). Toward the end of my first year (spring 2019), I landed a data science internship at Curology and worked there through fall. Then, at the beginning of my second year, I partnered up with an amazing mentor, Nina Lopatina, through SharpestMinds, because I had decided I wanted to focus specifically on getting a role doing NLP. At the end of the 10-week mentorship, I started looking for jobs, and got an offer to join Primer full-time in December of 2019.

I would need to defer the last semester of my M.S. program to start full-time, which was a tough call, but the experience was more important to me, so I did. It turns out that that decision was frighteningly well-timed, because the COVID-19 pandemic decimated the recent-grad job market only a few months later. I have classmates who are still struggling to find jobs, and I easily could have ended up in the same situation. I realize that I am very privileged that my roll of the dice worked out so well.

All in all, it took about 2 years to transition from marketing into a full-time machine learning engineer role, from a background of relatively little math and programming experience. (Prior to 2017, I had only taken single-variable calculus, basic/intro statistics, and one Java programming class.)

A: Before working in the data science industry, you studied psychology at Stanford. Could you tell us how your experience there influenced your career path into data science?

C: Studying a social science is, in general, a great way to get used to dealing with muddy, hard-to-define questions, a skill that’s key to delivering data science work that decision-makers will actually feel comfortable using. It’s sort of a Murphy’s Law mindset, as applied to experiment results: I’ve become extremely attentive to anything that could be “confounding” or otherwise influencing the results of my analysis, and I can call attention to potential caveats whenever appropriate. That way, the stakeholder can leverage their domain knowledge to decide whether they think those things do or don’t matter for our conclusions, and we can adjust the experiment/analysis accordingly.

In addition to that, I’ve probably spent a collective year and a half working in psychology labs to implement experiments. While there is a lot of “grunt work” involved in these sorts of positions, like data entry, you also get a front-row seat to how scientific studies actually happen, from data collection to the final statistical analyses, and you get to participate in some of the decisions that are made along the way. This prepared me quite well for data science workflows, as well as giving me some practical skills (like working with spreadsheets) and a can-do attitude that would be helpful later.

A: Previously, you were a data science intern at Curology. Could you discuss what data science looks like in the skincare industry? What types of questions did you and your team seek to answer? And what are some of the most interesting projects you worked on in your time at Curology?

C: I think my experience at Curology was a good example of what data science looks like in a D2C (direct-to-consumer) business in general, especially in a startup context. It is often the case that the first thing consumer-focused businesses need data-wise (after data engineers, of course) is really just a lot of descriptive statistics, often known as “consumer insights.”

Since I was embedded in the user acquisition department, I was especially focused on answering questions that would help us make better marketing decisions across the many different acquisition channels. 80% of the time, I was writing SQL against our data warehouse to better understand the behavior of different customer segments and track how that behavior trended over time, and turning those findings into interpretable dashboards for use by the rest of the team. The other 20% of the time, I used Python to analyze and visualize customers’ survey responses to better understand what they liked and needed from Curology.
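The kind of SQL described above, tracking how customer segments behave over time, can be sketched as follows. The `customers` table and its columns (`channel`, `age_group`, `signup_date`) are hypothetical, and SQLite through Python's built-in `sqlite3` module stands in for a real data warehouse, whose SQL dialect (especially its date functions) may differ.

```python
import sqlite3

# Hypothetical schema: one row per customer with acquisition channel,
# a demographic segment, and the signup date.
SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    channel TEXT NOT NULL,
    age_group TEXT NOT NULL,
    signup_date TEXT NOT NULL  -- ISO 8601, 'YYYY-MM-DD'
);
"""

# Monthly signups per segment: the kind of series a dashboard might plot.
# strftime is SQLite's date function; other warehouses use other names.
MONTHLY_SIGNUPS = """
SELECT strftime('%Y-%m', signup_date) AS month,
       age_group,
       COUNT(*) AS signups
FROM customers
GROUP BY month, age_group
ORDER BY month, age_group;
"""


def monthly_signups(conn):
    """Return (month, age_group, signups) rows, oldest month first."""
    return conn.execute(MONTHLY_SIGNUPS).fetchall()


# Usage with a few made-up rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO customers (channel, age_group, signup_date) VALUES (?, ?, ?)",
    [
        ("instagram", "18-24", "2019-05-03"),
        ("instagram", "25-34", "2019-05-20"),
        ("podcast", "25-34", "2019-06-11"),
        ("podcast", "25-34", "2019-06-28"),
    ],
)
for row in monthly_signups(conn):
    print(row)
```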

So a few of the questions I got to ask and answer were:

  • How do customers’ skincare goals vary based on their demographics (gender, age, etc.)? What is most important to each segment of customers, and how can we make sure we serve each of their needs well?
  • Which of our channels have had the “stickiest” customers, i.e. customers who have tended to stay with us the longest? Do any other behaviors or preferences correlate with subscription length?
  • Can we build a model that will leverage the historical data we have on customer behavior to predict customer lifetime value (LTV) at time of signup? (This is actually very hard when your customer base is growing quickly, due to sampling considerations!)

I learned a ton. Doing data analysis with SQL doesn’t just help you learn SQL; it actually helps you think analytically, as cliché as that sounds. You first have to learn to translate someone’s natural-language question about customers into the appropriate metrics (where those metrics will often have different filter conditions and assumptions, depending on the intended use case!), then ALSO learn how to actually execute that in a mathematically and technically correct way using SQL code. Sometimes you will even have to make sure that you are using the correct tables/data, because tables get deprecated, not all data makes it into the table due to bugs in the pipeline, or metric X only started being tracked 6 months ago, etc. There are many practical considerations you need to keep in the back of your mind when doing this kind of work. IMO, doing rock-solid data analysis is just as challenging as machine learning, albeit sometimes for different reasons.
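To make the point about filter conditions concrete, here is one hypothetical way to turn “Which channel has the stickiest customers?” into a metric: the share of customers still subscribed 90 days after signup. The `subscriptions` schema, the 90-day window, and the analysis date are all assumptions for illustration, and SQLite again stands in for a warehouse. The `WHERE` clause carries the key assumption: customers who signed up fewer than 90 days before the analysis date are excluded, because it is not yet known whether they will still be subscribed at day 90, and counting them either way would bias channels with many recent signups.

```python
import sqlite3

# Hypothetical schema: one row per subscription.
SCHEMA = """
CREATE TABLE subscriptions (
    customer_id INTEGER PRIMARY KEY,
    channel TEXT NOT NULL,
    signup_date TEXT NOT NULL,  -- 'YYYY-MM-DD'
    cancel_date TEXT            -- NULL while still subscribed
);
"""

# 90-day retention by channel. Only customers who signed up at least
# 90 days before :as_of are eligible; a customer counts as retained if
# they never cancelled or cancelled 90 or more days after signup.
RETENTION_90D = """
SELECT channel,
       COUNT(*) AS eligible,
       AVG(CASE
             WHEN cancel_date IS NULL
               OR julianday(cancel_date) - julianday(signup_date) >= 90
             THEN 1.0 ELSE 0.0
           END) AS retained_90d
FROM subscriptions
WHERE julianday(:as_of) - julianday(signup_date) >= 90
GROUP BY channel
ORDER BY retained_90d DESC, channel;
"""


def retention_by_channel(conn, as_of):
    """Return (channel, eligible_customers, share_retained_90d) rows."""
    return conn.execute(RETENTION_90D, {"as_of": as_of}).fetchall()


# Usage with made-up rows, analysed as of 2019-07-01.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO subscriptions (channel, signup_date, cancel_date) VALUES (?, ?, ?)",
    [
        ("podcast", "2019-01-01", None),            # active after 181 days
        ("podcast", "2019-02-01", "2019-03-01"),    # cancelled after 28 days
        ("instagram", "2019-01-15", "2019-06-01"),  # cancelled after 137 days
        ("instagram", "2019-06-20", None),          # too new to be eligible
    ],
)
print(retention_by_channel(conn, "2019-07-01"))
```

Without the `WHERE` clause, the 11-day-old instagram customer would be counted as retained simply because they have not cancelled yet, which is exactly the kind of silent assumption that is easy to miss.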

Image courtesy of Charlene Chambliss

A: Did you always know that working in data science was what you wanted to do? What inspired you to pursue a career in natural language processing? And could you tell us a bit about what your work on the Primer.Ai applied research team looks like?

C: Not at all! I don’t think many of us who work in data science today could have anticipated the rise of this field. I didn’t even really know about the widespread use of applied statistics in the private sector until my senior year of undergrad.

When I graduated with my B.A. and started my first job in marketing, I figured out that that wasn’t the right fit for me relatively quickly. I started researching my alternatives to see if there might be a career I could transition into that would be better-suited to my personality and values (and frankly, better-paying, as the entry-level marketing salary was only enough to live paycheck-to-paycheck in the Bay Area).

After a few months of digging, I landed on data science. It was intellectually challenging work, poised to make an enormous impact both economically and on society at large. Not only that, but I noticed that people in data science careers often cared more deeply about ethics than I had seen elsewhere. Seeing that people in the field genuinely cared about how their work would affect others really spoke to me, and it is what ultimately helped me decide to make the transition.

That said, I was still unsure, because I had had negative experiences with math and computer science in undergrad, and I wasn’t sure that I could hack it (haha). In my first quarter at Stanford, I got the worst grades I had ever received in my life in a calculus course and a CS course, which caused me to seriously question whether I was cut out for those kinds of subjects. When I started this journey, I had to convince myself that I could succeed by using objective measurements instead of my own feelings: “well I scored X on the SAT, and the average score for CS majors on the SAT was Y (where X > Y), so I should be able to learn the math and other material just as well as other folks in this field…”

“I love NLP because I can contribute directly to helping people cut through the noise and get down to what they need to know in order to live their lives and do their work more effectively.”

I later realized, during my M.S. in stats, that the primary reasons for my underperformance as an undergrad were a lack of good study habits and a lack of interest in math as a subject. In high school I could get away with waiting until the night before to study for the test, and never reading the textbook outside of class, but that was no longer the case at Stanford. I improved my study habits over time, and by the time I took 2 statistics courses in senior year, I was able to ace them both. Similarly, once I started learning about some of the fascinating and unexpected ways that math and stats are being applied to the real world via data science, my mind really awoke to the benefits of math, and suddenly the motivation to master it was there. I got straight A’s in my M.S. coursework for all 3 semesters in which I was enrolled.

It is still a little crazy to think that just a few years ago I truly disliked both math and programming, yet here I am now, using them both every day and genuinely enjoying it. I really want to emphasize how important it is not to put yourself into a “math person”/”not a math person” box, and the same goes for programming. Both skillsets are simply tools, and these tools have incredible power to make you more effective at any other area or interest you care about making a difference in, whether that’s art, law, a social science, or a more traditional synergy like engineering. If you can push through those early feelings of resistance and intimidation, there are wonderful feelings of competence and accomplishment waiting for you once you’re able to start using these tools for the things you care about.

As for why I chose natural language processing (NLP) in particular, there are a few reasons. On a career level, I saw the NLP community as more welcoming to people from unconventional backgrounds, relative to an area like computer vision where I was really only seeing people from CS, math, physics, and electrical engineering backgrounds. On a more personal and interests-based level, I see NLP as the field best suited to helping solve the problem of information overload. There is an endless amount of information to consume, which is contributing to a heightened level of stress for everyone, as well as impairing the productivity of people in knowledge work careers. I love NLP because I can contribute directly to helping people cut through the noise and get down to what they need to know in order to live their lives, make informed decisions, and do their work more effectively.

My work at Primer is directly relevant to the problem of information overload. At Primer, we’re leveraging powerful, cutting-edge NLP models to extract structured information from noisy, unstructured text data. This helps our customers get at the information they need much faster than having individual humans poring over the data themselves. Some analysts are working 12-hour days simply because they have no way of quickly reading and digesting the deluge of information they’re responsible for staying up-to-date with, and we want to change that.

My team, Applied Research, is tasked with training, testing, and making deep learning models available for Primer’s products, then integrating those models into our data pipeline or exposing them for use via an API. We also create reusable scripts and resources that allow people to train their own models on their own data. The work involves not just model experiments and engineering, but also plenty of collaboration with other teams that work more directly on our products and infrastructure.
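Exposing a model “via an API” can take many forms, and the interview does not describe Primer’s stack. As a minimal, standard-library-only sketch of the general idea, here is a JSON-over-HTTP endpoint; `extract_entities` is a hypothetical stub standing in for a trained model, and `handle_payload` is kept separate from the HTTP plumbing so the model logic can be tested without a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_entities(text):
    """Hypothetical stand-in for a trained NER model.

    A real service would load model weights once at startup and run
    inference here; this stub just tags capitalized words.
    """
    return [
        {"text": word, "label": "ENTITY"}
        for word in text.split()
        if word[:1].isupper()
    ]


def handle_payload(body: bytes) -> bytes:
    """Decode a JSON request {"text": ...}, run the model, encode the reply."""
    request = json.loads(body)
    entities = extract_entities(request["text"])
    return json.dumps({"entities": entities}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    """Answers every POST with the model's entities as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(host="127.0.0.1", port=8000):
    """Block forever, serving predictions on host:port."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` and POSTing `{"text": "Charlene joined Primer"}` would return the two capitalized words as entities. A production service would add input validation, batching, and error handling, which this sketch deliberately omits.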

In terms of the week-to-week, I’d say half the time goes to writing code for model training/evaluation, data preprocessing, and other typical machine learning tasks, and the other half goes toward communicating about the work: discussing plans, specifications, and progress with product managers, working with our data labeling team to create datasets for new and existing tasks, as well as presenting to the company at large about new developments and improvements of our models.
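On the evaluation side, for a task such as named entity recognition a common choice is to score whole entities rather than individual tokens. The following is a generic sketch of exact-match entity scoring, not Primer’s evaluation code; it assumes entities are represented as `(start, end, label)` tuples, and a prediction counts only if its boundaries and label both match the gold annotation.

```python
def entity_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over entity spans.

    Each argument is a collection of (start, end, label) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 2, "PER"), (3, 4, "LOC")}
predicted = {(0, 2, "PER"), (3, 4, "ORG")}  # right span, wrong label
print(entity_prf(gold, predicted))
```

Under exact matching, the mislabeled span counts as both a false positive and a false negative, which is stricter than token-level accuracy and closer to what a user of the extracted entities would experience.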

A: During your fellowship at SharpestMinds, you developed a toolkit for training “BERT-based named entity recognition models” for an error analysis frontend in Russian-to-English machine translation. Could you describe your project in more detail and share what your three most important takeaways are?

C: The TL;DR of the project is that my mentor needed a way to train BERT models to do named-entity recognition in Russian and English, and my task was to learn how to do NER with BERT in PyTorch, then build out the entire pipeline in the form of a git repo that could be cloned and run locally.
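One detail that commonly trips people up when fine-tuning BERT for NER is that BERT’s WordPiece tokenizer splits words into subwords, while NER labels are assigned per word. A common convention, sketched below, gives the first subword of each word the word’s label and masks the remaining subwords with -100, the default `ignore_index` of PyTorch’s `nn.CrossEntropyLoss`, so they do not contribute to the loss. This is a generic sketch, not the code from the project described here: `toy_tokenize` is a hypothetical stand-in for a real tokenizer, and a real pipeline would also mask special tokens such as `[CLS]` and `[SEP]`. (Hugging Face’s fast tokenizers expose a `word_ids()` method that provides the same token-to-word mapping.)

```python
# nn.CrossEntropyLoss skips targets equal to its ignore_index,
# which defaults to -100, so these positions add nothing to the loss.
IGNORE_INDEX = -100


def align_labels(words, word_labels, tokenize, label2id):
    """Split words into subwords and align word-level NER labels to them.

    The first subword of each word keeps the word's label; continuation
    subwords get IGNORE_INDEX. `tokenize` maps one word to a list of
    subword strings.
    """
    tokens, label_ids = [], []
    for word, label in zip(words, word_labels):
        pieces = tokenize(word)
        if not pieces:  # skip words the tokenizer drops entirely
            continue
        tokens.extend(pieces)
        label_ids.append(label2id[label])
        label_ids.extend([IGNORE_INDEX] * (len(pieces) - 1))
    return tokens, label_ids


def toy_tokenize(word):
    """Hypothetical stand-in for WordPiece: 4-char chunks, '##' prefix."""
    return [word[:4]] + ["##" + word[i:i + 4] for i in range(4, len(word), 4)]


label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4}
tokens, labels = align_labels(
    ["Anna", "Petrova", "flew", "to", "Moscow"],
    ["B-PER", "I-PER", "O", "O", "B-LOC"],
    toy_tokenize,
    label2id,
)
print(tokens)
print(labels)
```

Another convention labels every subword (continuation pieces getting the `I-` tag); masking is shown here because it keeps one prediction per original word at evaluation time.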

The resulting trained models could then be used to highlight entities, such as people, places, and organizations, in a user interface, where translators would identify whether a separate model (a Russian-to-English translation model) had made mistakes when translating names from Russian to English. I thought this — using the models to create more powerful, human-friendly software — was super cool, and this project really served to develop my intrigue for building ML-powered tools and interfaces.
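Before an interface can highlight whole names, per-token model output has to be grouped into entity spans. A minimal sketch, assuming the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity); the bracket markup is a hypothetical stand-in for whatever highlighting the actual interface used.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (start, end, label) token spans, end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        # A B- tag starts an entity; a stray I- tag is treated the same way.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def highlight(tokens, tags):
    """Wrap each entity in brackets with its label, e.g. [Moscow](LOC)."""
    out = list(tokens)
    # Replace spans from last to first so earlier indices stay valid.
    for start, end, label in reversed(bio_to_spans(tags)):
        out[start:end] = ["[" + " ".join(out[start:end]) + f"]({label})"]
    return " ".join(out)


tokens = ["Anna", "Petrova", "flew", "to", "Moscow"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(highlight(tokens, tags))
```

In a translation-review setting like the one described, a reviewer could compare the highlighted names in the source and the translated text side by side.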

At the time that I did this, there was exactly one blog post about using BERT to do NER, and the code didn’t work out-of-the-box for me, so needless to say: there was a lot to figure out along the way! (Regardless, props to the author, Tobias Sterbak, for a very useful post; without it, it would have taken me a lot longer to get started.)

For anyone who would like more details, I wrote a 2-part series for In-Q-Tel Labs’ blog about the project as I was wrapping it up. Here are Part 1 and Part 2 of the series, and the repo can be found here.

Image courtesy of Charlene Chambliss. Original source: https://gab41.lab41.org/how-to-fine-tune-bert-for-named-entity-recognition-2257b5e5ce7e

A: What advice would you have for other women who are looking to enter the field?

C: If you’re still in school (especially undergrad), you have 3 good options: study computer science and get a broad CS education, study a quantitative subject like applied statistics, economics, or engineering and combine it with CS coursework, or study a qualitative subject while teaching yourself how to apply DS/ML to your field. Get research experience, especially if you want to pursue an M.S. or Ph.D., and if you think you prefer industry, get industry experience (have an internship every summer, maybe even work part-time during the school year). Whichever path you take, be aware that some conceptual understanding of the math behind DS/ML (statistical estimation and probability, linear algebra, and calculus) and good programming skills are still required to be successful in most roles.

If you are a career-changer already out of school, study the career paths of people who are already in the industry. Try to pay the most attention to people whose backgrounds are similar to yours: for example, if you’re coming from a “non-technical” field that doesn’t involve much math or programming, take note of how other people transitioned in from non-technical fields. Figure out what they needed to do in order to prove that they had sufficient technical skills. Reach out to those people, see if you can get 30 minutes of their time for a phone call, and ask them specific, focused questions on what you would need to do to become hirable for the kinds of roles you’re interested in.

Career changers should also strongly consider going through a mentorship program such as SharpestMinds, if you’re transitioning from an unrelated background and need help designing and scoping an impressive, professional-quality data science project and preparing for interviews. If you are coming from a lower-paid field as I was, the income-share agreement is a lifesaver since you don’t have to pay anything until you actually get hired in a data science role.

Also, Vicki Boykis’ article Data Science Is Different Now (written last year) is required reading for any aspiring data scientist. I don’t agree that everyone who takes a Coursera course or even a bootcamp is necessarily qualified for entry-level data science, but it absolutely is the case that competition for these roles is fierce, and you will need to do something to differentiate yourself from the many aspirants. As Vicki suggests, taking an adjacent role in general software engineering or data analysis first can be extremely useful for building skills and getting your foot in the door.

A: How can our readers connect with you and get involved with your projects?

C: You can follow me on Twitter, connect with me on LinkedIn, or just send me an email. I’m pretty heads-down on my work at Primer right now, but if I start any side projects in the future, I’ll be sure to share!

In addition to innovating in the NLP space as a machine learning engineer, Charlene has also been an active member in the SharpestMinds community. Her career path and multi-disciplinary experience focused on social science and technology continues to empower women technologists to innovate in data science. Today, Charlene is solving the toughest problems in NLG and NLU, thus giving people the power to cut through the noise and understand the world at scale. As well as being a role model for data scientists who have non-traditional career paths, Charlene inspires women from diverse backgrounds to democratize the field of NLP.

Special thanks to Charlene Chambliss for allowing me to interview her for this series, and a huge shout out to the TDS Editorial Team for supporting this project.

Do you know an inspiring woman in tech who you would like featured in this series? Are you working on any cool data science and tech projects that you’d like me to write about? Feel free to email me at angelamarieteng@gmail.com for comments and suggestions. Thanks for reading!

[1] Additional information obtained from LinkedIn, available upon request from the author.

Source: https://towardsdatascience.com/charlene-chambliss-from-psychology-to-natural-language-processing-and-applied-research-3845b1c83ac0