From 2018 to 2019, I worked at Facebook as a data scientist. During that time I helped develop and teach a class for R beginners. It was a two-day course taught about once a month to a group of roughly 15–20 students, and the goal was for them to leave the class able to use R in their day-to-day work.
This article shares some of the things that I learned from teaching these classes, with an emphasis on what worked well for the students. Hopefully these six tips are useful for anyone who uses R, especially those just beginning their journey.
But first, my personal experiences learning R
I initially learned R as a statistics undergrad at Berkeley. In college I despised using R, and used it as a means to an end for completing projects and problem sets so that I could graduate.
Once I entered the workforce and started learning R from my coworkers, my perspective on the language started to shift. I realized that there were some key gaps in how R was taught in college, mainly that we were learning R for a classroom setting, which does not translate well to a workplace setting.
Since graduating college, I have grown to embrace R fully: I’ve developed R packages at Facebook and Doordash, taught R at Facebook, and attended several R conferences. With my background out of the way, I want to share some tips and advice for those on their own journey to using R in their day-to-day work.
Note: I graduated college in 2015 and the curriculum has likely improved since then, so my personal experiences may not be as relevant for more recent college grads.
1. R is not just for data scientists, and having a reason for using the language will make learning easier
Before teaching R, I assumed that a large majority of our students would be data scientists looking to increase their impact by bringing R into their SQL/Excel workflow. However, I was really surprised by the diversity of people who attended these classes. We had a good mix of software engineers, data scientists, data engineers, researchers, and finance/operations people, just to name a few.

For data scientists, the main reason for taking the class was clear: they’re constantly working with data, and learning R gives them a more effective and flexible way of working with it. Learning R also comes more naturally to them, because they have a lot of opportunity to practice the language while making a direct impact on their work.
The other students signed up for the class for a variety of reasons, for example:
- Engineers who wanted to improve their ability to modify and visualize data.
- Operations and finance people looking for an alternative to repetitive daily/weekly Excel updates.
- People who were already familiar with R but wanted to freshen up their knowledge and learn how to use it effectively at Facebook.
In the three examples above, we see ways that non-data scientists can gain value from learning R. Tangible use cases like these help you stay focused, because learning R takes a fair amount of persistence. Broadly, you want to be in one of these two categories if you’re not a data scientist/analyst:
- You’re already doing something and it can be improved/made faster by learning R
- You want to do something but it will be very difficult/impossible without knowing R (or some other programming language)
One last point on this topic: sometimes R is not the best tool for the job. For example, if you already know how to use SQL + Excel, you already have a deadly duo of tools to aggregate, analyze, and visualize data. Having used R myself for around 7 years, I still often find myself resorting to SQL + Excel simply because it’s faster and more shareable. So if you spend a lot of time learning R, don’t feel like you need to use it for everything, because sometimes it will actually take twice as long as it would with tools you’re already an expert in.
2. Tidyverse is king

What is Tidyverse? The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
The two most popular and useful packages in Tidyverse are:
- dplyr, for modifying and aggregating data
- ggplot2, for creating visualizations

To keep this section short and to the point: Tidyverse is the quickest and most straightforward way to aggregate and modify data in R. Not only that, but it makes learning R a lot more fun and easy. I first learned R without Tidyverse and it was a miserable experience, and others who learned R in a similar way share my sentiments. Tidyverse has become so widespread among R users that I would not recommend learning or teaching R without it.
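To make the "aggregate and modify" claim concrete, here is a minimal sketch of a dplyr pipeline. It uses the built-in mtcars dataset rather than anything from the class, and the column names n_cars and avg_mpg are made up for illustration.

```r
library(dplyr)

# Count cars and average fuel economy per cylinder count,
# then sort so the most efficient group comes first.
avg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n_cars = n(), avg_mpg = mean(mpg)) %>%
  arrange(desc(avg_mpg))

print(avg_by_cyl)
```

Each step reads top to bottom as a verb (group, summarise, arrange), which is a large part of why this style tends to be easier for beginners than nested base R calls.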
If you’ve never used Tidyverse, it’s super simple to set up and I would highly encourage you to start using it (there are many resources online to learn from).
# This is all you need to install tidyverse:
install.packages('tidyverse')
library(tidyverse)
Note: I reference some packages later in this article. If you ever need to install a new package, you can use install.packages() as shown above. Once installed, load it into R using library().
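If you would rather not reinstall packages you already have, one possible approach is a small helper that installs only what is missing. This is a sketch, not something from the class: missing_packages() and install_if_missing() are hypothetical helper names, built only on base R's installed.packages() and install.packages().

```r
# Return the packages from `pkgs` that are not yet installed.
# `installed` defaults to the names of everything in your library.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Install only what is missing and return (invisibly) what was installed.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage (not run here; needs a CRAN mirror and network access):
# install_if_missing(c("tidyverse", "clipr", "knitr", "googlesheets4"))
```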
3. Cheatsheets, cheatsheets, cheatsheets
This goes well with the previous topic because learning Tidyverse can be daunting at first, with its unique syntax and long list of functions. Luckily, the RStudio team has created a bunch of cheatsheets. For our in-person classes, we made sure to print cheat sheets for all of the students so that they wouldn’t have to keep switching tabs to search for functions. If you are able to, I would highly recommend printing and laminating your own cheat sheets for personal use. I still reference my cheat sheets even after using the language for over 5 years.
This website contains a list of cheatsheets published by the RStudio team. Some of the topics here are more advanced, but I would say the two essential cheat sheets to get started are the ones below:


4. Learn by using internal datasets
Within the first hour of class, we have our students query data from the internal database into R. At Facebook, this would be as simple as using our internal package and writing:
df <- presto("SELECT * from example_table limit 10000")
There are two main reasons I recommend learning with internal datasets:
Being able to query internal data directly into R amplifies your ability to use company data. If you are not able to query internal data directly into R, you have to use some sort of workaround, such as exporting data into a csv file and then reading that into R. This wastes a lot of time, so I would try to get familiar with bringing data directly into R as early as possible, even if it means an extra hour or two of initial setup or getting the right permissions.
A company’s data is one of its most valuable resources. If you work at Facebook, then you should be taking advantage of the fact that you have some of the richest and most interesting datasets in the world. The same applies to any other company: Uber with its ride data, Airbnb with its bookings data, Medium with data on articles. A lot of online resources will have you use a generic dataset, so I would try to take the extra step and bring in key company datasets when possible to aid your learning. By doing this, you’re already in the mindset of easing R into your workflow.
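The presto() function above is internal to Facebook, so it will not exist on your machine. As a rough sketch of the same "send SQL, get a data frame back" pattern, the example below uses the open-source DBI package with an in-memory SQLite database standing in for a company warehouse. The table contents (mtcars) and the choice of SQLite are assumptions made purely so the example can run anywhere; your data team will know which driver and connection details apply at your company.

```r
library(DBI)

# In-memory SQLite database as a stand-in for a company warehouse (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "example_table", mtcars)

# Same idea as presto("SELECT ..."): pass a SQL string, receive a data frame.
df <- dbGetQuery(
  con,
  "SELECT cyl, COUNT(*) AS n_cars FROM example_table GROUP BY cyl ORDER BY cyl"
)

# Always close the connection when you are done.
dbDisconnect(con)

print(df)
```

Against a real database, only the dbConnect() call should need to change; the query and everything after it stay the same.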
5. The importance of importing and exporting data
R is a great tool for analyzing data, but if you can’t get data into or out of R, that’s a really big problem. The previous section touched on this a little, so this section is meant to be more practical and goes over some of the main methods to get different types of data into and out of R.
By focusing on these methods, you should be able to import/export almost 100% of what is necessary. And of course, there is also a cheat sheet that you may find helpful for this:

For importing data:
- Csv: read_csv() (Tidyverse)
- Excel: read_excel() (Tidyverse)
- Google Sheets: Similar to the above, but may require extra steps for private sheets. You want to use the package googlesheets4. Worst case scenario, you export the Google Sheet as a csv and read it in using read_csv()
- Internal database: Use SQL to bring data directly into R. You’ll need to consult with your data team to see if there is an internal package to do this. At Facebook, presto("SELECT * FROM tbl") is all you need to grab data from a table. At smaller companies, there may be some extra steps to connect R to an internal database, but at the very least setting up an ODBC connection should allow you to grab data.
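As a small illustration of the csv route, the sketch below writes a tiny made-up file to a temporary location and reads it back with read_csv(). The team/week/orders data is invented for the example. The commented lines show the analogous calls for Excel and Google Sheets; the file name and sheet ID there are placeholders, not real resources.

```r
library(readr)

# Stand-in for a file exported from a spreadsheet or dashboard (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("team,week,orders", "ops,1,120", "ops,2,135", "finance,1,80"),
  csv_path
)

# read_csv() guesses column types; show_col_types = FALSE only silences the message.
orders <- read_csv(csv_path, show_col_types = FALSE)
print(orders)

# The other sources follow the same pattern (not run here):
# readxl::read_excel("report.xlsx", sheet = 1)    # readxl installs with tidyverse but is loaded separately
# googlesheets4::read_sheet("<sheet URL or ID>")  # asks for Google authorization on private sheets
```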
For exporting data:
- Copy to clipboard: write_clip() from the clipr package copies a data frame directly into your clipboard. If your company uses Google Sheets, this is the quickest way to get data into there, so this is one of the most useful functions that you can learn. Essentially, it cuts the steps down from "Export df to csv -> Open csv and copy contents -> Paste into Sheets" to "Copy df to clipboard -> Paste into Sheets".
- Copy a plot/graph: When you make a graph in R, the easiest way to share it is to copy/paste it. Simply zoom in on a plot to bring it into its own window, then right click and copy the image.
- Screenshot directly from R: If you want to share a small table more informally (e.g. in Slack), taking a screenshot of your R console is probably the best bet. If you want to get fancy, you can use the kable() function from the knitr package to clean up your table so that it’s a little easier to read.
# Format the iris table to be a little neater
library(knitr)
iris %>% head %>% kable
- Write to csv: write_csv()
- Write to internal database: This is usually a lot more complicated than reading from an internal database, but I would definitely talk to your data team if you think you’ll do this often.
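The export options above can be sketched together in a few lines. This is an illustrative example using the built-in iris data, not code from the class. The clipboard step is guarded because clipr needs a reachable system clipboard, which is usually missing on servers and in non-interactive sessions.

```r
library(readr)
library(knitr)

small_table <- head(iris, 3)

# 1. Write to csv (a temp file here; use a real path in practice).
out_path <- tempfile(fileext = ".csv")
write_csv(small_table, out_path)

# 2. Copy to the clipboard, only if clipr is installed and a clipboard is reachable.
if (requireNamespace("clipr", quietly = TRUE) && clipr::clipr_available()) {
  clipr::write_clip(small_table)
}

# 3. Build a plain-text table that reads well in a chat message or screenshot.
table_text <- kable(small_table, format = "pipe")
print(table_text)
```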
6. Keep it simple and focus on the fundamentals
There are so many things you can do with R that it can be a little overwhelming at first. For example, the cheat sheet link alone already covers a long list of topics and packages, and even that is just scratching the surface. Don’t be intimidated by this.
We found that focusing on the fundamentals is the best way to learn R:
- How to import data
- Modifying the data with dplyr to do analysis
- Creating visualizations with ggplot2
- Exporting results to share with your teammates
If you are able to do these well, then you will have a strong foundation for doing a lot with R.
如果您能夠做到這些很好,那么您將為使用R做很多事打下堅實的基礎。
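To show how the four fundamentals fit together, here is one possible end-to-end sketch. It uses the built-in iris data written to a temporary csv as a stand-in for real input, and the names flowers, petal_summary, and avg_petal_length are chosen only for this example.

```r
library(readr)
library(dplyr)
library(ggplot2)

# 1. Import: write iris to a temp csv, then read it back as if it were real input.
in_path <- tempfile(fileext = ".csv")
write_csv(iris, in_path)
flowers <- read_csv(in_path, show_col_types = FALSE)

# 2. Modify: average petal length per species.
petal_summary <- flowers %>%
  group_by(Species) %>%
  summarise(n = n(), avg_petal_length = mean(Petal.Length))

# 3. Visualize: a simple bar chart of the summary.
p <- ggplot(petal_summary, aes(x = Species, y = avg_petal_length)) +
  geom_col() +
  labs(y = "Average petal length (cm)")

# 4. Export: save the summary table to share with teammates.
out_path <- tempfile(fileext = ".csv")
write_csv(petal_summary, out_path)
# ggsave("petal_length.png", p, width = 6, height = 4)  # uncomment to save the chart
```

Swapping the first step for a database query and the last step for write_clip() gives the workflow described in the earlier tips.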
Closing thoughts
I wanted to write this article because I enjoyed teaching R classes at Facebook, and thought that my experiences as an instructor could be helpful for others who do not have access to these types of classes or who are looking for advice on ways to use R more effectively in their own work.
Source: https://towardsdatascience.com/6-things-i-learned-from-teaching-r-at-facebook-806fc2832ec0