Handling Data Sets and Coding in R
I got an Instagram DM the other day that really got me thinking. This person explained that they were a data analyst by trade, and had years of experience. But, they also said that they felt that their technical skills were slightly lacking, as they had never heard of many of the terms mentioned on my page. This person mentioned that they were looking forward to expanding their skill set by learning more technical tools (SQL, Python, R, etc.).
As I thought about how to advise this person further, I realized that this person was in the perfect position to make the transition that they desired. Why? They had already mastered the data skills and data mindset that are crucial to being successful in the field of data.
I (and so many others) worry about mastering every technical tool or product that is out there. I worry about only having experience with Microsoft products (SQL Server, Excel, Power BI), and feel that I need to broaden my horizons to be a better data analyst. I constantly see data scientists questioning and debating online about whether Python or R is better in their line of work.
But, speaking with my new Instagram friend helped me realize that these worries and debates are quite silly. Tools and programming languages are constantly evolving and changing, coming and going. But you know what is here to stay? The core concepts. Every tool or language that is ever built will always fall back on these core concepts.
If you understand how to take a data set, manipulate it, and present it in a way that provides genuine insight (or at least invites more questions that you didn’t have before… because that happens!!), you are on the right path to succeed as some sort of data professional.
This base understanding of data is so powerful. You can take this understanding, and combine it with any technical tool of your choice. Then, you can group and filter data for business reporting and KPI monitoring, conduct statistical tests to answer questions about data, predict future data, or even generate AI models to use data to help guide business action. And you can do all these things with huge data sets containing millions and millions of rows!
OK I know I’m selling you and selling you on this idea, so let me cut to the chase. If you understand data concepts and how to apply them, you can easily implement these concepts with any technical tool or product of your choice.
But don’t worry, I’m not just here to sell you on this and then head out. I’m going to talk about 3 basic data skills that I use daily as a data analyst, from a general perspective. NO TECHNICAL TERMS OR CODE INVOLVED. If you begin to master these (and other) data concepts, it is EASY PEASY LEMON SQUEEZY to take them and apply them with any tool. I even have a serious life hack at the end of the article that will help you further flex your new data knowledge in any tool you’ve been wanting to master. Stick with me, I got you!
#1. Filtering Data
The first data concept that is crucial in the data world is filtering data. Honestly, filtering data is a super simple concept and one that we as human beings do on a daily basis. Take this example. If you are going to get McDonald’s, you should probably ask your 3 roomies if they want some (because you don’t wanna be that roommate). But, before you go ask your roomies if they want chicken nugs, you remember that 2 out of your 3 roomies don’t even like McDonald’s, so you only end up asking one. Basically, you “filtered out” your two roommates from your “data set” based on some “attribute”, which is whether or not they like McDonald’s.
Filtering data as a data analyst or data scientist works the exact same way. If you are conducting an analysis on female customers, you will need to use whatever tool you have at your disposal to filter out the non-female customers. If you are trying to build a model that helps recommend skincare for adults, you would want to filter out any data for non-adult patients.
Long story short, filtering data is just taking away all of the undesired data from whatever data set you have, until you are left with whatever data you need for your analysis.
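For anyone who does want to see the idea in one particular tool, here is a minimal sketch of the roommate example in plain Python. The names, the `likes_mcdonalds` attribute, and the `filter_rows` helper are all made up for illustration; any tool that can keep rows matching a condition would do the same job.

```python
# Hypothetical "data set": three roommates and one attribute each.
roommates = [
    {"name": "Ana", "likes_mcdonalds": False},
    {"name": "Ben", "likes_mcdonalds": True},
    {"name": "Cal", "likes_mcdonalds": False},
]


def filter_rows(rows, keep):
    """Return only the rows for which keep(row) is true."""
    return [row for row in rows if keep(row)]


# Keep only the roommates who like McDonald's; the rest are filtered out.
to_ask = filter_rows(roommates, lambda row: row["likes_mcdonalds"])
print([row["name"] for row in to_ask])
```

The original list is left untouched, so the same data set can be filtered again on a different attribute later.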
#2. Data Type Conversion
Another commonly used data skill is data type conversion. Data types are certain categories that data can fall into when it is stored in a spreadsheet, software, or database. Some common examples of data types are:
- Strings (ex: “Hello, this is a string.”)
- Integers (ex: 400)
- Decimals (ex: 400.17)
- Booleans (ex: TRUE)
When we are working with a data set, we want to make sure that each data attribute is stored as the correct data type.
We would not want to store the integer 123 as a string. If we store 123 as a string, the spreadsheet, software, or database would not be able to perform the necessary operations on it. The computer would get confused. If we tell the computer that we have a string (“123”), but later we want to add that “123” to something, the computer is going to say: “HOLD UP A SECOND. You taught me that ‘123’ was a STRING, which is basically a word. Ya can’t add words, crazy person! You can only add numbers!!!!”
Sorry the hypothetical computer got so aggressive there, but you get the point. In order to ensure that we can perform proper operations on our data down the road, we want to absolutely make sure that it is represented as the right type.
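If you are curious what that complaint looks like in practice, here is a small sketch in Python (just one possible tool, picked for illustration). The variable names are made up. Python refuses to add a number to the string "123" until the string is converted to an integer.

```python
raw_value = "123"  # stored as a string, even though it looks like a number

try:
    total = raw_value + 1  # adding a number to a string is not allowed
except TypeError as error:
    print("The computer objects:", error)

# Convert the string to the right type first, then the addition works.
converted = int(raw_value)
total = converted + 1
print(total)

# The same idea applies to decimals stored as text.
price = float("400.17")
print(price + 1)
```

Other tools express the conversion differently, but the concept is the same: fix the type first, then do the math.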
#3. Aggregating Data
The final concept that I want to touch on *for now* is aggregating data. Aggregating data is so so so SO powerful. Aggregating data can take you from a big giant text file of rows and columns of data, and turn it into a summary value or a summary table that is much more meaningful and pleasing to the eye.
Notice how I kept saying the word summary up there? It’s probably the best way to explain an aggregation, because aggregations take multiple rows of data and summarize them into a smaller number of rows.

If you have a data set that contains numbers that would make sense to add together (such as quantities or sales), one of the simplest ways to aggregate that data is to sum it up. In the example below, I took a data set that contained the number of coffees I drank each day. I applied an aggregation to it by summing it, which created a summary view of my data on the right. This summary shows that I drank a total of 4 coffees (in this data set at least).

There are many other aggregate operations that are pretty intuitive, even for those who are new to the data world. Each of these operations answers some question that informs us more about our data set. Some examples of other simple aggregate operations are:
- Count (how many records are there?)
- Maximum (what’s the biggest observation?)
- Minimum (what’s the smallest observation?)
- Average (what do I tend to observe?)
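As a rough sketch, here is how the sum and the other aggregations listed above could look in plain Python. The daily coffee counts are hypothetical, chosen only so that the total matches the 4 coffees mentioned earlier; the real per-day values are not given here.

```python
from statistics import mean

# Hypothetical daily coffee counts (assumed values that add up to 4).
coffees_per_day = {"Mon": 1, "Tue": 2, "Wed": 0, "Thu": 1}
values = list(coffees_per_day.values())

# Each aggregation turns many rows into a single summary value.
summary = {
    "sum": sum(values),       # how many coffees in total?
    "count": len(values),     # how many records are there?
    "max": max(values),       # what's the biggest observation?
    "min": min(values),       # what's the smallest observation?
    "average": mean(values),  # what do I tend to observe?
}

for name, value in summary.items():
    print(name, value)
```

Four rows of daily data collapse into one small summary, which is the whole point of aggregating.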
OK coooOooOol.. so what’s next?
I know I promised you a life hack earlier, so don’t worry — I didn’t forget. Now that you have got a firmer grasp on some of the most crucial steps in a data professional’s workflow, you can take them and apply them with any technical tool of your choice, even if you are a newbie. How? With our best friend, our ultimate savior, GOOGLE!
Whenever I want to practice any of my skills with some tool, and I need a refresher on how to execute it properly, I will Google in this format:
[insert data skill] in [insert technical tool]
I swear to you, any time I Google in this format, I always end up finding great documentation, blog posts, or other resources (such as Stack Overflow) that direct my thoughts toward the solution.
So, did you find aggregating data interesting? And are you wanting to better your SQL skills? Then I would recommend reviewing and working on:
aggregating data in SQL
Are you basically a pro at filtering data in Python, but now you would like to try it out in R? Try my life hack and Google:
filtering data in R
Take it from the girl who overwhelmed herself for months before pursuing her data career dreams. Learn the concepts first. Worry about the tech to get it done later. Technology is always evolving, but the foundations aren’t.
Originally published at https://datadreamer.io on August 7, 2020.
Translated from: https://towardsdatascience.com/learn-these-3-basic-data-concepts-before-stressing-about-coding-languages-or-tools-e599896e6d4