Learn These 3 Basic Data Concepts Before Stressing About Coding Languages or Tools

I got an Instagram DM the other day that really got me thinking. This person explained that they were a data analyst by trade and had years of experience. But they also said that they felt their technical skills were slightly lacking, as they had never heard of many of the terms mentioned on my page. This person mentioned that they were looking forward to expanding their skill set by learning more technical tools (SQL, Python, R, etc.).

As I thought about how to advise this person further, I realized that they were in the perfect position to make the transition they desired. Why? They had already mastered the data skills and data mindset that are crucial to being successful in the field of data.

I (and so many others) worry about mastering every technical tool or product that is out there. I worry about only having experience with Microsoft products (SQL Server, Excel, Power BI), and feel that I need to broaden my horizons to be a better data analyst. I constantly see data scientists questioning and debating online about whether Python or R is better in their line of work.

But, speaking with my new Instagram friend helped me realize that these worries and debates are quite silly. Tools and programming languages are constantly evolving and changing, coming and going. But you know what is here to stay? The core concepts. Every tool or language that is ever built will always fall back on these core concepts.

If you understand how to take a data set, manipulate it, and present it in a way that provides genuine insight (or at least invites more questions that you didn’t have before… because that happens!!), you are on the right path to succeed as some sort of data professional.

This base understanding of data is so powerful. You can take this understanding and combine it with any technical tool of your choice. Then you can group and filter data for business reporting and KPI monitoring, conduct statistical tests to answer questions about data, predict future data, or even build AI models that help guide business action. And you can do all of this with huge data sets containing millions and millions of rows!

OK, I know I keep selling you on this idea, so let me cut to the chase. If you understand data concepts and how to apply them, you can easily implement them with any technical tool or product of your choice.

But don’t worry, I’m not just here to sell you on this and then head out. I’m going to talk about 3 basic data skills that I use daily as a data analyst, from a general perspective. NO TECHNICAL TERMS OR CODE INVOLVED. If you begin to master these (and other) data concepts, it is EASY PEASY LEMON SQUEEZY to take them and apply them with any tool. I even have a serious life hack at the end of the article that will help you further flex your new data knowledge in any tool you’ve been wanting to master. Stick with me, I got you!

#1. Filtering Data

The first data concept that is crucial in the data world is filtering data. Honestly, filtering data is a super simple concept and one that we as human beings do on a daily basis. Take this example. If you are going to get McDonald’s, you should probably ask your 3 roomies if they want some (because you don’t wanna be that roommate). But, before you go ask your roomies if they want chicken nugs, you remember that 2 out of your 3 roomies don’t even like McDonald’s, so you only end up asking one. Basically, you “filtered out” your two roommates from your “data set” based on some “attribute”, which is whether or not they like McDonald’s.
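The article deliberately stays tool-agnostic, but for readers curious how the roommate example looks in one tool, here is a minimal Python sketch. The roommate names and the `likes_mcdonalds` attribute are hypothetical; the point is only that filtering keeps the rows whose attribute matches a condition.

```python
# Hypothetical "data set": each roommate is a row with one attribute
roommates = [
    {"name": "Roommate A", "likes_mcdonalds": False},
    {"name": "Roommate B", "likes_mcdonalds": True},
    {"name": "Roommate C", "likes_mcdonalds": False},
]

# Filter: keep only the rows whose likes_mcdonalds attribute is True
to_ask = [r for r in roommates if r["likes_mcdonalds"]]

print([r["name"] for r in to_ask])  # ['Roommate B']
```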

Filtering data as a data analyst or data scientist works the exact same way. If you are conducting an analysis on female customers, you will need to use whatever tool you have at your disposal to filter out the non-female customers. If you are trying to build a model that helps recommend skincare for adults, you would want to filter out any data for non-adult patients.
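As a sketch of the two business cases above, the snippet below filters a small, made-up customer table, once by gender and once by age. The field names, the records, and the helper `filter_rows` are hypothetical, and treating "adult" as age 18 or older is an assumption for illustration only.

```python
# Hypothetical customer records; field names are illustrative only
customers = [
    {"id": 1, "gender": "female", "age": 34},
    {"id": 2, "gender": "male", "age": 41},
    {"id": 3, "gender": "female", "age": 16},
    {"id": 4, "gender": "nonbinary", "age": 29},
]

def filter_rows(rows, keep):
    """Return only the rows for which keep(row) is True (hypothetical helper)."""
    return [row for row in rows if keep(row)]

# Analysis on female customers: drop everyone else
female_customers = filter_rows(customers, lambda r: r["gender"] == "female")

# Adults only; the 18+ cutoff is an assumption
adults = filter_rows(customers, lambda r: r["age"] >= 18)

print([r["id"] for r in female_customers])  # [1, 3]
print([r["id"] for r in adults])            # [1, 2, 4]
```

The same "keep rows matching a condition" idea is what a SQL `WHERE` clause or a spreadsheet filter does.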

Long story short, filtering data is just taking away all of the undesired data from whatever data set you have, until you are left with whatever data you need for your analysis.

#2. Data Type Conversion

Another commonly used data skill is data type conversion. Data types are certain categories that data can fall into when it is stored in a spreadsheet, software, or database. Some common examples of data types are:

  • Strings (ex: “Hello, this is a string.”)
  • Integers (ex: 400)
  • Decimals (ex: 400.17)
  • Booleans (ex: TRUE)

When we are working with a data set, we want to make sure that each data attribute is stored as the correct data type.
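To make the four example types concrete, here is a small Python sketch that stores each example value and asks Python which type it is. One caveat: Python's built-in type for a decimal like 400.17 is `float` (binary floating point); exact decimals live in the standard `decimal` module, which this sketch does not use.

```python
# The four example values from the list, as Python literals
examples = {
    "string": "Hello, this is a string.",
    "integer": 400,
    "decimal": 400.17,  # stored as a float in Python
    "boolean": True,
}

# Print each example's Python type name (str, int, float, bool)
for kind, value in examples.items():
    print(kind, "->", type(value).__name__)
```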

We would not want to store the integer 123 as a string. If we store 123 as a string, the spreadsheet, software, or database may not be able to perform the operations we need on it. The computer would get confused. If we tell the computer that we have a string ("123"), but later we want to add a number to that "123", the computer is going to say: "HOLD UP A SECOND. You told me that '123' was a STRING, which is basically a word. Ya can't add words, crazy person! You can only add numbers!!!!"
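Here is roughly what the "confused computer" looks like in Python, as one example tool: adding a number to the string "123" raises a `TypeError`, while converting it to an integer first makes the arithmetic work.

```python
value_as_string = "123"  # stored as text, not as a number

try:
    result = value_as_string + 1  # Python refuses to add a str and an int
except TypeError as err:
    print("TypeError:", err)

# Convert to the right type first, then the addition works
value_as_int = int(value_as_string)
print(value_as_int + 1)  # 124
```

Not every tool complains this loudly. In JavaScript, for example, `"123" + 1` quietly produces the string `"1231"`, which is arguably worse because nothing warns you.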

Sorry the hypothetical computer got so aggressive there, but you get the point. In order to ensure that we can perform proper operations on our data down the road, we want to absolutely make sure that it is represented as the right type.
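A common place this bites in practice is loading a CSV file: Python's `csv` module hands back every field as a string, so each column has to be converted before you can do math on it. The sketch below uses a made-up, in-memory CSV; the column names and the helper `to_bool` are hypothetical.

```python
import csv
import io

# Hypothetical CSV export; csv.DictReader returns every field as a str
raw = io.StringIO(
    "order_id,quantity,price,shipped\n"
    "1,3,19.99,TRUE\n"
    "2,1,5.50,FALSE\n"
)

def to_bool(text):
    """Convert 'TRUE'/'FALSE' (any case) to a Python bool (hypothetical helper)."""
    normalized = text.strip().upper()
    if normalized not in ("TRUE", "FALSE"):
        raise ValueError(f"not a boolean: {text!r}")
    return normalized == "TRUE"

# Convert each column to the type it should have
rows = []
for record in csv.DictReader(raw):
    rows.append({
        "order_id": int(record["order_id"]),
        "quantity": int(record["quantity"]),
        "price": float(record["price"]),
        "shipped": to_bool(record["shipped"]),
    })

# Now numeric operations behave as expected
total_quantity = sum(r["quantity"] for r in rows)
print(total_quantity)  # 4
```

The explicit `to_bool` helper exists because `bool("FALSE")` is `True` in Python (any non-empty string is truthy), a good example of why conversions deserve care.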

#3. Aggregating Data

The final concept that I want to touch on *for now* is aggregating data. Aggregating data is so so so SO powerful. It can take a giant text file full of rows and columns and turn it into a summary value or summary table that is much more meaningful and much easier on the eye.

Notice how I kept saying the word summary up there? It’s probably the best way to explain an aggregation, because aggregations take multiple rows of data and summarize them into a smaller number of rows.

[Image: aggregation illustration, courtesy of SQLiteTutorial.Net]
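To see "multiple rows summarized into fewer rows" in code, here is a minimal Python sketch that groups hypothetical sales rows by region and sums them. The regions and amounts are made up; five input rows collapse into three summary rows, one per region.

```python
from collections import defaultdict

# Hypothetical sales rows: one row per sale (region, amount)
sales = [
    ("East", 120),
    ("West", 80),
    ("East", 30),
    ("West", 20),
    ("North", 50),
]

# Aggregate: collapse many rows into one summary row per region
totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount

# Print the summary table, sorted by region name
for region, total in sorted(totals.items()):
    print(region, total)
```

This is the same idea as `GROUP BY` in SQL or a pivot table in a spreadsheet.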

If you have a data set that contains numbers that would make sense to be added (such as quantities or sales), one of the simplest ways to aggregate that data is to sum it up. In the example below, I took a data set that contained the amount of coffees I drank each day. I applied an aggregation to it by summing it, which created a summary view of my data on the right. This summary shows that I drank a total of 4 coffees (in this data set at least).

[Image: daily coffee counts (left) summed into a total of 4 coffees (right)]
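The article does not list the individual daily values, so the sketch below uses hypothetical counts chosen to total 4, just to show that a sum aggregation turns several rows into a single summary value.

```python
# Hypothetical daily coffee counts (values chosen so the total is 4)
coffees_per_day = {
    "Monday": 1,
    "Tuesday": 2,
    "Wednesday": 0,
    "Thursday": 1,
}

# Sum aggregation: many rows in, one summary value out
total_coffees = sum(coffees_per_day.values())
print(total_coffees)  # 4
```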

There are many other aggregate operations that are pretty intuitive, even for those that are new to the data world. Each of these operations answers some question that informs us more about our data set. Some examples of other simple aggregate operations are:

  • Count (how many records are there?)
  • Maximum (what’s the biggest observation?)
  • Minimum (what’s the smallest observation?)
  • Average (what do I tend to observe?)
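Each of the four operations in the list maps onto a one-liner in Python. The sketch below applies them to a hypothetical list of daily coffee counts, using the built-ins `len`, `max`, and `min` plus `mean` from the standard `statistics` module.

```python
from statistics import mean

# Hypothetical daily coffee counts
daily_coffees = [1, 2, 0, 1]

summary = {
    "count": len(daily_coffees),     # how many records are there?
    "maximum": max(daily_coffees),   # what's the biggest observation?
    "minimum": min(daily_coffees),   # what's the smallest observation?
    "average": mean(daily_coffees),  # what do I tend to observe?
}
print(summary)
```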

OK, coooOooOol... so what's next?

I know I promised you a life hack earlier, so don’t worry — I didn’t forget. Now that you have got a firmer grasp on some of the most crucial steps in a data professional’s workflow, you can take them and apply them with any technical tool of your choice, even if you are a newbie. How? With our best friend, our ultimate savior, GOOGLE!

Whenever I want to practice any of my skills with some tool, and I need a refresher on how to execute it properly, I will Google in this format:

[insert data skill] in [insert technical tool]

I swear to you, any time I Google in this format, I always end up finding great documentation, blog posts, or other resources (such as Stack Overflow) that direct my thoughts toward the solution.

So, did you find aggregating data interesting? And are you wanting to better your SQL skills? Then I would recommend reviewing and working on:

aggregating data in SQL
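As a taste of what that search turns up, here is a minimal sketch that runs SQL aggregate functions through Python's built-in `sqlite3` module, so no database server is needed. The `coffee_log` table and its rows are hypothetical; `COUNT`, `SUM`, `MAX`, `MIN`, and `AVG` mirror the count, maximum, minimum, and average operations described earlier.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coffee_log (day TEXT, coffees INTEGER)")
conn.executemany(
    "INSERT INTO coffee_log (day, coffees) VALUES (?, ?)",
    [("Mon", 1), ("Tue", 2), ("Wed", 0), ("Thu", 1)],
)

# Standard SQL aggregate functions, returned as one summary row
row = conn.execute(
    "SELECT COUNT(*), SUM(coffees), MAX(coffees), MIN(coffees), AVG(coffees) "
    "FROM coffee_log"
).fetchone()
print(row)  # (4, 4, 2, 0, 1.0)

conn.close()
```

SQLite's `AVG` always returns a floating-point value, which is why the average prints as `1.0` while the other results are integers.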

Are you basically a pro at filtering data in Python, but now you would like to try it out in R? Try my life hack and Google:

filtering data in R

Take it from the girl who overwhelmed herself for months before pursuing her data career dreams. Learn the concepts first. Worry about the tech to get it done later. Technology is always evolving, but the foundations aren’t.

Originally published at https://datadreamer.io on August 7, 2020.

Source: https://towardsdatascience.com/learn-these-3-basic-data-concepts-before-stressing-about-coding-languages-or-tools-e599896e6d4
