Pandas Datasets

Like everyone else, I started learning data science by building my first model with a machine learning technique. My first line of code was:

import pandas as pd

Apart from noticing the cuddly bear name, I didn't pay much attention to this library, though I used it a lot while creating models. Soon I realized that I was underestimating the power of Pandas: it can do more than kung fu. That is what we are going to learn in this series of articles, in which I explore the Pandas library to gain skills that help us analyze data in depth.

In this article, we will look at:

How to read data using Pandas
How data is stored
How we can access data

What is Pandas?

Pandas is a Python library for data analysis and manipulation. In other words, Pandas is all about data. The data we read through Pandas is most commonly in comma-separated values (CSV) format.

How to read data?

We use the read_csv() function to read a CSV file. It is usually the first line of code we come across when we start using the Pandas library. Remember to import pandas before you start coding.

import pandas as pd
titanic_data = pd.read_csv("../Dataset/titanic.csv")

In this article we use the Titanic dataset (linked from the original article). After reading the data with pd.read_csv(), we store it in a variable, titanic_data, which is of type DataFrame.
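
If you don't have the Titanic file at hand, here is a minimal sketch of the same step. It assumes a tiny made-up CSV held in a string (the rows and the variable names csv_text and sample are only for illustration, not the real dataset) and relies on read_csv accepting a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical stand-in for titanic.csv: three rows with a few columns.
csv_text = """PassengerId,Survived,Name
1,0,"Braund, Mr. Owen Harris"
2,1,"Cumings, Mrs. John Bradley"
3,1,"Heikkinen, Miss. Laina"
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))    # the result is a DataFrame
print(sample.shape)    # (number of rows, number of columns)
print(sample.head(2))  # first two rows
```

The quoted names show why the quoting matters: without the double quotes, the comma inside each name would be read as a column separator.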

What is a DataFrame?

A DataFrame is a collection of data in rows and columns. Technically, a DataFrame is made up of individual Series, and a Series is simply a labeled list of data. Let's look at some example code.

# We use pd.Series() to create a Series in Pandas
>> colors = pd.Series(['Blue','Green'])
>> print(colors)
output:
0     Blue
1    Green
dtype: object

>> names_list = ['Ram','Shyam']
>> names = pd.Series(names_list)
>> print(names)
output:
0      Ram
1    Shyam
dtype: object

We pass a list to pd.Series(), which creates a Series with an index. By default, the index starts at 0. However, we can supply our own labels through the index parameter, which accepts any list-like object, including another Series.

>> index = pd.Series(["One","Two"])
>> colors = pd.Series(['Blue','Green'],index = index)
>> print(colors)
output:
One     Blue
Two    Green
dtype: object
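
As a small sketch of what a custom index gives us (the labels "First" and "Second" are my own, chosen only for illustration): a plain Python list works as the index too, values can then be looked up by label, and assigning to .index relabels an existing Series.

```python
import pandas as pd

# A plain Python list works as the index; it does not have to be a Series.
colors = pd.Series(["Blue", "Green"], index=["One", "Two"])

print(colors.index)   # the labels are stored in an Index object
print(colors["Two"])  # look up a value by its label

# Assigning to .index relabels the existing values in place.
colors.index = ["First", "Second"]
print(colors["First"])
```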

Now, coming back to our definition: a DataFrame is a collection of individual Series. Let us use the colors and names Series that we initialized above to create a DataFrame.

>> df = pd.DataFrame({"Colors":colors,"Names":names})
>> print(df)
output:
  Colors  Names
0   Blue    Ram
1  Green  Shyam

We used pd.DataFrame() to create a DataFrame and passed a dictionary to it. The keys of this dictionary are the column names, and each value is the Series holding the data for that column. So, from the example above, you can see that a DataFrame is nothing but a collection of Series. We can also change the index of a DataFrame in the same manner as we did with a Series.

>> index = pd.Series(["One","Two"])
>> colors = pd.Series(['Blue','Green'],index = index)
>> names = pd.Series(['Ram','Shyam'],index = index)

# Creating a Dataframe
>> data = pd.DataFrame({"Colors":colors,"Names":names},index=index)
>> print(data)
output:
    Colors  Names
One   Blue    Ram
Two  Green  Shyam
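
To check the "collection of Series" idea from the other direction, here is a short sketch that pulls a column back out of the DataFrame. It rebuilds the same two Series with a plain list as the index (an assumption made to keep the example self-contained).

```python
import pandas as pd

index = ["One", "Two"]
colors = pd.Series(["Blue", "Green"], index=index)
names = pd.Series(["Ram", "Shyam"], index=index)
data = pd.DataFrame({"Colors": colors, "Names": names})

column = data["Colors"]
print(type(column))     # each column comes back as a Series
print(column.tolist())  # the values we put in
print(data.index.equals(column.index))  # the column carries the row labels
```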

So far we have seen how to read CSV data and how that data is represented. Let's move on to how we can access it.

How to access data from DataFrames?

There are two ways to access data from a DataFrame:

Column-wise
Row-wise

Column-wise

First of all, let us check the columns in our Titanic data.

>> print(titanic_data.columns)
output:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

We can now access data by column name in two ways: by using the column name as an attribute (property) of our DataFrame object, or by using the column name as an index into it. The advantage of the index form is that it works for column names such as "First Name" or "Last Name", which contain a space and therefore cannot be written as a property.

# Using column name as property
>> print(titanic_data.Name)
output:
0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
....
Name: Name, Length: 891, dtype: object

# Using column name as index
>> print(titanic_data['Name'])
output:
0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
....
Name: Name, Length: 891, dtype: object

>> print(titanic_data['Name'][0])
output:
Braund, Mr. Owen Harris
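
Here is a small sketch of why the index form is the safer habit. The frame below is made up for illustration (the people variable and its values are not from the Titanic data). Besides names with spaces, it shows a second pitfall that I am adding here: a column whose name matches an existing DataFrame method, such as count, is hidden by that method when accessed as a property.

```python
import pandas as pd

# Hypothetical frame: one column name has a space, one clashes with a method.
people = pd.DataFrame({"First Name": ["Ram", "Shyam"],
                       "count": [3, 5]})

# Bracket access works for any column name.
print(people["First Name"].tolist())
print(people["count"].tolist())

# Property access finds the DataFrame.count method, not the column.
print(callable(people.count))

# A list of names returns a DataFrame with just those columns.
print(people[["count", "First Name"]])
```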

Row-wise

To access data row-wise we use the loc and iloc indexers. Note that they take square brackets rather than parentheses. Let's look at some examples to understand them.

# Using loc to display a row
>> print(titanic_data.loc[0])
output:
PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                                 22
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object

# Using iloc to display a row
>> print(titanic_data.iloc[0])
output: same as above

>> print(titanic_data.loc[0,'Name'])
output:
Braund, Mr. Owen Harris

>> print(titanic_data.iloc[0,3])
output: same as above

As the code above shows, we access rows using their index values, and to drill down to a specific value in a row we use either the column name or the column position. Remember that column positions, like list indices, start from 0, so the first column, "PassengerId", is at position 0. We also saw the difference between loc and iloc: both select the same data, but loc looks rows and columns up by label, while iloc looks them up by integer position.
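
With the default index, labels and positions happen to be the same numbers, which hides the difference. A minimal sketch with a made-up frame whose row labels are 10, 20 and 30 (my own example, not the Titanic data) makes it visible:

```python
import pandas as pd

# Hypothetical frame indexed by labels that are not 0, 1, 2.
df = pd.DataFrame({"Name": ["Ram", "Shyam", "Sita"],
                   "Age": [30, 25, 28]},
                  index=[10, 20, 30])

print(df.loc[10, "Name"])  # row labelled 10, column "Name"
print(df.iloc[0, 0])       # first row, first column: the same cell

# The integer 0 is a valid position but not a valid label here.
try:
    df.loc[0]
except KeyError:
    print("loc[0] raises KeyError: no row is labelled 0")
```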

We can also access more than one row at a time, with all or only some of the columns. Let's see how.

# To display the whole dataset
>> print(titanic_data.loc[:]) # or titanic_data.iloc[:]
output:
     PassengerId  Survived  Pclass  .....
0              1         0       3
1              2         1       1
2              3         1       3
3              4         1       1
...
[891 rows x 12 columns]

# To display the first four rows with Name and Ticket
>> print(titanic_data.loc[:3,["Name","Ticket"]]) # or titanic_data.iloc[:4,[3,8]]
output:
                                   Name            Ticket
0               Braund, Mr. Owen Harris         A/5 21171
1   Cumings, Mrs. John Bradley (Flor...          PC 17599
2                Heikkinen, Miss. Laina  STON/O2. 3101282
3          Futrelle, Mrs. Jacques Heath....        113803
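
One detail worth practicing with slices: a loc slice includes its end label, while an iloc slice excludes its end position, just like a Python list. A short sketch on a made-up five-row frame (the values are placeholders, not Titanic data):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D", "E"],
                   "Ticket": [101, 102, 103, 104, 105]})

by_label = df.loc[:3, ["Name", "Ticket"]]  # labels 0 through 3, end included
by_position = df.iloc[:3, [0, 1]]          # positions 0 through 2, end excluded

print(len(by_label))     # 4
print(len(by_position))  # 3

# To match the label-based slice, the positional slice must stop at 4.
print(df.iloc[:4, [0, 1]].equals(by_label))
```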

I hope you now have an idea of how to use loc and iloc and understand the difference between them. With this we come to the end of this article. We will continue exploring the Pandas library in the second part; until then, keep practicing. Happy coding!

Translated from: https://medium.com/swlh/pandas-first-step-towards-data-science-91b39beb825c
