Pandas Datasets
Like everyone else, I started learning data science by building my first model with a machine learning technique. My first line of code was:
import pandas as pd
Apart from noticing the cuddly bear name, I didn't pay much attention to this library, even though I used it a lot while building models. Soon I realized that I was underestimating the power of Pandas: it can do more than kung fu. That is what this series of articles is about. I am going to explore the Pandas library to build skills that help us analyze data in depth.
In this article, we will cover:
- How to read data using Pandas
- How the data is stored
- How we can access the data
What is Pandas?
Pandas is a Python library for data analysis and manipulation. In other words, Pandas revolves around data. The data we read with Pandas is most commonly in comma-separated values (CSV) format.
How to read data?
We use the read_csv() function to read a CSV file. It is usually the first line of code we come across when we start using the Pandas library. Remember to import pandas before you start coding.
import pandas as pd
titanic_data = pd.read_csv("../Dataset/titanic.csv")
In this article we are going to use the Titanic dataset, which you can access from here. After reading the data with pd.read_csv(), we store it in a variable titanic_data, which is of type DataFrame.
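If you do not have the Titanic file at hand, the following minimal sketch shows the same read step on a tiny in-memory sample. The three rows and the variable names csv_text and sample are made up for illustration; only pd.read_csv() and io.StringIO are real library calls.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for titanic.csv
csv_text = """PassengerId,Survived,Name
1,0,"Braund, Mr. Owen Harris"
2,1,"Cumings, Mrs. John Bradley"
3,1,"Heikkinen, Miss. Laina"
"""

# read_csv accepts a file path or any file-like object;
# quoted fields may contain commas
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)   # DataFrame
print(sample.shape)            # (3, 3)
```

The first line of the CSV becomes the column names, and every following line becomes one row.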
What is a DataFrame?
A DataFrame is a collection of data in rows and columns. Technically, DataFrames are made up of individual Series. A Series is simply a one-dimensional list of data with an index. Let's understand this with some example code.
# We use pd.Series() to create a series in Pandas
>> colors = pd.Series(['Blue','Green'])
>> print(colors)
output:
0     Blue
1    Green
dtype: object
>> names_list = ['Ram','Shyam']
>> names = pd.Series(names_list)
>> print(names)
output:
0      Ram
1    Shyam
dtype: object
We pass a list to pd.Series(), which creates a Series with an index. By default, the index starts at 0. However, the index is itself just a sequence of labels, so we can supply our own.
>> index = pd.Series(["One","Two"])
>> colors = pd.Series(['Blue','Green'],index = index)
>> print(colors)
output:
One     Blue
Two    Green
dtype: object
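As a small sketch of what the index is for, the snippet below looks values up by their labels and relabels an existing Series after it has been created. The data is the same toy data as above; assigning to the index attribute is one way to relabel, not the only one.

```python
import pandas as pd

colors = pd.Series(['Blue', 'Green'])

# With the default index, the labels are 0 and 1
first = colors[0]

# Relabel the existing Series by assigning a new index
colors.index = ['One', 'Two']
second = colors['Two']

print(first)                # Blue
print(second)               # Green
print(list(colors.index))   # ['One', 'Two']
```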
Now, coming back to our definition, a DataFrame is a collection of individual Series. Let us use colors and names Series with the default index, as we first created them above, to build a DataFrame.
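# Re-create colors with the default index so it lines up with names
>> colors = pd.Series(['Blue','Green'])
>> df = pd.DataFrame({"Colors":colors,"Names":names})
>> print(df)
output:
  Colors  Names
0   Blue    Ram
1  Green  Shyam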
We used pd.DataFrame() to create a DataFrame and passed a dictionary to it. The keys of this dictionary are the column names, and each value is the Series holding that column's data. From this example you can see that a DataFrame is nothing but a collection of Series. We can also change the index of a DataFrame in the same way as we did with a Series.
>> index = pd.Series(["One","Two"])
>> colors = pd.Series(['Blue','Green'],index = index)
>> names = pd.Series(['Ram','Shyam'],index = index)
# Creating a Dataframe
>> data = pd.DataFrame({"Colors":colors,"Names":names},index=index)
>> print(data)
output:
    Colors  Names
One   Blue    Ram
Two  Green  Shyam
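One detail worth knowing when building a DataFrame from Series: Pandas lines the values up by index label, not by position. The sketch below uses two Series whose labels only partly overlap (the labels are made up for illustration) to show that unmatched labels are filled with NaN.

```python
import pandas as pd

colors = pd.Series(['Blue', 'Green'], index=['One', 'Two'])
names = pd.Series(['Ram', 'Shyam'], index=['Two', 'Three'])

# Rows are the union of both indexes; values are matched by label
mixed = pd.DataFrame({"Colors": colors, "Names": names})

print(mixed.shape)                      # (3, 2)
print(mixed.loc['Two'].tolist())        # ['Green', 'Ram']
print(int(mixed.isna().sum().sum()))    # 2 missing cells
```

This is why the Series in the examples above share the same index before they are combined.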
So far we have seen how to read CSV data and how this data is represented. Let's move on to how we can access it.
How to access data from DataFrames?
There are two ways to access data in a DataFrame:
- Column-wise
- Row-wise
Column-wise
First of all, let us check the columns in our Titanic data.
>> print(titanic_data.columns)
output:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
We can now access a column by its name in two ways: by using the column name as an attribute of the DataFrame object, or by using the column name as a key in square brackets. The advantage of the bracket form is that it also works for column names that are not valid Python identifiers, such as "First Name" or "Last Name", which cannot be written as attributes.
# Using column name as property
>> print(titanic_data.Name)
output:
0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
....
Name: Name, Length: 891, dtype: object
# Using column name as index
>> print(titanic_data['Name'])
output:
0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
....
Name: Name, Length: 891, dtype: object
>> print(titanic_data['Name'][0])
output:
Braund, Mr. Owen Harris
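To make the trade-off between the two access styles concrete, here is a small sketch on a made-up DataFrame (the people frame and its column names are illustrative, not part of the Titanic data). It includes a column with a space in its name and a column called "count", which collides with an existing DataFrame method.

```python
import pandas as pd

people = pd.DataFrame({
    "First Name": ["Ram", "Shyam"],
    "Age": [30, 25],
    "count": [1, 2],
})

# Attribute access works for simple names
ages = people.Age.tolist()

# Bracket access works for any name, including ones with spaces
first_names = people["First Name"].tolist()

# "count" is also a DataFrame method, so the attribute is the method,
# while the bracket form still returns the column
attr_is_method = callable(people.count)
count_values = people["count"].tolist()

# A list of names selects several columns and returns a DataFrame
subset = people[["First Name", "Age"]]

print(ages, first_names, attr_is_method, count_values, subset.shape)
```

For this reason the bracket form is usually the safer default, and the attribute form is a convenience for quick exploration.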
Row-wise
To access data row-wise we use the loc and iloc indexers. Although they are often written as loc() and iloc(), they are used with square brackets rather than called like methods. Let's look at some examples.
# Using loc to display a row
>> print(titanic_data.loc[0])
output:
PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                                 22
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object
# Using iloc to display a row
>> print(titanic_data.iloc[0])
output: same as above
>> print(titanic_data.loc[0,'Name'])
output:
Braund, Mr. Owen Harris
>> print(titanic_data.iloc[0,3])
output: same as above
As the code above shows, we access rows using their index values, and to drill down to a specific value in a row we add either the column name or the column position. Column positions, like row positions, start from 0, so the first column, "PassengerId", is at position 0 and "Name" is at position 3. This also shows the difference between loc and iloc: loc selects by label (index values and column names), while iloc selects by integer position. Here both return the same result only because the default row labels happen to match the row positions.
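The difference between the two indexers becomes visible as soon as the row labels are not 0, 1, 2, and so on. The following sketch uses a made-up scores frame with labels 10, 20 and 30 to illustrate it.

```python
import pandas as pd

scores = pd.DataFrame(
    {"Name": ["Ram", "Shyam", "Sita"], "Score": [70, 85, 90]},
    index=[10, 20, 30],
)

by_label = scores.loc[10, "Name"]     # row labelled 10, column "Name"
by_position = scores.iloc[0, 0]       # first row, first column

# There is no row labelled 0, so loc raises KeyError
try:
    scores.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_label, by_position, label_zero_exists)   # Ram Ram False
```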
We can also access more than one row at a time, with all or only some of the columns. Let's see how.
# To display whole dataset
>> print(titanic_data.loc[:]) # or titanic_data.iloc[:]
output:
     PassengerId  Survived  Pclass .....
0              1         0       3
1              2         1       1
2              3         1       3
3              4         1       1
...
[891 rows x 12 columns]
# To display first four rows with Name and Ticket
# Note: a loc slice includes its end label, an iloc slice excludes its end position
>> print(titanic_data.loc[:3,["Name","Ticket"]]) # or titanic_data.iloc[:4,[3,8]]
output:
                                 Name            Ticket
0             Braund, Mr. Owen Harris         A/5 21171
1  Cumings, Mrs. John Bradley (Flor...          PC 17599
2              Heikkinen, Miss. Laina  STON/O2. 3101282
3     Futrelle, Mrs. Jacques Heath....            113803
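One thing to watch when slicing rows: a loc slice includes its end label, whereas an iloc slice follows normal Python slicing and excludes the end position. The sketch below checks this on a small made-up frame (five rows, default index).

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("ABCDE"),
    "Ticket": [11, 12, 13, 14, 15],
})

by_label = df.loc[:3, ["Name", "Ticket"]]   # labels 0, 1, 2, 3
by_position = df.iloc[:3, [0, 1]]           # positions 0, 1, 2

print(len(by_label), len(by_position))      # 4 3

# To get the same four rows by position, the slice must end at 4
same_rows = df.iloc[:4, [0, 1]].equals(by_label)
print(same_rows)                            # True
```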
I hope you now have an idea of how to use loc and iloc and understand the difference between the two. With this we come to the end of this article. We will continue exploring the Pandas library in the second part, but until then keep practicing. Happy coding!
Translated from: https://medium.com/swlh/pandas-first-step-towards-data-science-91b39beb825c