Pandas: First Step Towards Data Science (Part 2)


If you haven't read the first article, it is a good idea to go through it before continuing with this one. You can find that article here. So far we have learned how to access data in different ways. Now we will learn how to analyze data to understand it better, and then how to manipulate it.


To give an overview, in this article we are going to learn:


How to summarize data?
How to manipulate data?

Summarizing Data

So far we have used different methods to view data, which is helpful when we want to summarize specific rows or columns. However, pandas provides simpler methods for getting a quick look at a dataset.


If we want to look at a few rows to understand what kind of data the dataset contains, pandas provides methods like head() and tail(). head() returns rows from the top (the first five by default), and tail(), as you might have guessed, returns rows from the bottom of the dataset. You can also pass a number to choose how many rows to display, as in head(n) or tail(n).


```
>> print(titanic_data.head())
output :
   PassengerId  Survived  Pclass  .......
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3
[5 rows x 12 columns]

>> print(titanic_data.tail())
output :
     PassengerId  Survived  Pclass                              Name  .........
886          887         0       2             Montvila, Rev. Juozas
887          888         1       1      Graham, Miss. Margaret Edith
888          889         0       3  Johnston, Miss. Catherine Hele..
889          890         1       1             Behr, Mr. Karl Howell
890          891         0       3               Dooley, Mr. Patrick
[5 rows x 12 columns]

>> print(titanic_data.tail(3))
output :
     PassengerId  Survived  Pclass                              Name  .........
888          889         0       3  Johnston, Miss. Catherine Hele..
889          890         1       1             Behr, Mr. Karl Howell
890          891         0       3               Dooley, Mr. Patrick
[3 rows x 12 columns]
```
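
Because titanic_data is loaded in the previous article, here is a self-contained sketch that uses a small made-up DataFrame instead (passengers and its values are hypothetical, not the Titanic data) to check the default and explicit row counts of head() and tail().

```python
import pandas as pd

# Hypothetical toy data standing in for titanic_data: 8 passengers.
passengers = pd.DataFrame({
    "PassengerId": range(1, 9),
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
})

first_rows = passengers.head()   # default n=5: rows labelled 0..4
last_rows = passengers.tail(3)   # explicit n=3: rows labelled 5..7

print(len(first_rows))                  # 5
print(list(last_rows["PassengerId"]))   # [6, 7, 8]
```

Note that tail() keeps the original row labels (5, 6, 7 here), just as the Titanic output above keeps 886 to 890.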

We can also display summary statistics for our dataset. The describe() method returns statistics for every numeric column, and we can also get statistics for a single column.


```
>> print(titanic_data.describe())
output :
       PassengerId    Survived      Pclass         Age       SibSp  ...
count   891.000000  891.000000  891.000000  714.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008
std     257.353842    0.486592    0.836071   14.526497    1.102743
min       1.000000    0.000000    1.000000    0.420000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000
50%     446.000000    0.000000    3.000000   28.000000    0.000000
75%     668.500000    1.000000    3.000000   38.000000    1.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000

>> print(titanic_data.Fare.describe())
output :
count    891.000000
mean      32.204208
std       49.693429
min        0.000000
25%        7.910400
50%       14.454200
75%       31.000000
max      512.329200
Name: Fare, dtype: float64
```

Remember, by default describe() only returns statistics for numerical columns. It displays statistics such as count (the number of non-missing data points in the column, which is why Age shows 714 rather than 891), the mean, the standard deviation and so on. If you do not want the whole summary, you can also compute each of these statistics individually.


```
>> print(titanic_data.Fare.mean())
output :
32.204208
```
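
The sketch below illustrates these points on a small made-up DataFrame (df and its Fare and Name values are hypothetical, not the Titanic data): describe() skips the text column by default, count leaves out the missing value, and individual Series methods reproduce the corresponding describe() rows. Summarizing a text column on its own gives count, unique, top and freq instead.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset: a numeric column with one missing value
# and a text column (not the real Titanic data).
df = pd.DataFrame({
    "Fare": [7.25, 71.2833, 7.925, np.nan, 53.1],
    "Name": ["Braund", "Cumings", "Heikkinen", "Allen", "Futrelle"],
})

summary = df.describe()              # numeric columns only by default
print(list(summary.columns))         # ['Fare']
print(summary.loc["count", "Fare"])  # 4.0: the NaN is not counted

# The same statistics, computed one at a time on the column.
fare = df["Fare"]
print(fare.count())         # 4
print(fare.mean())          # mean of the 4 non-missing fares
print(fare.median())        # matches the 50% row of describe()
print(fare.quantile(0.75))  # matches the 75% row of describe()
print(fare.std())           # sample standard deviation, as in describe()

# A text column gets a different kind of summary.
print(list(df["Name"].describe().index))  # ['count', 'unique', 'top', 'freq']
```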

Manipulating Data

map(): used to transform the values of a Series. We call map() on a single column of the dataset. map() takes a function as its argument, and that function receives one data point from the column. map() calls the function on every data point in the column and returns a new Series containing the results.
apply(): used to transform data in a DataFrame. It works much like map(), but the function passed to it receives a whole Series (a row or a column) at a time. When that function returns an updated Series, apply() combines the results from every row or column into a new DataFrame.

```
# titanic_data is the Titanic DataFrame loaded in the previous article.

# Here we define a function that will be passed to map()
def updateUsingMap(data_point):
    '''
    Make the Survived column more readable by
    changing 1 to "Yes" and 0 to "No".

    Parameters
    ----------
    data_point : int

    Returns
    -------
    str
    '''
    if data_point == 0:
        return "No"
    else:
        return "Yes"

>> print(titanic_data.Survived.map(updateUsingMap))
output :
0       No
1      Yes
2      Yes
3      Yes
4       No
      ...
Name: Survived, Length: 891, dtype: object

# Here we define a function that will be passed to apply()
def updateUsingApply(row):
    '''
    Make the Survived column more readable by
    changing 1 to "Yes" and 0 to "No".

    Parameters
    ----------
    row : Series

    Returns
    -------
    Series
    '''
    # Work on a copy: pandas does not support mutating the object passed to apply()
    row = row.copy()
    if row["Survived"] == 0:
        row["Survived"] = "No"
    else:
        row["Survived"] = "Yes"
    return row

>> print(titanic_data.apply(updateUsingApply, axis='columns'))
output :
     PassengerId Survived  Pclass  .......
0              1       No       3
1              2      Yes       1
2              3      Yes       3
3              4      Yes       1
4              5       No       3
..           ...      ...     ...
[891 rows x 12 columns]
```
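
As an aside, Series.map() does not require a function: it also accepts a dict that maps old values to new ones. The sketch below uses a small hypothetical Survived Series (not the real titanic_data) and a hypothetical helper, to_label, that behaves like updateUsingMap, to show that both forms produce the same labels and leave the original Series unchanged.

```python
import pandas as pd

# Hypothetical stand-in for titanic_data.Survived (five values only).
survived = pd.Series([0, 1, 1, 1, 0], name="Survived")

def to_label(data_point):
    """Hypothetical helper: "No" for 0, "Yes" otherwise (like updateUsingMap)."""
    return "No" if data_point == 0 else "Yes"

# map() with a function: the function is called once per value.
by_function = survived.map(to_label)

# map() with a dict: each value is looked up; values missing from the dict become NaN.
by_dict = survived.map({0: "No", 1: "Yes"})

print(by_function.tolist())                      # ['No', 'Yes', 'Yes', 'Yes', 'No']
print(by_dict.tolist() == by_function.tolist())  # True
print(survived.tolist())                         # [0, 1, 1, 1, 0]: original unchanged
```

The dict form is convenient for a fixed relabeling like this one; the function form is the one to reach for when the new value has to be computed.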

One thing needs to be clear here: these methods do not change the original data. They create a new Series or DataFrame, so to keep the result you have to assign it, for example back to a column. You may also have noticed that we passed another parameter to apply(): axis. With axis='columns' (equivalently axis=1), apply() passes each row to the function, which is what updateUsingApply expects because it reads the Survived value of a row. With axis='index' (axis=0, the default), apply() would instead pass each column to the function.
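
To make the axis behavior concrete, here is a minimal sketch on a made-up numeric DataFrame (df and the row_total column are hypothetical). It applies the same summing function with both axis values, then assigns one result back, since apply() itself leaves df untouched.

```python
import pandas as pd

# Hypothetical numeric frame: 2 rows, 3 columns.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20], "c": [100, 200]})

# axis="index" (the default): the function receives each column in turn.
per_column = df.apply(lambda column: column.sum(), axis="index")
# axis="columns": the function receives each row in turn.
per_row = df.apply(lambda row: row.sum(), axis="columns")

print(per_column.tolist())  # [3, 30, 300]
print(per_row.tolist())     # [111, 222]

# apply() returned new objects; df still has only its 3 original columns.
print(list(df.columns))     # ['a', 'b', 'c']

# To keep a result, assign it explicitly.
df["row_total"] = per_row
print(df["row_total"].tolist())  # [111, 222]
```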


I think that is enough for this article. Let this information sink in; in the next article we will explore a few more methods in pandas. Until then, keep practicing. Happy Coding! 😄


Python In Plain English

Did you know that we have three publications and a YouTube channel? Find links to everything at plainenglish.io!


Original article: https://medium.com/python-in-plain-english/pandas-first-step-towards-data-science-part-2-fd35266deab4

