Explicitly specifying data types

Checking column types with the dtypes attribute

import pandas as pd
# All three columns hold strings, even where the strings look like numbers
df = pd.DataFrame({'A': ['1', '2', '4'],
                   'B': ['9', '-80', '5.3'],
                   'C': ['x', '5.9', '0']})
print("df.dtypes:\n", df.dtypes)
print("df:\n", df)

Output:

df.dtypes:
A    object
B    object
C    object
dtype: object
df:
   A    B    C
0  1    9    x
1  2  -80  5.9
2  4  5.3    0
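The object dtype means each cell holds a Python string, not a number. A minimal sketch (rebuilding the same df) of why this matters in practice: the + operator concatenates the strings element-wise instead of adding numbers.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '4'],
                   'B': ['9', '-80', '5.3'],
                   'C': ['x', '5.9', '0']})

# Each cell of the column is a plain Python str.
print(type(df.loc[0, 'A']))   # <class 'str'>

# '+' on string columns concatenates element-wise rather than adding.
doubled = df['A'] + df['A']
print(doubled.tolist())       # ['11', '22', '44']
```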

Specifying the data type when creating a Pandas object

# dtype='int' is applied to every column; the numeric strings are parsed as integers
data = pd.DataFrame({'A': ['1', '2', '4'], 'B': ['9', '80', '5']}, dtype='int')
print("data:\n", data)
print("data.dtypes:\n", data.dtypes)

Output:

data:
   A   B
0  1   9
1  2  80
2  4   5
data.dtypes:
A    int32
B    int32
dtype: object
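The int32 above reflects the author's platform: 'int' maps to NumPy's default integer, which was 32-bit on Windows before NumPy 2.0 and is 64-bit on Linux and macOS. A minimal sketch, assuming pandas 2.x, of two practical consequences: naming an explicit width such as 'int64' gives the same dtype everywhere, and because the constructor's dtype is applied to every column, a single non-numeric string makes the whole construction fail.

```python
import pandas as pd

# An explicit width gives the same dtype on every platform.
data = pd.DataFrame({'A': ['1', '2', '4'], 'B': ['9', '80', '5']},
                    dtype='int64')
print(data.dtypes)

# The constructor's dtype applies to all columns, so one
# non-numeric string ('x') is enough to make construction fail.
failed = False
try:
    pd.DataFrame({'A': ['1', '2'], 'C': ['x', '0']}, dtype='int64')
except (ValueError, TypeError) as exc:
    failed = True
    print("construction failed:", exc)
```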

Converting data types

Casting with the astype() method

astype(dtype, copy=True, errors='raise', **kwargs)

Its main parameters are:

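dtype: the target data type

copy: whether to return a copy; defaults to True

errors: how failures are handled, either 'raise' or 'ignore' (default 'raise'). 'raise' lets the exception propagate; 'ignore' suppresses it and returns the original object unchanged.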
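Besides a single type, dtype also accepts a dict mapping column labels to types, which makes it possible to convert only the columns that can actually be converted. A minimal sketch on the same df, leaving C (which contains 'x') untouched:

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '4'],
                   'B': ['9', '-80', '5.3'],
                   'C': ['x', '5.9', '0']})

# Convert A and B only; columns missing from the dict keep their dtype.
converted = df.astype({'A': 'int64', 'B': 'float64'})
print(converted.dtypes)
```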

Using astype() to convert column B of df to float (int would fail here, because '5.3' is not a valid integer literal):

print("df['B']:\n", df['B'])
print("df['B'].astype:\n", df['B'].astype(dtype='float'))

Output:

df['B']:
0      9
1    -80
2    5.3
Name: B, dtype: object
df['B'].astype:
0     9.0
1   -80.0
2     5.3
Name: B, dtype: float64
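Casting B straight to an integer type fails because '5.3' is not an integer literal. If whole numbers are really wanted, one option (a sketch, assuming truncation toward zero is acceptable) is to go through float first:

```python
import pandas as pd

b = pd.Series(['9', '-80', '5.3'], name='B')

# A direct cast fails: '5.3' is not a valid integer literal.
direct_cast_failed = False
try:
    b.astype('int64')
except ValueError as exc:
    direct_cast_failed = True
    print("direct cast failed:", exc)

# Parse as float first, then cast; the fractional part is truncated toward zero.
as_int = b.astype('float64').astype('int64')
print(as_int.tolist())   # [9, -80, 5]
```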

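Not every column can be converted this way: column C contains a non-numeric string ('x') that cannot be parsed as a number, so casting it raises ValueError. Passing errors='ignore' suppresses the exception, but the result is just the original, unconverted object: no conversion takes place, the call simply does not raise.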

print("df['C']:\n", df['C'])
print("df['C'].astype(errors='ignore'):\n", df['C'].astype(dtype='float', errors='ignore'))

Output:

df['C']:
0      x
1    5.9
2      0
Name: C, dtype: object
df['C'].astype(errors='ignore'):
0      x
1    5.9
2      0
Name: C, dtype: object
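The claim that errors='ignore' leaves the data untouched can be checked directly. A minimal sketch: the values come back as the same strings, and the result still does not have a numeric dtype.

```python
import pandas as pd

c = pd.Series(['x', '5.9', '0'], name='C')

# errors='ignore': the failed cast is swallowed and the values come back unchanged.
ignored = c.astype('float64', errors='ignore')
print(ignored.tolist())                          # ['x', '5.9', '0']
print(pd.api.types.is_numeric_dtype(ignored))    # False
```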

Converting with the to_numeric() function

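to_numeric() cannot be applied to a DataFrame directly; it works on one-dimensional data, so a DataFrame has to be converted one column at a time.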
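A common way to convert every column at once is DataFrame.apply, which calls the function on each column in turn and forwards extra keyword arguments to it. A minimal sketch on the same df:

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '4'],
                   'B': ['9', '-80', '5.3'],
                   'C': ['x', '5.9', '0']})

# apply() calls pd.to_numeric once per column (axis=0 is the default);
# errors='coerce' is passed through to pd.to_numeric.
numeric_df = df.apply(pd.to_numeric, errors='coerce')
print(numeric_df.dtypes)
```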

pandas.to_numeric(arg, errors='raise', downcast=None)

Its commonly used parameters are:

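arg: the data to convert: a scalar, list, tuple, 1-D array or Series

errors: how failures are handled. Besides 'raise' and 'ignore' it also accepts 'coerce'; the default is 'raise'. 'raise' lets the exception propagate, 'ignore' suppresses it and returns the input unchanged, and 'coerce' turns unparseable values into NaN. (As of pandas 2.2, errors='ignore' is deprecated for to_numeric.)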
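The signature also lists downcast, which the list above does not cover. It asks pandas to store the result in the smallest dtype of the requested kind ('integer', 'signed', 'unsigned' or 'float') that can hold the values. A minimal sketch on column A's values:

```python
import pandas as pd

a = pd.Series(['1', '2', '4'], name='A')

default = pd.to_numeric(a)                          # int64
small = pd.to_numeric(a, downcast='integer')        # smallest signed int: int8
unsigned = pd.to_numeric(a, downcast='unsigned')    # smallest unsigned int: uint8
print(default.dtype, small.dtype, unsigned.dtype)
```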

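The advantage of to_numeric() over astype() is that it removes the latter's limitation: with astype(), a single non-numeric character anywhere in the data makes the conversion fail. to_numeric() gets around this through errors='coerce': values that cannot be parsed are replaced with a missing value (NaN), and the rest are converted.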
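The difference is easiest to see side by side on column C. A minimal sketch: astype() fails outright, while to_numeric(errors='coerce') converts what it can and marks the rest as NaN.

```python
import pandas as pd

c = pd.Series(['x', '5.9', '0'], name='C')

# astype: one unparseable value ('x') makes the whole cast fail.
astype_failed = False
try:
    c.astype('float64')
except ValueError as exc:
    astype_failed = True
    print("astype failed:", exc)

# to_numeric with errors='coerce': 'x' becomes NaN, the rest are converted.
coerced = pd.to_numeric(c, errors='coerce')
print(coerced.tolist())   # [nan, 5.9, 0.0]
```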

# df['A'], df['B'] and df['C'] are already Series, so they can be passed directly
print("df['A']:\n", df['A'])
print("to_numeric(df['A']):\n", pd.to_numeric(df['A']))
print("df['B']:\n", df['B'])
print("to_numeric(df['B']):\n", pd.to_numeric(df['B']))
print("df['C']:\n", df['C'])
# errors='ignore' is deprecated as of pandas 2.2 and emits a FutureWarning there
print("to_numeric(df['C'], errors='ignore'):\n", pd.to_numeric(df['C'], errors='ignore'))
print("to_numeric(df['C'], errors='coerce'):\n", pd.to_numeric(df['C'], errors='coerce'))

Output:

df['A']:
0    1
1    2
2    4
Name: A, dtype: object
to_numeric(df['A']):
0    1
1    2
2    4
Name: A, dtype: int64
df['B']:
0      9
1    -80
2    5.3
Name: B, dtype: object
to_numeric(df['B']):
0     9.0
1   -80.0
2     5.3
Name: B, dtype: float64
df['C']:
0      x
1    5.9
2      0
Name: C, dtype: object
to_numeric(df['C'], errors='ignore'):
0      x
1    5.9
2      0
Name: C, dtype: object
to_numeric(df['C'], errors='coerce'):
0    NaN
1    5.9
2    0.0
Name: C, dtype: float64
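After coercing, it is often useful to find out which inputs were rejected and then decide what to put in their place. A minimal sketch; filling with 0 is an arbitrary choice for illustration, not something the conversion requires.

```python
import pandas as pd

c = pd.Series(['x', '5.9', '0'], name='C')
coerced = pd.to_numeric(c, errors='coerce')

# Entries that were present in the input but became NaN could not be parsed.
bad = c[coerced.isna() & c.notna()]
print(bad.tolist())              # ['x']

# Replace the NaN placeholders with a default value (0 here, chosen arbitrarily).
filled = coerced.fillna(0)
print(filled.tolist())           # [0.0, 5.9, 0.0]
```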
