Data Analysis: pandas Notes

Pandas

A library for representing tabular data.

Lesson 4: Jupyter (21 min 22 s)
Lesson 5: pandas (24 min 31 s)
Lesson 6: Series (38 min 19 s)
Lesson 7: DataFrame (25 min 50 s)

# Load the pandas library
import pandas as pd
import numpy as np

s = pd.Series([2,4,6,8,10])

0     2
1     4
2     6
3     8
4    10
dtype: int64

d = pd.DataFrame([[2,4,6,8,10],[7,3,4,7,15]])
d

	0	1	2	3	4
0	2	4	6	8	10
1	7	3	4	7	15

d[0]

0    2
1    7
Name: 0, dtype: int64

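Note: square brackets applied directly to a DataFrame select a column, not a row. That matches the usual need: to get an attribute such as age from a table, you normally pull out the whole age column. To reach a single value, you index a second time with another pair of brackets (e.g. d[0][1]).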
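A small sketch of the difference, using the same table d. The one-step accessors d.loc[row, col] (by label) and d.iloc[row, col] (by position) are standard pandas alternatives to chaining two pairs of brackets; the variable names are illustrative only.

```python
import pandas as pd

# Same 2x5 table as above: row labels 0-1, column labels 0-4
d = pd.DataFrame([[2, 4, 6, 8, 10], [7, 3, 4, 7, 15]])

col0 = d[0]          # [] with one label selects a COLUMN -> Series [2, 7]
value = d[0][1]      # second [] indexes into that column: row label 1 -> 7

# Reach one cell in a single step instead of chaining brackets
by_label = d.loc[1, 0]   # row label 1, column label 0
by_pos = d.iloc[1, 0]    # row position 1, column position 0
print(col0.tolist(), value, by_label, by_pos)   # [2, 7] 7 7 7
```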

How do we get a row?

d.loc[0]

0     2
1     4
2     6
3     8
4    10
Name: 0, dtype: int64

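This returns a Series.
A DataFrame is essentially built from Series (each column is one),
so we can also construct one from a list of Series, each of which becomes a row: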

d2 = pd.DataFrame([pd.Series([2,4,6,8,10]), pd.Series([7,3,4,7,15])])
d2

	0	1	2	3	4
0	2	4	6	8	10
1	7	3	4	7	15
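A minimal sketch of the two directions: a list of Series gives one row per Series (as above), while a dict of Series gives one column per Series. The column names "first" and "second" are hypothetical.

```python
import pandas as pd

# A list of Series: each Series becomes one ROW; its index becomes the column labels
rows = [pd.Series([2, 4, 6, 8, 10]), pd.Series([7, 3, 4, 7, 15])]
d2 = pd.DataFrame(rows)

# A dict of Series: each Series becomes one COLUMN, keyed by the (hypothetical) names below
by_cols = pd.DataFrame({"first": pd.Series([2, 7]), "second": pd.Series([4, 3])})

print(d2.shape)                 # (2, 5)
print(type(d2[0]).__name__)     # Series: every column of a DataFrame is a Series
print(by_cols.shape)            # (2, 2)
```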

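class1 = pd.Series({'hong': 50, 'huang': 90, 'qing': 60})
# Build the Series from a dict, but with an explicit index
class1_values = {'hong': 50, 'huang': 90, 'qing': 60}
class1_index = ['hong', 'lv', 'lan']
# The labels come from the index argument: each label's value is looked up in the dict,
# labels missing from the dict get NaN, and dict keys not listed in index are dropped
class1 = pd.Series(class1_values, index=class1_index)
class1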

hong    50.0
lv       NaN
lan      NaN
dtype: float64
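To see that the dict values are matched by label, here is a sketch comparing the one-step constructor with building from the dict and then calling reindex; both should give the same Series, with NaN (and therefore a float64 dtype) for the labels the dict lacks.

```python
import pandas as pd

class1_values = {'hong': 50, 'huang': 90, 'qing': 60}
class1_index = ['hong', 'lv', 'lan']

# Values are looked up by key for each label in index
aligned = pd.Series(class1_values, index=class1_index)

# Equivalent two-step version: build from the dict, then reindex to the wanted labels
reindexed = pd.Series(class1_values).reindex(class1_index)

print(aligned.equals(reindexed))    # True
print(aligned.isna().tolist())      # [False, True, True]
print('huang' in aligned.index)     # False: keys not listed in index are dropped
```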

class1
# Values: returned as a NumPy ndarray
class1.values
# Index: returns an Index object (pandas' own index type), backed by an ndarray
class1.index
class1.index[2]
class1.index.values

array(['hong', 'lv', 'lan'], dtype=object)
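A quick check of the types described above. The sketch also uses to_numpy(), which the pandas documentation recommends over .values in newer code.

```python
import numpy as np
import pandas as pd

class1 = pd.Series({'hong': 50, 'huang': 90, 'qing': 60}, index=['hong', 'lv', 'lan'])

vals = class1.values            # plain NumPy ndarray holding the data
idx = class1.index              # pandas Index object holding the labels
print(type(vals).__name__)      # ndarray
print(type(idx).__name__)       # Index
print(idx[2])                   # lan: an Index supports positional access like an array
print(class1.to_numpy())        # same data as .values, via the recommended method
```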

class1_index
class1.hong  # attribute-style access by label (works when the label is a valid Python identifier)

50.0

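class1[[1,2,0]]  # integers treated as positions because the index holds strings; pandas 2.1+ deprecates this, prefer class1.iloc[[1,2,0]]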

lv       NaN
lan      NaN
hong    50.0
dtype: float64

class1[0:1]

hong    50.0
dtype: float64
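The explicit accessors remove the ambiguity between positions and labels. A sketch: .iloc always uses positions (slice end excluded) and .loc always uses labels (slice end included).

```python
import pandas as pd

class1 = pd.Series({'hong': 50, 'huang': 90, 'qing': 60}, index=['hong', 'lv', 'lan'])

by_pos = class1.iloc[[1, 2, 0]]               # positions 1, 2, 0
by_label = class1.loc[['lv', 'lan', 'hong']]  # the same elements, by label
first = class1.iloc[0:1]                      # positional slice: end excluded
upto_lv = class1.loc['hong':'lv']             # label slice: end INCLUDED

print(by_pos.equals(by_label))   # True
print(first.index.tolist())      # ['hong']
print(upto_lv.index.tolist())    # ['hong', 'lv']
```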

# Comparisons work directly on the whole Series
class1 > 6
# Comparing NaN with > always gives False

hong     True
lv      False
lan     False
dtype: bool

# It can also be written like this (boolean indexing)
# This reads much like a WHERE filter in a database query
class1[class1>6]

hong    50.0
dtype: float64
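A sketch extending the boolean filter: several conditions are combined with & (and) or | (or), each wrapped in parentheses, and missing values are selected with isna() since NaN never compares equal to anything. The data reuses the values from this page.

```python
import pandas as pd

s = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])

mask = (s > 6) & (s < 100)   # parentheses are required around each condition
print(s[mask].tolist())      # [7, 9, 10, 13]

class1 = pd.Series({'hong': 50}, index=['hong', 'lv', 'lan'])
print(class1[class1.isna()].index.tolist())   # ['lv', 'lan']: select NaN with isna(), not ==
```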

# Add 1 to every element at once
class1+1

hong    51.0
lv       NaN
lan      NaN
dtype: float64

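Adding 1 to the whole Series at once is very efficient.
With a plain list we would have to loop over it:
take one element, add 1, put the result into a new list, and repeat.
Here the operation is applied to all values in one step and the results land in a new Series together.
It behaves as if the elements were processed in parallel; strictly, the loop still exists, but it runs in compiled NumPy code over a contiguous array instead of in the Python interpreter.
We can verify this with the %%timeit magic command, which measures run time.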
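Before timing anything, a quick sketch confirming that the list loop described above and the vectorized + 1 produce the same numbers; only the execution model differs.

```python
import pandas as pd

values = [1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221]

# Plain Python: visit each element, add 1, append to a new list
looped = []
for v in values:
    looped.append(v + 1)

# pandas: one expression; the loop runs inside NumPy's compiled code
vectorized = pd.Series(values) + 1

print(looped == vectorized.tolist())   # True
```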

%%timeit
# Build a Series from a list (creation time is included in this measurement)
class2_values = [1024,3,5,7,9,10,13,115,127,149,221]
class2 = pd.Series(class2_values)
class2+1

198 μs ± 9.37 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
class2+1

100 μs ± 3.56 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for i in range(100000):i+=1

4.12 ms ± 108 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
a = pd.Series(range(100000))
a+1

562 μs ± 72 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

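At first I guessed the data was too small to show the library's advantage, and the numbers bear that out: for the 11-element Series the fixed pandas overhead dominates (about 100 μs for a single +1), while with 100,000 elements building the Series and adding 1 takes about 0.56 ms against about 4.1 ms for the plain Python loop.
Sometimes this kind of computation is done on a GPU, since doing it on the CPU is expensive. A GPU is better at huge numbers of simple, independent arithmetic operations: it is like a crowd of elementary-school students, who together beat a single mathematician (the CPU) at basic addition, subtraction, multiplication, and division.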
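The %%timeit cells above mix sizes and include Series creation in some runs. A minimal sketch that times only the +1 step at the two sizes used above, using the standard-library timeit module instead of the Jupyter magic. The helper name compare and the repeat/number settings are arbitrary choices, and the printed timings will vary by machine.

```python
import timeit

import pandas as pd


def compare(n, repeat=5, number=20):
    """Return best-of-repeat seconds per call for a list loop and for Series + 1 on n elements."""
    data = list(range(n))
    s = pd.Series(data)  # built once, outside the timed code
    loop_t = min(timeit.repeat(lambda: [x + 1 for x in data], repeat=repeat, number=number)) / number
    vec_t = min(timeit.repeat(lambda: s + 1, repeat=repeat, number=number)) / number
    return loop_t, vec_t


for n in (11, 100_000):
    loop_t, vec_t = compare(n)
    print(f"n={n:>7}: list loop {loop_t * 1e6:9.1f} us, Series+1 {vec_t * 1e6:9.1f} us")
```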

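# Besides + and -, a Series also supports *, /, % (modulo), and // (floor division)
print(class2 // 2)

0     512
1       1
2       2
3       3
4       4
5       5
6       6
7      57
8      63
9      74
10    110
dtype: int64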
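A sketch running each operator on the same values, showing only the first three results; note that true division (/) produces floats while floor division (//) keeps integers.

```python
import pandas as pd

s = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])

print((s - 1).tolist()[:3])    # [1023, 2, 4]
print((s * 2).tolist()[:3])    # [2048, 6, 10]
print((s / 2).tolist()[:3])    # [512.0, 1.5, 2.5]  true division gives floats
print((s % 2).tolist()[:3])    # [0, 1, 1]          remainder
print((s // 2).tolist()[:3])   # [512, 1, 2]        floor division stays integer
print((s ** 2).tolist()[:3])   # [1048576, 9, 25]
```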

class2 = pd.Series([1024,3,5,7,9,10,13,115,127,149,221])
# Mean
print(class2.mean())
print(np.mean(class2))
class2

153.0
153.0
0     1024
1        3
2        5
3        7
4        9
5       10
6       13
7      115
8      127
9      149
10     221
dtype: int64

class3 = pd.Series([1024,13,5,7,9,10,1,115,127,149,221])
# Median
# via the NumPy function
print(np.median(class3))
# via the Series method
print(class3.median())
# With an even number of values, the median is the mean of the two middle values

13.0
13.0
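A sketch that checks the mean by hand and contrasts it with the median on class2: the single value 1024 pulls the mean up to 153.0, while the median stays at 13.0. The last line illustrates the even-count rule from the comment above.

```python
import pandas as pd

class2 = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])

print(class2.sum() / len(class2))        # 153.0: same as class2.mean()
print(class2.median())                   # 13.0: the middle of the 11 sorted values
print(pd.Series([1, 2, 3, 4]).median())  # 2.5: even count -> mean of 2 and 3
```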

# Variance (sample variance: pandas divides by n - 1 by default)
class2.var()

89190.6

# Standard deviation (also the sample version, ddof=1)
class2.std()

298.6479532827908
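One pitfall worth knowing: pandas' var() and std() default to ddof=1 (divide by n - 1), while NumPy's np.var and np.std default to ddof=0 (divide by n). A sketch on a plain NumPy array, so that the NumPy default applies:

```python
import numpy as np
import pandas as pd

class2 = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])
arr = class2.to_numpy()

print(class2.var())          # 89190.6, sample variance (ddof=1)
print(np.var(arr))           # about 81082.36, population variance (ddof=0)
print(np.var(arr, ddof=1))   # matches pandas once ddof=1 is passed
print(class2.std(), np.sqrt(class2.var()))   # std is the square root of var
```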

print(class2)
print("-"*50)
print(class2+1)
print("-"*50)
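# `in` tests membership in the index (the labels), like checking a dict's keys;
# it does NOT look at the values
print(10 in class2)  # True because 10 is an index label (0..10)
print("-"*50)
print(5 in class2 + 1)  # parsed as 5 in (class2 + 1); adding 1 leaves the index unchanged, so still True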

0     1024
1        3
2        5
3        7
4        9
5       10
6       13
7      115
8      127
9      149
10     221
dtype: int64
--------------------------------------------------
0     1025
1        4
2        6
3        8
4       10
5       11
6       14
7      116
8      128
9      150
10     222
dtype: int64
--------------------------------------------------
True
--------------------------------------------------
True

# To test the values instead, check .values
print(4 in class2)
print(4 in class2.values)

True
False
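A sketch summarizing the membership rules: `in` looks at labels, `in s.values` looks at values, and isin() gives an element-wise test against a set of values.

```python
import pandas as pd

class2 = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])  # index 0..10

print(4 in class2)            # True: 4 is an index LABEL
print(4 in class2.values)     # False: 4 is not among the values
print(1024 in class2)         # False: 1024 is a value, not a label
print(class2.isin([3, 4, 5]).tolist()[:3])   # [False, True, True]: element-wise test on values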

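# Modifying values (assigning to a label that does not exist yet appends it)
class2['ming'] = 0
class2['hua'] = 0
class2['hong'] = 0
class2[['hua','hong']] = 55
class2[['hua','hong']] = [35, 55]
class2['hua','hong'] = [1, 2]  # a single pair of brackets also worked in this notebook's pandas version; the list form above is the unambiguous spelling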
class2

0       1024
1          3
2          5
3          7
4          9
5         10
6         13
7        115
8        127
9        149
10       221
ming       0
hua        1
hong       2
dtype: int64
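The same steps written with .loc, which always means "by label" and makes the enlargement explicit. A sketch on a shorter Series; the labels 'ming' and 'hua' are reused from above.

```python
import pandas as pd

s = pd.Series([1024, 3, 5])

s.loc['ming'] = 0                    # new label: the Series grows by one element
s.loc['hua'] = 0
s.loc[['ming', 'hua']] = [35, 55]    # several existing labels at once

print(s.index.tolist())              # [0, 1, 2, 'ming', 'hua']
print(s.loc[['ming', 'hua']].tolist())   # [35, 55]
```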

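# Deep copy: class4 gets its own data, independent of class2
class4 = class2.copy()
class4 = class4+1  # note: this rebinds class4 to a new Series, so class2 would stay unchanged even without copy()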
print(class2)
class4

0       1024
1          3
2          5
3          7
4          9
5         10
6         13
7        115
8        127
9        149
10       221
ming       0
hua        1
hong       2
dtype: int64
0       1025
1          4
2          6
3          8
4         10
5         11
6         14
7        116
8        128
9        150
10       222
ming       1
hua        2
hong       3
dtype: int64
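Because class4 = class4 + 1 creates a new object anyway, a clearer way to see what copy() does is to modify in place. A sketch: plain assignment only creates a second name for the same Series, while copy() (deep=True by default) gives independent data.

```python
import pandas as pd

class2 = pd.Series([1024, 3, 5])

alias = class2            # plain assignment: a second name for the SAME object
alias.loc[0] = -1         # so this change is visible through class2 too

class4 = class2.copy()    # independent copy of the data
class4.loc[1] = 999       # class2 is unaffected

print(class2.tolist())    # [-1, 3, 5]
print(class4.tolist())    # [-1, 999, 5]
```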

# The index can also be replaced on its own (same length as the data; labels need not be unique)

class2.index = [22,23,24,28,24,29,1,2,3,4,8,5,9,21]
class2

22    1024
23       3
24       5
28       7
24       9
29      10
1       13
2      115
3      127
4      149
8      221
5        0
9        1
21       2
dtype: int64
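The new index above contains 24 twice, which pandas allows. A sketch of the consequences: a duplicated label returns every matching element, and assigning an index of the wrong length raises ValueError. The data and labels here are hypothetical.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
s.index = ['a', 'b', 'b', 'c']      # labels may repeat

print(s.index.is_unique)            # False
print(s.loc['b'].tolist())          # [20, 30]: every element labeled 'b'

raised = False
try:
    s.index = ['x', 'y']            # new index must match the data length
except ValueError as err:
    raised = True
    print("ValueError:", err)
```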

# In the author's setup, reading failed when the CSV path contained Chinese characters
df = pd.read_csv("./source/test.csv")
df

	ro	c1	c2	c3	c4	c5	c6	c7	c8	c9	c10	c11	c12	c13	c14	c15	c16	c17	c18
0	a	0	5	10	10	10	10	10	10	10	10	10	10	10	10	10	10	10	10
1	b	1	6	11	11	11	11	11	11	11	11	11	11	11	11	11	11	11	11
2	c	2	7	12	12	12	12	12	12	12	12	12	12	12	12	12	12	12	12
3	d	3	8	13	13	13	13	13	13	13	13	13	13	13	13	13	13	13	13
4	e	4	9	14	14	14	14	14	14	14	14	14	14	14	14	14	14	14	14

Values in a CSV file are separated by commas. Source:
python:pandas——read_csv方法
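A self-contained sketch of read_csv that does not need ./source/test.csv: the CSV text below is a hypothetical stand-in with the same layout but fewer columns, read from memory via io.StringIO. It shows the comma default, index_col to use a column as row labels, and sep for other delimiters.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./source/test.csv: same layout, fewer columns
csv_text = "ro,c1,c2\na,0,5\nb,1,6\nc,2,7\n"

df = pd.read_csv(io.StringIO(csv_text))                          # comma is the default separator
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col="ro")  # use the 'ro' column as row labels
df_semicolon = pd.read_csv(io.StringIO(csv_text.replace(",", ";")), sep=";")  # other delimiters via sep

print(df.shape)                   # (3, 3)
print(df_indexed.loc["b", "c2"])  # 6
print(df_semicolon.equals(df))    # True
```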

If reposting, please credit the source: http://www.pswp.cn/news/455199.shtml
