Pandas index操作索引
索引(index)是 Pandas 的重要工具,通過索引可以從 DataFame 中選擇特定的行數和列數,這種選擇數據的方式稱為“子集選擇”。
在 Pandas 中,索引值也被稱為標簽(label),它在 Jupyter 筆記本中以粗體字進行顯示。索引可以加快數據訪問的速度,它就好比數據的書簽,通過它可以實現數據的快速查找。
創建索引
通過示例對 index 索引做進一步講解。下面創建一個帶有 index 索引的數據,并使用 read_csv() 這些讀取數據:
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print(df)
輸出結果:
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
0 7369 SMITH CLERK 7902.0 1980-12-17 800 NaN 20
1 7499 ALLEN SALESMAN 7698.0 1981-02-20 1600 300.0 30
2 7521 WARD SALESMAN 7698.0 1981-02-22 1250 500.0 30
3 7566 JONES MANAGER 7839.0 1981-04-02 2975 NaN 20
4 7654 MARTIN SALESMAN 7698.0 1981-09-28 1250 1400.0 30
5 7698 BLAKE MANAGER 7839.0 1981-05-01 2850 NaN 30
6 7782 CLARK MANAGER 7839.0 1981-06-09 2450 NaN 10
7 7788 SCOTT ANALYST 7566.0 1987-04-19 3000 NaN 20
8 7839 KING PRESIDENT NaN 1981-11-17 5000 NaN 10
9 7844 TURNER SALESMAN 7698.0 1981-09-08 1500 0.0 30
10 7876 ADAMS CLERK 7788.0 1987-05-23 1100 NaN 20
11 7900 JAMES CLERK 7698.0 1981-12-03 950 NaN 30
12 7902 FORD ANALYST 7566.0 1981-12-03 3000 NaN 20
13 7934 MILLER CLERK 7782.0 1982-01-23 1300 NaN 10
通過列索引(標簽)讀取多列數據。
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")print('讀取EMPNO、ENAME、JOB、MGR:\n',df[['EMPNO','ENAME','JOB','MGR']])
# 或使用loc函數
#print('讀取EMPNO、ENAME、JOB、MGR:\n',df.loc[:,['EMPNO','ENAME','JOB','MGR']])
輸出結果:
讀取EMPNO、ENAME、JOB、MGR:EMPNO ENAME JOB MGR
0 7369 SMITH CLERK 7902.0
1 7499 ALLEN SALESMAN 7698.0
2 7521 WARD SALESMAN 7698.0
3 7566 JONES MANAGER 7839.0
4 7654 MARTIN SALESMAN 7698.0
5 7698 BLAKE MANAGER 7839.0
6 7782 CLARK MANAGER 7839.0
7 7788 SCOTT ANALYST 7566.0
8 7839 KING PRESIDENT NaN
9 7844 TURNER SALESMAN 7698.0
10 7876 ADAMS CLERK 7788.0
11 7900 JAMES CLERK 7698.0
12 7902 FORD ANALYST 7566.0
13 7934 MILLER CLERK 7782.0
再看一組簡單的示例:
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx",index_col='EMPNO')print('讀取ENAME、JOB、MGR:\n',df[['ENAME','JOB','MGR']])
輸出結果:
讀取EMPNO、ENAME、JOB、MGR:ENAME JOB MGR
EMPNO
7369 SMITH CLERK 7902.0
7499 ALLEN SALESMAN 7698.0
7521 WARD SALESMAN 7698.0
7566 JONES MANAGER 7839.0
7654 MARTIN SALESMAN 7698.0
7698 BLAKE MANAGER 7839.0
7782 CLARK MANAGER 7839.0
7788 SCOTT ANALYST 7566.0
7839 KING PRESIDENT NaN
7844 TURNER SALESMAN 7698.0
7876 ADAMS CLERK 7788.0
7900 JAMES CLERK 7698.0
7902 FORD ANALYST 7566.0
7934 MILLER CLERK 7782.0
設置索引
set_index() 將已存在的列標簽設置為 DataFrame 行索引。除了可以添加索引外,也可以替換已經存在的索引。比如您也可以把 Series 或者一個 DataFrme 設置成另一個 DataFrame 的索引。示例如下:
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print('讀取ENAME、JOB、MGR:\n',df[['EMPNO','ENAME','JOB','MGR']].set_index(['EMPNO']))
輸出結果:
讀取ENAME、JOB、MGR:ENAME JOB MGR
EMPNO
7369 SMITH CLERK 7902.0
7499 ALLEN SALESMAN 7698.0
7521 WARD SALESMAN 7698.0
7566 JONES MANAGER 7839.0
7654 MARTIN SALESMAN 7698.0
7698 BLAKE MANAGER 7839.0
7782 CLARK MANAGER 7839.0
7788 SCOTT ANALYST 7566.0
7839 KING PRESIDENT NaN
7844 TURNER SALESMAN 7698.0
7876 ADAMS CLERK 7788.0
7900 JAMES CLERK 7698.0
7902 FORD ANALYST 7566.0
7934 MILLER CLERK 7782.0
重置索引
您可以使用 reset_index() 來恢復初始行索引,示例如下:
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print('讀取ENAME、JOB、MGR:\n',df[['EMPNO','ENAME','JOB','MGR']].set_index(['EMPNO']))
print('重置后的索引:\n',df[['EMPNO','ENAME','JOB','MGR']].reset_index())
輸出結果:
讀取ENAME、JOB、MGR:ENAME JOB MGR
EMPNO
7369 SMITH CLERK 7902.0
7499 ALLEN SALESMAN 7698.0
7521 WARD SALESMAN 7698.0
7566 JONES MANAGER 7839.0
7654 MARTIN SALESMAN 7698.0
7698 BLAKE MANAGER 7839.0
7782 CLARK MANAGER 7839.0
7788 SCOTT ANALYST 7566.0
7839 KING PRESIDENT NaN
7844 TURNER SALESMAN 7698.0
7876 ADAMS CLERK 7788.0
7900 JAMES CLERK 7698.0
7902 FORD ANALYST 7566.0
7934 MILLER CLERK 7782.0
重置后的索引:index EMPNO ENAME JOB MGR
0 0 7369 SMITH CLERK 7902.0
1 1 7499 ALLEN SALESMAN 7698.0
2 2 7521 WARD SALESMAN 7698.0
3 3 7566 JONES MANAGER 7839.0
4 4 7654 MARTIN SALESMAN 7698.0
5 5 7698 BLAKE MANAGER 7839.0
6 6 7782 CLARK MANAGER 7839.0
7 7 7788 SCOTT ANALYST 7566.0
8 8 7839 KING PRESIDENT NaN
9 9 7844 TURNER SALESMAN 7698.0
10 10 7876 ADAMS CLERK 7788.0
11 11 7900 JAMES CLERK 7698.0
12 12 7902 FORD ANALYST 7566.0
13 13 7934 MILLER CLERK 7782.0