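A Tour of Cool Python Libraries: The Third-Party Library Pandas (004)

I. Usage in Detail

5. pandas.DataFrame.to_csv

5-1. Syntax

5-2. Parameters

5-3. Function

5-4. Return value

5-5. Notes

5-6. Usage

5-6-1. Code example

5-6-2. Output

6. pandas.read_fwf

6-1. Syntax

6-2. Parameters

6-3. Function

6-4. Return value

6-5. Notes

6-6. Usage

6-6-1. Code example

6-6-2. Output

II. Recommended Reading

1. Python Foundations Tour

2. Python Functions Tour

3. Python Algorithms Tour

4. Python Magic Tour

5. Blog Home Page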

I. Usage in Detail

5. pandas.DataFrame.to_csv

5-1. Syntax

# 5. pandas.DataFrame.to_csv
DataFrame.to_csv(path_or_buf=None, *, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)

Write object to a comma-separated values (csv) file.

Parameters:

path_or_buf : str, path object, file-like object, or None, default None
    String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline='', disabling universal newlines. If a binary file object is passed, mode might need to contain a 'b'.
sep : str, default ','
    String of length 1. Field delimiter for the output file.
na_rep : str, default ''
    Missing data representation.
float_format : str, Callable, default None
    Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
columns : sequence, optional
    Columns to write.
header : bool or list of str, default True
    Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index : bool, default True
    Write row names (index).
index_label : str or sequence, or False, default None
    Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode : {'w', 'x', 'a'}, default 'w'
    Forwarded to either open(mode=) or fsspec.open(mode=) to control the file opening. Typical values include:
    'w', truncate the file first.
    'x', exclusive creation, failing if the file already exists.
    'a', append to the end of file if it exists.
encoding : str, optional
    A string representing the encoding to use in the output file, defaults to 'utf-8'. encoding is not supported if path_or_buf is a non-binary file object.
compression : str or dict, default 'infer'
    For on-the-fly compression of the output data. If 'infer' and 'path_or_buf' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
    New in version 1.5.0: Added support for .tar files.
    May be a dict with key 'method' as compression mode and other entries as additional compression options if compression mode is 'zip'.
    Passing compression options as keys in dict is supported for compression modes 'gzip', 'bz2', 'zstd', and 'zip'.
quoting : optional constant from csv module
    Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar : str, default '"'
    String of length 1. Character used to quote fields.
lineterminator : str, optional
    The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called ('\n' for linux, '\r\n' for Windows, i.e.).
    Changed in version 1.5.0: Previously was line_terminator, changed for consistency with read_csv and the standard library 'csv' module.
chunksize : int or None
    Rows to write at a time.
date_format : str, default None
    Format string for datetime objects.
doublequote : bool, default True
    Control quoting of quotechar inside a field.
escapechar : str, default None
    String of length 1. Character used to escape sep and quotechar when appropriate.
decimal : str, default '.'
    Character recognized as decimal separator. E.g. use ',' for European data.
errors : str, default 'strict'
    Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

Returns:

None or str
    If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

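5-2. Parameters

5-2-1. path_or_buf (optional, default None): The file path (string or path object) or file-like object to write to. If None, the output is returned as a string instead of being written to a file.

5-2-2. sep (optional, default ','): The single-character field delimiter. It can be changed as needed, for example to a tab ('\t') for tab-separated values (TSV).

5-2-3. na_rep (optional, default ''): The string used to represent missing values (NaN). Any string may be specified.

5-2-4. float_format (optional, default None): The format string for floating-point numbers. For example, '%.2f' writes floats with two decimal places. A callable may also be given, in which case it takes precedence over other numeric formatting parameters such as decimal.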
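
As a minimal sketch of sep, na_rep, and float_format working together, the snippet below writes a small, made-up DataFrame (the column names item and price are illustrative only) as tab-separated text:

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing price to show na_rep in action
df = pd.DataFrame({"item": ["a", "b"], "price": [1.5, np.nan]})

# Tab-separated output, NaN written as "NA", floats with two decimals
tsv_text = df.to_csv(index=False, sep="\t", na_rep="NA", float_format="%.2f")
print(tsv_text)
```

Because path_or_buf is omitted, the result comes back as a string rather than being written to disk.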

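5-2-5. columns (optional, default None): The sequence of column labels to write. If None, all columns are written.

5-2-6. header (optional, default True): Whether to write the column names as the first row. If False, no column names are written. A list of strings may also be given, in which case the strings are treated as aliases for the column names; they rename the columns in the output but do not change the column order.

5-2-7. index (optional, default True): Whether to write the row index. If False, the index is not written.

5-2-8. index_label (optional, default None): The column label for the index column(s). If None, and header and index are both True, the index names are used. If False, no field is written for the index names. A string, or a sequence of strings for a MultiIndex, is used as the label of the index column(s).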
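
The following sketch combines columns, header, and index_label on a small sample DataFrame. The alias names (name, age) and the label row_id are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [24, 27],
    "City": ["New York", "Chicago"],
})

# Write only two columns, rename them in the header, and label the index column
subset_text = df.to_csv(
    columns=["Name", "Age"],   # City is left out
    header=["name", "age"],    # aliases, in the same order as `columns`
    index_label="row_id",      # header field for the index column
)
print(subset_text)
```

The header list must contain one alias per written column; the data rows keep the order given in columns.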

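5-2-9. mode (optional, default 'w'): The file opening mode. 'w' truncates the file first (overwriting an existing file), 'x' creates the file exclusively and fails if it already exists, and 'a' appends to the end of the file if it exists.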
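
A common use of mode='a' is appending rows to an existing CSV file. The sketch below assumes a temporary directory and a hypothetical file name people.csv; header=False on the second call avoids writing the column names twice:

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({"Name": ["Alice"], "Age": [24]})
second = pd.DataFrame({"Name": ["Bob"], "Age": [27]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name
    first.to_csv(path, index=False)             # mode='w' creates or overwrites
    second.to_csv(path, mode="a", header=False, index=False)  # append rows only
    combined = pd.read_csv(path)

print(combined)
```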

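5-2-10. encoding (optional, default None): The encoding of the output file. When it is None, 'utf-8' is used. It is not supported if path_or_buf is a non-binary file object.

5-2-11. compression (optional, default 'infer'): A string naming the compression method (such as 'gzip', 'bz2', 'zip', 'xz', 'zstd', or 'tar'), or a dict with a 'method' key plus additional compression options. With 'infer' and a path-like path_or_buf, the method is detected from the file extension ('.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz', or '.tar.bz2'); otherwise no compression is applied. Set it to None to disable compression.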
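
To illustrate compression='infer', the sketch below writes to a path ending in .gz (a hypothetical file name in a temporary directory) and checks that the file really starts with the gzip magic bytes:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 27]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv.gz")  # extension triggers gzip
    df.to_csv(path, index=False)                   # compression='infer' by default
    with open(path, "rb") as fh:
        magic = fh.read(2)                         # gzip files start with 1f 8b
    restored = pd.read_csv(path)                   # read_csv infers gzip as well

print(magic)
print(restored)
```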

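5-2-12. quoting (optional, default None): Controls when fields are quoted. It takes one of the constants from the csv module and defaults to csv.QUOTE_MINIMAL.

5-2-13. quotechar (optional, default '"'): The single character used to quote fields, for example fields that contain the separator.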
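
The effect of quoting is easiest to see side by side. This sketch compares the default (csv.QUOTE_MINIMAL) with csv.QUOTE_NONNUMERIC on a small sample DataFrame:

```python
import csv

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 27]})

# Default: quote only when a field needs it (none do here)
minimal = df.to_csv(index=False)

# Quote every non-numeric field, including the header strings
nonnumeric = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)

print(minimal)
print(nonnumeric)
```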

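5-2-14. lineterminator (optional, default None): The newline character or character sequence used in the output. When it is None, os.linesep is used. Before pandas 1.5.0 this parameter was named line_terminator.

5-2-15. chunksize (optional, default None): The number of rows to write at a time. Writing in chunks can be useful for large DataFrames; it does not change the content of the output.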
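
As a quick check that chunksize only affects how rows are batched during writing, the sketch below compares the output with and without it on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(5), "sq": [i * i for i in range(5)]})

whole = df.to_csv(index=False)                 # written in one go
chunked = df.to_csv(index=False, chunksize=2)  # written two rows at a time

print(whole == chunked)
```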

5-2-16. date_format (optional, default None): The format string for datetime objects.
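
A minimal sketch of date_format, using a strftime-style pattern on an illustrative datetime column (the column names when and value are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-31 08:30:00", "2024-02-01 17:45:00"]),
    "value": [1, 2],
})

# Write only the date part, with slashes as separators
dated_text = df.to_csv(index=False, date_format="%Y/%m/%d")
print(dated_text)
```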

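5-2-17. doublequote (optional, default True): Controls how a quotechar that appears inside a field is written. When True, the quote character is doubled (for example, " becomes ""). When False, escapechar is placed before it instead, so escapechar must be set.

5-2-18. escapechar (optional, default None): The single character used to escape sep and quotechar when appropriate, for example when doublequote is False or when quoting is csv.QUOTE_NONE.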
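
The difference between doubling and escaping an embedded quote can be sketched as follows. The sample sentence and the backslash escape character are illustrative choices:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"quote": ['She said "hi"']})

# Default: the field is quoted and each embedded quote is doubled
doubled = df.to_csv(index=False)

# Alternative: embedded quotes are prefixed with the escape character
escaped = df.to_csv(index=False, doublequote=False, escapechar="\\")

print(doubled)
print(escaped)

# Reading the escaped text back with the same escapechar restores the value
round_trip = pd.read_csv(StringIO(escaped), escapechar="\\")
print(round_trip["quote"][0])
```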

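5-2-19. decimal (optional, default '.'): The character written as the decimal separator for floating-point numbers. This is useful for regional formats, since some regions use a comma (,) as the decimal separator.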
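
When decimal=',' is used, the field separator usually has to change as well so the two do not collide. A small sketch with made-up temperature data:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Berlin", "Paris"], "temp": [21.5, 19.25]})

# Semicolon as field separator, comma as decimal separator
euro_text = df.to_csv(index=False, sep=";", decimal=",")
print(euro_text)
```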

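5-2-20. errors (optional, default 'strict'): Specifies how encoding errors are handled. Options include 'strict', 'ignore', 'replace', and 'surrogatepass'. 'strict' (the default) raises an exception, 'ignore' drops characters that cannot be encoded, 'replace' writes a replacement marker ('?') in their place, and 'surrogatepass' allows surrogate code points to be encoded by the UTF-8, UTF-16, and UTF-32 codecs. See the errors argument of open() for the full list of options.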
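
As a sketch of errors='replace', the snippet below writes a non-ASCII character with encoding='ascii' to a temporary file (the file name is hypothetical) and inspects the raw bytes:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["café"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "names.csv")  # hypothetical file name
    # 'é' cannot be encoded as ASCII, so it is replaced with '?'
    df.to_csv(path, index=False, encoding="ascii", errors="replace")
    with open(path, "rb") as fh:
        raw = fh.read()

print(raw)
```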

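5-2-21. storage_options (optional, default None): Extra options for a particular storage connection, such as S3 or GCS. For URLs such as "s3://..." or "gcs://..." the key-value pairs are forwarded to fsspec.open; for HTTP(S) URLs they are forwarded to urllib.request.Request as header options. The bucket name is part of the URL itself rather than of this dict, and the accepted keys depend on the fsspec backend. For example, with s3fs, storage_options={'anon': True} requests anonymous access.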

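5-3. Function

Writes the data in a DataFrame to the specified file path or file-like object.

5-4. Return value

5-4-1. If path_or_buf is a file path or file-like object, DataFrame.to_csv() returns None, because the data is written directly to that target.

5-4-2. If path_or_buf is None, the function returns a string containing the CSV representation of the DataFrame, which lets you obtain the CSV text without writing a file.

5-5. Notes

None.

5-6. Usage

5-6-1. Code example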

# 5. pandas.DataFrame.to_csv
# 5-1. No return value
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
# Export the DataFrame to a CSV file
csv_str = df.to_csv('people.csv', index=False)  # Note: returns None when a path is given
print(csv_str)

# 5-2. With a return value
import pandas as pd
# Create a dict that holds the data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
# Build the DataFrame from the dict
df = pd.DataFrame(data)
# Convert the DataFrame to a CSV-formatted string
# index=False: do not include the row index
# sep=';': use a semicolon as the separator
# na_rep='N/A': represent missing values as 'N/A'
# lineterminator='\n': end each row with a newline (named line_terminator before pandas 1.5.0)
csv_string = df.to_csv(index=False, sep=';', na_rep='N/A', lineterminator='\n')
# Print the CSV string
print(csv_string)

# 5-3. Specifying a file path
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
csv_string = df.to_csv('data.csv', index=False)
print(csv_string)

# 5-4. Using a file object
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
# A text-mode file object should be opened with newline=''
with open('data.csv', 'w', newline='') as file:
    csv_string = df.to_csv(file, index=False)
print(csv_string)

# 5-5. Using StringIO
import pandas as pd
from io import StringIO
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
buffer = StringIO()
df.to_csv(buffer, index=False)
csv_string = buffer.getvalue()
print(csv_string)

5-6-2. Output

# 5-1. No return value
None

# 5-2. With a return value
Name;Age;City
Alice;24;New York
Bob;27;Los Angeles
Charlie;22;Chicago

# 5-3. Specifying a file path
None

# 5-4. Using a file object
None

# 5-5. Using StringIO
Name,Age,City
Alice,24,New York
Bob,27,Los Angeles
Charlie,22,Chicago

6. pandas.read_fwf

6-1. Syntax

# 6. pandas.read_fwf
pandas.read_fwf(filepath_or_buffer, *, colspecs='infer', widths=None, infer_nrows=100, dtype_backend=_NoDefault.no_default, iterator=False, chunksize=None, **kwds)

Read a table of fixed-width formatted lines into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.

Parameters:

filepath_or_buffer : str, path object, or file-like object
    String, path object (implementing os.PathLike[str]), or file-like object implementing a text read() function. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
colspecs : list of tuple (int, int) or 'infer'. optional
    A list of tuples giving the extents of the fixed-width fields of each line as half-open intervals (i.e., [from, to[ ). String value 'infer' can be used to instruct the parser to try detecting the column specifications from the first 100 rows of the data which are not being skipped via skiprows (default='infer').
widths : list of int, optional
    A list of field widths which can be used instead of 'colspecs' if the intervals are contiguous.
infer_nrows : int, default 100
    The number of rows to consider when letting the parser determine the colspecs.
dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
    Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
    "numpy_nullable": returns nullable-dtype-backed DataFrame (default).
    "pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.
    New in version 2.0.
**kwds : optional
    Optional keyword arguments can be passed to TextFileReader.

Returns:

DataFrame or TextFileReader
    A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes.

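6-2. Parameters

6-2-1. filepath_or_buffer (required): A string, path object, or file-like object identifying the file to read. If it is a file path, make sure pandas can access the file.

6-2-2. colspecs (optional, default 'infer'): The column extents. It is a list of tuples, each holding two integers that give the start and end position of a column as a half-open interval (zero-based start, end not included). If set to 'infer', pandas tries to detect the column extents automatically.

6-2-3. widths (optional, default None): An alternative to colspecs for contiguous columns: a list of integers giving the width of each column directly. Use either widths or an explicit colspecs list, not both; passing both raises a ValueError.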
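
The relationship between widths and colspecs can be sketched with an in-memory sample. The text and column names below are made up for illustration, and the final call shows what happens when both arguments are supplied:

```python
from io import StringIO

import pandas as pd

text = "12345John Doe  25\n67890Jane Smith30\n"
names = ["ID", "Name", "Age"]

# Contiguous widths 5, 10, 2 correspond to intervals [0,5), [5,15), [15,17)
by_widths = pd.read_fwf(StringIO(text), widths=[5, 10, 2], header=None, names=names)
by_colspecs = pd.read_fwf(
    StringIO(text), colspecs=[(0, 5), (5, 15), (15, 17)], header=None, names=names
)
print(by_widths.equals(by_colspecs))

# Supplying both an explicit colspecs list and widths is rejected
try:
    pd.read_fwf(StringIO(text), widths=[5, 10, 2], colspecs=[(0, 5), (5, 15), (15, 17)])
    both_error = None
except ValueError as exc:
    both_error = str(exc)
print(both_error)
```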

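6-2-4. infer_nrows (optional, default 100): The number of rows considered when inferring the column extents. With colspecs='infer', pandas looks at the first infer_nrows rows of the file; the value can be adjusted to the size and complexity of the file.

6-2-5. dtype_backend (optional): The back-end data type applied to the resulting DataFrame, either 'numpy_nullable' (nullable-dtype-backed DataFrame) or 'pyarrow' (pyarrow-backed ArrowDtype DataFrame). It was added in pandas 2.0 and is still marked experimental; in most cases it does not need to be set.

6-2-6. iterator (optional, default False): A boolean. If True, a TextFileReader object is returned that reads the file in chunks by iteration instead of loading the whole file into memory at once, which is useful for large files.

6-2-7. chunksize (optional, default None): The number of rows per chunk. When it is set, a TextFileReader is returned that yields DataFrames of at most chunksize rows. When it is None and iterator is False, the whole file is read into a single DataFrame. It is unrelated to infer_nrows.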
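
Chunked reading can be sketched with a small in-memory sample. The five-line text and the column names id and code are illustrative assumptions:

```python
from io import StringIO

import pandas as pd

text = "001 aa\n002 bb\n003 cc\n004 dd\n005 ee\n"

# With chunksize set, read_fwf returns a TextFileReader instead of a DataFrame
reader = pd.read_fwf(
    StringIO(text), widths=[3, 3], header=None, names=["id", "code"], chunksize=2
)

# Each iteration yields a DataFrame with at most two rows
chunk_sizes = [len(chunk) for chunk in reader]
print(chunk_sizes)
```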

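6-2-8. **kwds (optional): Other keyword arguments, which are passed on to TextFileReader. Commonly used ones include header (the row position of the column names; by default it is inferred, and header=None states that the file has no header row) and names (a list of custom column names, used when the file has no column names).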
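
The sketch below shows two of these forwarded keyword arguments in action on a made-up sample whose first line is a header row. The replacement column names id and person are hypothetical:

```python
from io import StringIO

import pandas as pd

text = "ID   Name      \n12345John Doe  \n67890Jane Smith\n"

# Default header handling: the first row supplies the column names
inferred = pd.read_fwf(StringIO(text), widths=[5, 10])
print(inferred)

# Skip the header row and supply custom names instead
renamed = pd.read_fwf(
    StringIO(text), widths=[5, 10], header=None, names=["id", "person"], skiprows=1
)
print(renamed)
```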

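6-3. Function

Parses a text file in fixed-width format into a pandas DataFrame.

6-4. Return value

Returns a DataFrame, or a TextFileReader when iterator=True or chunksize is set.

6-5. Notes

The dtype_backend parameter was added in pandas 2.0 and is still marked experimental; it is not deprecated. In most cases it does not need to be set.

6-6. Usage

6-6-1. Code example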

# 6. pandas.read_fwf
# 6-1. Create a .txt file for testing
# Write the strings directly with Python file operations
with open('example.txt', 'w') as f:
    f.write('12345John Doe  25  New York\n')
    f.write('67890Jane Smith30  Los Angeles\n')

# 6-2. Basic usage
import pandas as pd
# Assume the column widths are 5, 10, 2, 14
colspecs = [(0, 5), (5, 15), (15, 17), (17, 31)]
# Read the file
df = pd.read_fwf('example.txt', colspecs=colspecs, header=None, names=['ID', 'Name', 'Age', 'City'])
# Show the DataFrame
print(df)

# 6-3. Inferring the column widths automatically
import pandas as pd
# Try to infer the column widths, assuming the first 100 rows are enough.
# Inference relies on whitespace gaps, so adjacent fields such as '12345John' are not split.
df = pd.read_fwf('example.txt', colspecs='infer', header=None, names=['ID', 'Name', 'Age', 'City'], infer_nrows=100)
# Show the DataFrame
print(df)

# 6-4. Using the widths parameter
import pandas as pd
# Specify the column widths with the widths parameter
widths = [5, 10, 2, 14]  # widths of ID, Name, Age, City respectively
# Read the file
df = pd.read_fwf('example.txt', widths=widths, header=None, names=['ID', 'Name', 'Age', 'City'])
# Show the DataFrame
print(df)

6-6-2. Output

# 6-1. Create a .txt file for testing
# None

# 6-2. Basic usage
#       ID        Name  Age         City
# 0  12345    John Doe   25     New York
# 1  67890  Jane Smith   30  Los Angeles

# 6-3. Inferring the column widths automatically
#           ID     Name  Age     City
# 0  12345John  Doe  25  New     York
# 1  67890Jane  Smith30  Los  Angeles

# 6-4. Using the widths parameter
#       ID        Name  Age         City
# 0  12345    John Doe   25     New York
# 1  67890  Jane Smith   30  Los Angeles