Many Python developers in the financial world are tasked with creating Excel documents for analysis by non-technical users.
This is actually a lot harder than it sounds. From sourcing the data to formatting the spreadsheet to deploying the final doc in a central location, there are plenty of steps involved in the process.
In this tutorial, I'm going to show you how to create Excel spreadsheets using Python that:
- Use stock market data from IEX Cloud
- Are deployed in a centralized S3 bucket so that anyone with the right URL can access them
- Automatically update daily using the cron command line utility
Step 1: Create an Account with IEX Cloud
IEX Cloud is the data provider subsidiary of the IEX stock exchange.
In case you're unfamiliar with IEX, it is an acronym for "The Investor's Exchange". IEX was founded by Brad Katsuyama to build a better stock exchange that avoids investor-unfriendly behavior like front-running and high-frequency trading. Katsuyama's exploits were famously chronicled in Michael Lewis' best-selling book Flash Boys.
I have investigated many financial data providers and IEX Cloud has the best combination of:
- High-quality data
- Affordable price
Their prices are below:
The $9/month Launch plan is plenty for many use cases.
A warning on using IEX Cloud (and any other pay-per-use data provider): it is very important that you set usage budgets from the beginning. These budgets lock you out of your account once you hit a certain dollar cost for the month.
When I first started using IEX Cloud, I accidentally created an infinite loop on a Friday afternoon that contained an API call to IEX Cloud. These API calls are priced on a cost-per-call basis...which resulted in a terrifying email from IEX:
It is a testament to IEX's customer-centricity that they agreed to reset my usage as long as I set usage budgets moving forward. Go IEX!
As with most API subscriptions, the main benefit of creating an IEX Cloud account is having an API key.
For obvious reasons, I will not share an API key in this article.
However, you can still work through this tutorial with your own API key as long as you assign it to the following variable name:
IEX_API_Key
You will see the blank IEX_API_Key variable in my code blocks throughout the rest of this tutorial.
Step 2: Write Your Python Script
Now that you have access to the API key that you'll need to gather financial data, it's time to write your Python script.
This will be the longest section of this tutorial. It is also the most flexible - we are going to create a Python script that satisfies certain pre-specified criteria, but you could modify this section to really create any spreadsheet you want!
To start, let's lay out our goal posts. We are going to write a Python script that generates an Excel file of stock market data with the following characteristics:
- It will include the 10 largest stocks in the United States
- It will contain four columns: stock ticker, company name, share price, and dividend yield
- It will be formatted such that the header's background color is #135485 and its text is white, while the spreadsheet body's background is #DADADA and its font color is black (the default)
Let's start by importing our first package.
Since spreadsheets are essentially just data structures with rows and columns, the pandas library - including its built-in DataFrame object - is a perfect candidate for manipulating data in this tutorial.
We'll start by importing pandas under the alias pd like this:
import pandas as pd
Next, we'll specify our IEX Cloud API key. As I mentioned before, I'm not going to really include my API key, so you'll have to grab your own API key from your IEX account and include it here:
IEX_API_Key = ''
Our next step is to determine the ten largest companies in the United States.
You can answer this question with a quick Google search.
For brevity, I have included the companies (or rather, their stock tickers) in the following Python list:
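tickers = ['MSFT','AAPL','AMZN','GOOG','FB','BRK.B','JNJ','WMT','V','PG']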
Next, it is time to figure out how to ping the IEX Cloud API to pull in the metrics we need for each company.
The IEX Cloud API returns JSON objects in response to HTTP requests. Since we are working with more than 1 ticker in this tutorial, we will use IEX Cloud's batch API call functionality, which allows you to request data on more than one ticker at a time. Using batch API calls has two benefits:
- It reduces the number of HTTP requests you need to make, which will make your code more performant.
- The pricing for batch API calls is slightly better with most data providers.
Here is an example of what the HTTP request might look like, with a few placeholder words where we'll need to customize the request:
https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key
In this URL, we'll replace these variables with the following values:
- TICKERS will be replaced by a string that contains each of our tickers, separated by commas.
- ENDPOINTS will be replaced by a string that contains each of the IEX Cloud endpoints we want to hit, separated by commas.
- RANGE will be replaced by 1y. These endpoints each contain point-in-time data and not time series data, so this range can really be whatever you want.
Let's put this URL into a variable called HTTP_request for us to modify later:
HTTP_request = 'https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'
Let's work through each of these variables one-by-one to determine the exact URL that we need to hit.
For the TICKERS variable, we can generate a real Python variable (and not just a placeholder word) with a simple for loop:
#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','

#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]
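If you print ticker_string at this point, you should see the ten tickers joined by commas:

print(ticker_string)
#MSFT,AAPL,AMZN,GOOG,FB,BRK.B,JNJ,WMT,V,PG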
Now we can interpolate our ticker_string variable into the HTTP_request variable that we created earlier using an f-string:
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'
Next, we need to determine which IEX Cloud endpoints we need to ping.
Some quick investigation into the IEX Cloud documentation reveals that we only need the price and stats endpoints to create our spreadsheet.
Thus, we can replace the placeholder ENDPOINTS word from our original HTTP request with the following variable:
endpoints = 'price,stats'
Like we did with our ticker_string variable, let's substitute the endpoints variable into the HTTP_request variable:
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=RANGE&token=IEX_API_Key'
The last placeholder we need to replace is RANGE. We will not replace this one with a variable. Instead, we can hardcode 1y directly into the URL path like this:
https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token=IEX_API_Key
We've done a lot so far, so let's recap our code base:
import pandas as pd

IEX_API_Key = ''

#Specify the stock tickers that will be included in our spreadsheet
tickers = ['MSFT','AAPL','AMZN','GOOG','FB','BRK.B','JNJ','WMT','V','PG']

#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','

#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]

#Create the endpoint strings
endpoints = 'price,stats'

#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'
It is now time to ping the API and save its data into a data structure within our Python application.
We can read JSON objects with pandas' read_json method. In our case, we'll save the JSON data to a pandas DataFrame called raw_data, like this:
raw_data = pd.read_json(HTTP_request)
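As a side note, if you prefer more explicit control over the HTTP call (or your pandas version has trouble reading JSON directly from a URL), a roughly equivalent sketch using the requests library - assuming it is installed - would be:

import requests

#Fetch the batch response and build the same ticker-by-endpoint DataFrame
raw_data = pd.DataFrame(requests.get(HTTP_request).json())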
Let's take a moment now to make sure that the data has been imported in a nice format for our application.
If you're working through this tutorial in a Jupyter Notebook, you can simply type the name of the pandas DataFrame variable on the last line of a code cell, and Jupyter will nicely render an image of the data, like this:
As you can see, the pandas DataFrame contains a column for each stock ticker and two rows: one for the stats endpoint and one for the price endpoint. We will need to parse this DataFrame to get the four metrics we want. Let's work through the metrics one-by-one in the steps below.
Metric 1: Stock Ticker
This step is very straightforward since the stock tickers are contained in the columns of the pandas DataFrame. We can access them through the columns attribute of the pandas DataFrame like this:
raw_data.columns
To access the other metrics in raw_data, we will create a for loop that loops through each ticker in raw_data.columns. In each iteration of the loop we will add the data to a new pandas DataFrame object called output_data.
First we'll need to create output_data, which should be an empty pandas DataFrame with four columns. Here's how to do this:
output_data = pd.DataFrame(pd.np.empty((0,4)))
This creates an empty pandas DataFrame with 0 rows and 4 columns.
Now that this object has been created, here's how we can structure this for loop:
for ticker in raw_data.columns:
    #Parse the company's name - not completed yet
    company_name = ''

    #Parse the company's stock price - not completed yet
    stock_price = 0

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
Next, let's determine how to parse the company_name variable from the raw_data object.
Metric 2: Company Name
The company_name variable is the first variable that will need to be parsed from the raw_data object. As a quick recap, here's what raw_data looks like:
The company_name variable is held within the stats endpoint under the dictionary key companyName. To parse this data point out of raw_data, we can use these indexes:
raw_data[ticker]['stats']['companyName']
Including this in our for loop from before gives us this:
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price - not completed yet
    stock_price = 0

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
Let's move on to parsing stock_price.
Metric 3: Stock Price
The stock_price variable is contained within the price endpoint, which returns only a single value. This means we do not need to chain together indexes like we did with company_name.
Here's how we could parse stock_price from raw_data:
raw_data[ticker]['price']
Including this in our for loop gives us:
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
The last metric we need to parse is dividend_yield.
Metric 4: Dividend Yield
Like company_name, dividend_yield is contained in the stats endpoint. It is held under the dividendYield dictionary key.
Here is how we could parse it out of raw_data:
raw_data[ticker]['stats']['dividendYield']
Adding this to our for loop gives us:
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield
    dividend_yield = raw_data[ticker]['stats']['dividendYield']

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
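One caveat if you are on a recent pandas release: pd.np and DataFrame.append have both been removed in newer versions, so the loop above will raise errors there. A minimal sketch of the same logic for current versions - collecting each row in a plain Python list and building the DataFrame once at the end - would look roughly like this:

rows = []

for ticker in raw_data.columns:
    #Parse the same three fields as above
    company_name = raw_data[ticker]['stats']['companyName']
    stock_price = raw_data[ticker]['price']
    dividend_yield = raw_data[ticker]['stats']['dividendYield']
    rows.append([ticker, company_name, stock_price, dividend_yield])

#Build the DataFrame in one step instead of appending row-by-row
output_data = pd.DataFrame(rows)

The rest of the tutorial (renaming columns, setting the index, and filling missing values) works the same either way.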
Let's print out our output_data object to see what the data looks like:
So far so good! The next two steps are to name the columns of the pandas DataFrame and to change its index.
How to Name the Columns of a Pandas DataFrame
We can update the column names of our output_data object by creating a list of column names and assigning it to the output_data.columns attribute, like this:
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']
Let's print out our output_data object to see what the data looks like:
Much better! Let's change the index of output_data next.
How to Change the Index of a Pandas DataFrame
The index of a pandas DataFrame is a special column that is somewhat similar to the primary key of a SQL database table. In our output_data object, we want to set the Ticker column as the DataFrame's index.
Here's how we can do this using the set_index method:
output_data.set_index('Ticker', inplace=True)
Let's print out our output_data object to see what the data looks like:
Another incremental improvement!
Next, let's deal with the missing data in output_data.
How to Handle Missing Data in Pandas DataFrames
If you take a close look at output_data, you will notice that there are several None values in the Dividend Yield column:
These None values simply indicate that the company for that row does not currently pay a dividend. While None is one way of representing a non-dividend stock, it is more common to show a Dividend Yield of 0.
Fortunately, the fix for this is quite straightforward. The pandas library includes an excellent fillna method that allows us to replace missing values in a pandas DataFrame.
Here's how we can use the fillna method to replace our Dividend Yield column's None values with 0:
output_data['Dividend Yield'].fillna(0,inplace=True)
The output_data object looks much cleaner now:
We are now ready to export our DataFrame to an Excel document! As a quick recap, here is our Python script to date:
import pandas as pd

IEX_API_Key = ''

#Specify the stock tickers that will be included in our spreadsheet
tickers = ['MSFT','AAPL','AMZN','GOOG','FB','BRK.B','JNJ','WMT','V','PG']

#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','

#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]

#Create the endpoint strings
endpoints = 'price,stats'

#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'

#Ping the API and save its response into a pandas DataFrame
raw_data = pd.read_json(HTTP_request)

#Create an empty pandas DataFrame to append our parsed values into during our for loop
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield
    dividend_yield = raw_data[ticker]['stats']['dividendYield']

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)

#Change the column names of output_data
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']

#Change the index of output_data
output_data.set_index('Ticker', inplace=True)

#Replace the missing values of the 'Dividend Yield' column with 0
output_data['Dividend Yield'].fillna(0, inplace=True)

#Print the DataFrame
output_data
How to Export a Styled Excel Document From a Pandas DataFrame Using XlsxWriter
There are multiple ways to export an xlsx file from a pandas DataFrame.
The easiest way is to use the built-in to_excel method. As an example, here's how we could export output_data to an Excel file:
output_data.to_excel('my_excel_document.xlsx')
The problem with this approach is that the Excel file has no formatting whatsoever. The output looks like this:
The lack of formatting in this document makes it hard to interpret.
What is the solution?
We can use the Python package XlsxWriter to generate nicely-formatted Excel files. To start, we'll want to add the following import to the beginning of our Python script:
import xlsxwriter
Next, we need to create our actual Excel file. The XlsxWriter package actually has a dedicated documentation page for how to work with pandas DataFrames, which is available here.
Our first step is to call the pd.ExcelWriter function, passing in the desired name of our xlsx file as the first argument and engine='xlsxwriter' as the second argument. We will assign this to a variable called writer:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')
From there, we need to call the to_excel method on our pandas DataFrame. This time, instead of passing in the name of the file that we're trying to export, we'll pass in the writer object that we just created:
output_data.to_excel(writer, sheet_name='Sheet1')
Lastly, we will call the save method on our writer object, which saves the xlsx file to our current working directory. When all this is done, here is the section of our Python script that saves output_data to an Excel file:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

output_data.to_excel(writer, sheet_name='Sheet1')

writer.save()
All of the formatting code that we will include in our xlsx file needs to be contained between the creation of the ExcelWriter object and the writer.save() statement.
How to Style an xlsx File Created with Python
It is actually harder than you might think to style an Excel file using Python.
This is partially because of some of the limitations of the XlsxWriter package. Its documentation states:
'XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.'
In my experience, the most flexible way to style cells in an xlsx file created by XlsxWriter is to use conditional formatting that only applies styling when a cell is not equal to None.
This has three advantages:
- It provides more styling flexibility than the normal formatting options available in XlsxWriter.
- You do not need to manually loop through each data point and import them into the writer object one-by-one.
- It allows you to easily see when None values have made their way into your finalized xlsx files, since they'll be missing the required formatting.

To apply styling using conditional formatting, we first need to create a few style templates. Specifically, we will need four templates:

- One header_template that will be applied to the column names at the top of the spreadsheet
- One string_template that will be applied to the Ticker and Company Name columns
- One dollar_template that will be applied to the Stock Price column
- One percent_template that will be applied to the Dividend Yield column

Each of these format templates needs to be added to the writer object in dictionaries that resemble CSS syntax. Here's what I mean:
header_template = writer.book.add_format({'font_color': '#ffffff', 'bg_color': '#135485', 'border': 1})

string_template = writer.book.add_format({'bg_color': '#DADADA', 'border': 1})

dollar_template = writer.book.add_format({'num_format': '$0.00', 'bg_color': '#DADADA', 'border': 1})

percent_template = writer.book.add_format({'num_format': '0.0%', 'bg_color': '#DADADA', 'border': 1})
To apply these formats to specific cells in our xlsx file, we need to call the package's conditional_format method on writer.sheets['Stock Market Data']. Here is an example:
writer.sheets['Stock Market Data'].conditional_format('A2:B11', {'type': 'cell','criteria': '<>','value': '"None"','format': string_template})
If we generalize this formatting to the other three formats we're applying, here's what the formatting section of our Python script becomes:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

output_data.to_excel(writer, sheet_name='Stock Market Data')

header_template = writer.book.add_format({'font_color': '#ffffff', 'bg_color': '#135485', 'border': 1})
string_template = writer.book.add_format({'bg_color': '#DADADA', 'border': 1})
dollar_template = writer.book.add_format({'num_format': '$0.00', 'bg_color': '#DADADA', 'border': 1})
percent_template = writer.book.add_format({'num_format': '0.0%', 'bg_color': '#DADADA', 'border': 1})

#Format the header of the spreadsheet
writer.sheets['Stock Market Data'].conditional_format('A1:D1', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': header_template})

#Format the 'Ticker' and 'Company Name' columns
writer.sheets['Stock Market Data'].conditional_format('A2:B11', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': string_template})

#Format the 'Stock Price' column
writer.sheets['Stock Market Data'].conditional_format('C2:C11', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': dollar_template})

#Format the 'Dividend Yield' column
writer.sheets['Stock Market Data'].conditional_format('D2:D11', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': percent_template})

writer.save()
Let's take a look at our Excel document to see how it's looking:
So far so good! The last incremental improvement that we can make to this document is to make its columns a bit wider.
We can specify column widths by calling the set_column method on writer.sheets['Stock Market Data'].
Here's what we'll add to our Python script to do this:
#Specify all column widths
writer.sheets['Stock Market Data'].set_column('B:B', 32)
writer.sheets['Stock Market Data'].set_column('C:C', 18)
writer.sheets['Stock Market Data'].set_column('D:D', 20)
Here's the final version of the spreadsheet:
Voila! We are good to go! You can access the final version of this Python script on GitHub here. The file is named stock_market_data.py.
Step 3: Set Up an AWS EC2 Virtual Machine to Run Your Python Script
Your Python script is finalized and ready to run.
However, we do not want to simply run this on our local machine on an ad hoc basis.
Instead, we are going to set up a virtual machine using Amazon Web Services' Elastic Compute Cloud (EC2) service.
You'll need to create an AWS account first if you do not already have one. To do this, navigate to this URL and click the "Create an AWS Account" in the top-right corner:
AWS' web application will guide you through the steps to create an account.
Once your account is created, you'll need to create an EC2 instance. This is simply a virtual server for running code on AWS infrastructure.
EC2 instances come in various operating systems and sizes, ranging from very small servers that qualify for AWS' free tier to very large servers capable of running complex applications.
We will use AWS' smallest server to run the Python script that we wrote in this article. To get started, navigate to EC2 within the AWS management console. Once you've arrived within EC2, click Launch Instance:
This will bring you to a screen that contains all of the available instance types within AWS EC2. Any machine that qualifies for AWS' free tier will be sufficient.
I chose the Amazon Linux 2 AMI (HVM):
Click Select to proceed.
On the next page, AWS will ask you to select the specifications for your machine. The fields you can select include:
- Family
- Type
- vCPUs
- Memory
- Instance Storage (GB)
- EBS-Optimized
- Network Performance
- IPv6 Support
For the purpose of this tutorial, we simply want to select the single machine that is free tier eligible. It is characterized by a small green label that looks like this:
Once you have selected a free tier eligible machine, click Review and Launch at the bottom of the screen to proceed. The next screen will present the details of your new instance for you to review. Quickly review the machine's specifications, then click Launch in the bottom right-hand corner.
Clicking the Launch button will trigger a popup that asks you to Select an existing key pair or create a new key pair. A key pair is comprised of a public key that AWS holds and a private key that you must download and store within a .pem file. You must have access to that .pem file in order to access your EC2 instance (typically via SSH). You also have the option to proceed without a key pair, but this is not recommended for security reasons.
Once you have selected or created a key pair for this EC2 instance and clicked the radio button for I acknowledge that I have access to the selected private key file (data-feeds.pem), and that without this file, I won't be able to log into my instance, you can click Launch Instances to proceed.
Your instance will now begin to launch. It can take some time for these instances to boot up, but once it's ready, its Instance State will show as running in your EC2 dashboard.
Next, you will need to push your Python script into your EC2 instance. Here is a generic command state statement that allows you to move a file into an EC2 instance:
scp -i path/to/.pem_file path/to/file username@host_address.amazonaws.com:/path_to_copy
Run this statement with the necessary replacements to move stock_market_data.py into the EC2 instance.
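For example, if your key file is data-feeds.pem and you are using the default ec2-user account on Amazon Linux 2, the statement might look roughly like this (the host address is a placeholder - copy the real public DNS name from your EC2 console):

scp -i ~/data-feeds.pem stock_market_data.py ec2-user@your-instance-public-dns.amazonaws.com:/home/ec2-user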
Trying to run stock_market_data.py at this point will actually result in an error because the EC2 instance does not come with the necessary Python packages.
To fix this, you can either export a requirements.txt file and install the proper packages using pip, or you can simply run the following:
sudo yum install python3-pip
pip3 install pandas
pip3 install xlsxwriter
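If you'd rather go the requirements.txt route instead, a minimal sketch looks like this (run the first command on your local machine, copy the resulting file to the instance with scp, then run the second command on the EC2 instance):

#On your local machine: record the installed packages
pip3 freeze > requirements.txt

#On the EC2 instance, after copying requirements.txt over: install them
pip3 install -r requirements.txt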
Once this is done, you can SSH into the EC2 instance and run the Python script from the command line with the following statement:
python3 stock_market_data.py
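If you haven't already opened an SSH session, the connection command mirrors the scp statement from earlier, with the same placeholder key path and host address:

ssh -i path/to/.pem_file username@host_address.amazonaws.com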
Step 4: Create an AWS S3 Bucket to Hold the Finished Excel Document
With the work that we have completed so far, our Python script can be executed inside of our EC2 instance.
The problem with this is that the xlsx file will be saved to the AWS virtual server.
It is not accessible to anyone but us on that server, which limits its usefulness.
To fix this, we are going to create a public bucket on AWS S3 where we can save the xlsx file. Anyone who has the right URL will be able to download this file once this change is made.
To start, navigate to AWS S3 from within the AWS Management Console. Click Create bucket in the top right:
On the next screen, you will need to pick a name for your bucket and an AWS region for the bucket to be hosted in. The bucket name must be unique and cannot contain spaces or uppercase letters. The region does not matter much for the purpose of this tutorial, so I will be using the default region of US East (Ohio) us-east-2.
You will need to change the Public Access settings in the next section to match this configuration:
Click Create bucket to create your bucket and conclude this step of the tutorial!
Step 5: Modify Your Python Script to Push the xlsx File to AWS S3
Our AWS S3 bucket is now ready to hold our finalized xlsx document. We will now make a small change to our stock_market_data.py file to push the finalized document to our S3 bucket.
We will need to use the boto3 package to do this. boto3 is the AWS Software Development Kit (SDK) for Python, allowing Python developers to write software that connects to AWS services. To start, you'll need to install boto3 on your EC2 virtual machine. Run the following command line statement to do this:
pip3 install boto3
You will also need to import the library into stock_market_data.py by adding the following statement to the top of the Python script:
import boto3
We will need to add a few lines of code to the end of stock_market_data.py to push the final document to AWS S3:
s3 = boto3.resource('s3')
s3.meta.client.upload_file('stock_market_data.xlsx', 'my-S3-bucket', 'stock_market_data.xlsx', ExtraArgs={'ACL':'public-read'})
The first line of this code, s3 = boto3.resource('s3'), allows our Python script to connect to Amazon Web Services.
The second line of code calls a method from boto3 that actually uploads our file to S3. It takes four arguments:
- stock_market_data.xlsx - the name of the file on our local machine.
- my-S3-bucket - the name of the S3 bucket that we're uploading our file to.
- stock_market_data.xlsx - the desired name of the file within the S3 bucket. In most cases, this will have the same value as the first argument passed into this method.
- ExtraArgs={'ACL':'public-read'} - an optional argument that tells AWS to make the uploaded file publicly readable.
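Once the script runs with that public-read ACL, the uploaded spreadsheet is typically reachable at a virtual-hosted-style URL of the following form (the bucket name and region shown here are placeholders for your own values):

https://my-S3-bucket.s3.us-east-2.amazonaws.com/stock_market_data.xlsx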
Step 6: Schedule Your Python Script to Run Periodically Using Cron
So far, we have completed the following:
- Built our Python script
- Created an EC2 instance and deployed our code there
- Created an S3 bucket where we can push the final xlsx document
- Modified the original Python script to upload the finalized stock_market_data.xlsx file to an AWS S3 bucket
The only step that is left is to schedule the Python script to run periodically.
We can do this using a command-line utility called cron. To start, we will need to create a cron expression that tells the utility when to run the code. The crontab guru website is an excellent resource for this.
Here's how you can use crontab guru to get a cron expression that means every day at noon:
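The expression crontab guru produces for that schedule - minute 0, hour 12, every day of the month, every month, every day of the week - is:

00 12 * * *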
Now we need to instruct our EC2 instance's cron daemon to run stock_market_data.py at this time each day.
To do this, we will first create a new file in our EC2 instance called stock_market_data.cron.
Open up this file and type in our cron expression followed by the statement that should be executed at the command line at that specified time.
Our command line statement is python3 stock_market_data.py, so here is what should be contained in stock_market_data.cron:
00 12 * * * python3 stock_market_data.py
If you run an ls command in your EC2 instance, you should now see two files:
stock_market_data.py stock_market_data.cron
The last step of this tutorial is to load stock_market_data.cron into the crontab. You can think of the crontab as a file that contains commands and instructions for the cron daemon to execute. In other words, the crontab contains batches of cron jobs.
First, let's see what's in our crontab. It should be empty since we have not put anything in it! You can view the contents of your crontab with the following command:
crontab -l
To load stock_market_data.cron into the crontab, run the following statement on the command line:
crontab stock_market_data.cron
Now when you run crontab -l, you should see:
00 12 * * * python3 stock_market_data.py
Our stock_market_data.py script will now run at noon every day on our AWS EC2 virtual machine!
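One optional refinement: cron runs the command from the user's home directory, and any output is either mailed to the user or dropped depending on how the instance is configured. If you want an easy way to confirm the job actually fired, a variant of the crontab entry (assuming the script lives in /home/ec2-user) could redirect output to a log file:

00 12 * * * cd /home/ec2-user && python3 stock_market_data.py >> cron.log 2>&1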
Final Thoughts
In this article, you learned how to create automatically-updating Excel spreadsheets of financial data using Python, IEX Cloud, and Amazon Web Services.
Here are the specific steps we covered in this tutorial:
- How to create an account with IEX Cloud
- How to write a Python script that generates beautiful Excel documents using pandas and XlsxWriter
- How to launch an AWS EC2 instance and deploy code on it
- How to create an AWS S3 bucket
- How to push files to an AWS S3 bucket from within a Python script
- How to schedule code to run using the cron software utility
This article was published by Nick McCullum, who teaches people how to code on his website.
Originally published at https://www.freecodecamp.org/news/auto-updating-excel-python-aws/