Many Python developers in the financial world are tasked with creating Excel documents for analysis by non-technical users.
This is actually a lot harder than it sounds. From sourcing the data to formatting the spreadsheet to deploying the final doc in a central location, there are plenty of steps involved in the process.
In this tutorial, I'm going to show you how to create Excel spreadsheets using Python that:
- Use stock market data from IEX Cloud
- Are deployed in a centralized S3 bucket so that anyone with the right URL can access them
- Automatically update daily using the cron command line utility
Step 1: Create an Account with IEX Cloud
IEX Cloud is the data provider subsidiary of the IEX stock exchange.
In case you're unfamiliar with IEX, it is an acronym for "The Investor's Exchange". IEX was founded by Brad Katsuyama to build a better stock exchange that avoids investor-unfriendly behavior like front-running and high-frequency trading. Katsuyama's exploits were famously chronicled in Michael Lewis' best-selling book Flash Boys.
I have investigated many financial data providers and IEX Cloud has the best combination of:
- High-quality data
- Affordable price
Their prices are below:
The $9/month Launch plan is plenty for many use cases.
A warning on using IEX Cloud (and any other pay-per-use data provider): it is very important that you set usage budgets from the beginning. These budgets lock you out of your account once you hit a certain dollar cost for the month.
When I first started using IEX Cloud, I accidentally created an infinite loop on a Friday afternoon that contained an API call to IEX Cloud. These API calls are priced on a cost-per-call basis...which resulted in a terrifying email from IEX:
It is a testament to IEX's customer-centricity that they agreed to reset my usage as long as I set usage budgets moving forward. Go IEX!
As with most API subscriptions, the main benefit of creating an IEX Cloud account is having an API key.
For obvious reasons, I will not share an API key in this article.
However, you can still work through this tutorial with your own API key as long as you assign it to the following variable name:
IEX_API_Key
You will see the blank IEX_API_Key variable in my code blocks throughout the rest of this tutorial.
Step 2: Write Your Python Script
Now that you have access to the API key that you'll need to gather financial data, it's time to write your Python script.
This will be the longest section of this tutorial. It is also the most flexible - we are going to create a Python script that satisfies certain pre-specified criteria, but you could modify this section to really create any spreadsheet you want!
To start, let's lay out our goal posts. We are going to write a Python script that generates an Excel file of stock market data with the following characteristics:
- It will include the 10 largest stocks in the United States
- It will contain four columns: stock ticker, company name, share price, and dividend yield
- It will be formatted such that the header's background color is #135485 and its text is white, while the spreadsheet body's background is #DADADA and its font color is black (the default)
Let's start by importing our first package.
Since spreadsheets are essentially just data structures with rows and columns, the pandas library - including its built-in DataFrame object - is a perfect candidate for manipulating data in this tutorial.
We'll start by importing pandas under the alias pd like this:
import pandas as pd
Next, we'll specify our IEX Cloud API key. As I mentioned before, I'm not going to really include my API key, so you'll have to grab your own API key from your IEX account and include it here:
IEX_API_Key = ''
Our next step is to determine the ten largest companies in the United States.
You can answer this question with a quick Google search.
For brevity, I have included the companies (or rather, their stock tickers) in the following Python list:
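tickers = ['MSFT','AAPL','AMZN','GOOG','FB','BRK.B','JNJ','WMT','V','PG']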
Next, it is time to figure out how to ping the IEX Cloud API to pull in the metrics we need for each company.
The IEX Cloud API returns JSON objects in response to HTTP requests. Since we are working with more than 1 ticker in this tutorial, we will use IEX Cloud's batch API call functionality, which allows you to request data on more than one ticker at a time. Using batch API calls has two benefits:
- It reduces the number of HTTP requests you need to make, which will make your code more performant.
- The pricing for batch API calls is slightly better with most data providers.
Here is an example of what the HTTP request might look like, with a few placeholder words where we'll need to customize the request:
https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key
In this URL, we'll replace these variables with the following values:
- TICKERS will be replaced by a string that contains each of our tickers, separated by commas.
- ENDPOINTS will be replaced by a string that contains each of the IEX Cloud endpoints we want to hit, separated by commas.
- RANGE will be replaced by 1y. These endpoints each contain point-in-time data and not time series data, so this range can really be whatever you want.
Let's put this URL into a variable called HTTP_request for us to modify later:
HTTP_request = 'https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'
Let's work through each of these variables one-by-one to determine the exact URL that we need to hit.
For the TICKERS variable, we can generate a real Python variable (and not just a placeholder word) with a simple for loop:
#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','

#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]
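If you print ticker_string at this point, you should see the ten tickers joined by commas:

print(ticker_string)
#MSFT,AAPL,AMZN,GOOG,FB,BRK.B,JNJ,WMT,V,PG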
Now we can interpolate our ticker_string variable into the HTTP_request variable that we created earlier using an f-string:
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'
Next, we need to determine which IEX Cloud endpoints we need to ping.
Some quick investigation into the IEX Cloud documentation reveals that we only need the price and stats endpoints to create our spreadsheet.
Thus, we can replace the placeholder ENDPOINTS word from our original HTTP request with the following variable:
endpoints = 'price,stats'
Like we did with our ticker_string variable, let's substitute the endpoints variable into the HTTP_request variable:
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=RANGE&token=IEX_API_Key'
The last placeholder we need to replace is RANGE. We will not replace this one with a variable. Instead, we can hardcode 1y directly into the URL path like this:
https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token=IEX_API_Key
We've done a lot so far, so let's recap our code base:
import pandas as pd

IEX_API_Key = ''

#Specify the stock tickers that will be included in our spreadsheet
tickers = ['MSFT','AAPL','AMZN','GOOG','FB','BRK.B','JNJ','WMT','V','PG']

#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','

#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]

#Create the endpoint strings
endpoints = 'price,stats'

#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'
It is now time to ping the API and save its data into a data structure within our Python application.
We can read JSON objects with pandas' read_json method. In our case, we'll save the JSON data to a pandas DataFrame called raw_data, like this:
raw_data = pd.read_json(HTTP_request)
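As a side note, if you prefer more explicit control over the HTTP call (or your pandas version has trouble reading JSON directly from a URL), a roughly equivalent sketch using the requests library - assuming it is installed - would be:

import requests

#Fetch the batch response and build the same ticker-by-endpoint DataFrame
raw_data = pd.DataFrame(requests.get(HTTP_request).json())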
Let's take a moment now to make sure that the data has been imported in a nice format for our application.
If you're working through this tutorial in a Jupyter Notebook, you can simply type the name of the pandas DataFrame variable on the last line of a code cell, and Jupyter will nicely render an image of the data, like this:
As you can see, the pandas DataFrame contains a column for each stock ticker and two rows: one for the stats endpoint and one for the price endpoint. We will need to parse this DataFrame to get the four metrics we want. Let's work through the metrics one-by-one in the steps below.
Metric 1: Stock Ticker
This step is very straightforward since the stock tickers are contained in the columns of the pandas DataFrame. We can access them through the columns attribute of the pandas DataFrame like this:
raw_data.columns
To access the other metrics in raw_data, we will create a for loop that loops through each ticker in raw_data.columns. In each iteration of the loop we will add the data to a new pandas DataFrame object called output_data.
First we'll need to create output_data, which should be an empty pandas DataFrame with four columns. Here's how to do this:
output_data = pd.DataFrame(pd.np.empty((0,4)))
This creates an empty pandas DataFrame with 0 rows and 4 columns.
Now that this object has been created, here's how we can structure this for loop:
for ticker in raw_data.columns:
    #Parse the company's name - not completed yet
    company_name = ''

    #Parse the company's stock price - not completed yet
    stock_price = 0

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
Next, let's determine how to parse the company_name variable from the raw_data object.
Metric 2: Company Name
The company_name variable is the first variable that will need to be parsed from the raw_data object. As a quick recap, here's what raw_data looks like:
The company_name variable is held within the stats endpoint under the dictionary key companyName. To parse this data point out of raw_data, we can use these indexes:
raw_data[ticker]['stats']['companyName']
Including this in our for loop from before gives us this:
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price - not completed yet
    stock_price = 0

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
Let's move on to parsing stock_price.
Metric 3: Stock Price
The stock_price variable is contained within the price endpoint, which returns only a single value. This means we do not need to chain together indexes like we did with company_name.
Here's how we could parse stock_price from raw_data:
raw_data[ticker]['price']
Including this in our for loop gives us:
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
The last metric we need to parse is dividend_yield.
Metric 4: Dividend Yield
Like company_name, dividend_yield is contained in the stats endpoint. It is held under the dividendYield dictionary key.
Here is how we could parse it out of raw_data:
raw_data[ticker]['stats']['dividendYield']
Adding this to our for loop gives us:
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield
    dividend_yield = raw_data[ticker]['stats']['dividendYield']

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
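One caveat if you are on a recent pandas release: pd.np and DataFrame.append have both been removed in newer versions, so the loop above will raise errors there. A minimal sketch of the same logic for current versions - collecting each row in a plain Python list and building the DataFrame once at the end - would look roughly like this:

rows = []

for ticker in raw_data.columns:
    #Parse the same three fields as above
    company_name = raw_data[ticker]['stats']['companyName']
    stock_price = raw_data[ticker]['price']
    dividend_yield = raw_data[ticker]['stats']['dividendYield']
    rows.append([ticker, company_name, stock_price, dividend_yield])

#Build the DataFrame in one step instead of appending row-by-row
output_data = pd.DataFrame(rows)

The rest of the tutorial (renaming columns, setting the index, and filling missing values) works the same either way.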
Let's print out our output_data object to see what the data looks like:
So far so good! The next two steps are to name the columns of the pandas DataFrame and to change its index.
How to Name the Columns of a Pandas DataFrame
We can update the column names of our output_data object by creating a list of column names and assigning it to the output_data.columns attribute, like this:
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']
Let's print out our output_data object to see what the data looks like:
Much better! Let's change the index of output_data next.
How to Change the Index of a Pandas DataFrame
The index of a pandas DataFrame is a special column that is somewhat similar to the primary key of a SQL database table. In our output_data object, we want to set the Ticker column as the DataFrame's index.
Here's how we can do this using the set_index method:
output_data.set_index('Ticker', inplace=True)
Let's print out our output_data object to see what the data looks like:
Another incremental improvement!
Next, let's deal with the missing data in output_data.
How to Handle Missing Data in Pandas DataFrames
If you take a close look at output_data, you will notice that there are several None values in the Dividend Yield column:
These None values simply indicate that the company for that row does not currently pay a dividend. While None is one way of representing a non-dividend stock, it is more common to show a Dividend Yield of 0.
Fortunately, the fix for this is quite straightforward. The pandas library includes an excellent fillna method that allows us to replace missing values in a pandas DataFrame.
Here's how we can use the fillna method to replace our Dividend Yield column's None values with 0:
output_data['Dividend Yield'].fillna(0,inplace=True)
The output_data object looks much cleaner now:
We are now ready to export our DataFrame to an Excel document! As a quick recap, here is our Python script to date:
import pandas as pd

IEX_API_Key = ''

#Specify the stock tickers that will be included in our spreadsheet
tickers = ['MSFT','AAPL','AMZN','GOOG','FB','BRK.B','JNJ','WMT','V','PG']

#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','

#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]

#Create the endpoint strings
endpoints = 'price,stats'

#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'

#Ping the API and save its response into a pandas DataFrame
raw_data = pd.read_json(HTTP_request)

#Create an empty pandas DataFrame to append our parsed values into during our for loop
output_data = pd.DataFrame(pd.np.empty((0,4)))

for ticker in raw_data.columns:
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield
    dividend_yield = raw_data[ticker]['stats']['dividendYield']

    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)

#Change the column names of output_data
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']

#Change the index of output_data
output_data.set_index('Ticker', inplace=True)

#Replace the missing values of the 'Dividend Yield' column with 0
output_data['Dividend Yield'].fillna(0, inplace=True)

#Print the DataFrame
output_data
How to Export a Styled Excel Document From a Pandas DataFrame Using XlsxWriter
There are multiple ways to export an xlsx file from a pandas DataFrame.
The easiest way is to use the built-in to_excel method. As an example, here's how we could export output_data to an Excel file:
output_data.to_excel('my_excel_document.xlsx')
The problem with this approach is that the Excel file has no formatting whatsoever. The output looks like this:
The lack of formatting in this document makes it hard to interpret.
What is the solution?
We can use the Python package XlsxWriter to generate nicely-formatted Excel files. To start, we'll want to add the following import to the beginning of our Python script:
import xlsxwriter
Next, we need to create our actual Excel file. The XlsxWriter package actually has a dedicated documentation page for how to work with pandas DataFrames, which is available here.
Our first step is to call the pd.ExcelWriter function, passing in the desired name of our xlsx file as the first argument and engine='xlsxwriter' as the second argument. We will assign this to a variable called writer:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')
From there, we need to call the to_excel method on our pandas DataFrame. This time, instead of passing in the name of the file that we're trying to export, we'll pass in the writer object that we just created:
output_data.to_excel(writer, sheet_name='Sheet1')
Lastly, we will call the save method on our writer object, which saves the xlsx file to our current working directory. When all this is done, here is the section of our Python script that saves output_data to an Excel file:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

output_data.to_excel(writer, sheet_name='Sheet1')

writer.save()
All of the formatting code that we will include in our xlsx file needs to be contained between the creation of the ExcelWriter object and the writer.save() statement.
How to Style an xlsx File Created with Python
It is actually harder than you might think to style an Excel file using Python.
This is partially because of some of the limitations of the XlsxWriter package. Its documentation states:
'XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.'
In my experience, the most flexible way to style cells in an xlsx file created by XlsxWriter is to use conditional formatting that only applies styling when a cell is not equal to None.
This has three advantages:
- It provides more styling flexibility than the normal formatting options available in XlsxWriter.
- You do not need to manually loop through each data point and import them into the writer object one-by-one.
- It allows you to easily see when None values have made their way into your finalized xlsx files, since they'll be missing the required formatting.

To apply styling using conditional formatting, we first need to create a few style templates. Specifically, we will need four templates:

- One header_template that will be applied to the column names at the top of the spreadsheet
- One string_template that will be applied to the Ticker and Company Name columns
- One dollar_template that will be applied to the Stock Price column
- One percent_template that will be applied to the Dividend Yield column

Each of these format templates needs to be added to the writer object in dictionaries that resemble CSS syntax. Here's what I mean:
header_template = writer.book.add_format({'font_color': '#ffffff', 'bg_color': '#135485', 'border': 1})

string_template = writer.book.add_format({'bg_color': '#DADADA', 'border': 1})

dollar_template = writer.book.add_format({'num_format': '$0.00', 'bg_color': '#DADADA', 'border': 1})

percent_template = writer.book.add_format({'num_format': '0.0%', 'bg_color': '#DADADA', 'border': 1})
To apply these formats to specific cells in our xlsx file, we need to call the package's conditional_format method on writer.sheets['Stock Market Data']. Here is an example:
writer.sheets['Stock Market Data'].conditional_format('A2:B11', {'type': 'cell','criteria': '<>','value': '"None"','format': string_template})
If we generalize this formatting to the other three formats we're applying, here's what the formatting section of our Python script becomes:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

output_data.to_excel(writer, sheet_name='Stock Market Data')

header_template = writer.book.add_format({'font_color': '#ffffff', 'bg_color': '#135485', 'border': 1})
string_template = writer.book.add_format({'bg_color': '#DADADA', 'border': 1})
dollar_template = writer.book.add_format({'num_format': '$0.00', 'bg_color': '#DADADA', 'border': 1})
percent_template = writer.book.add_format({'num_format': '0.0%', 'bg_color': '#DADADA', 'border': 1})

#Format the header of the spreadsheet
writer.sheets['Stock Market Data'].conditional_format('A1:D1', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': header_template})

#Format the 'Ticker' and 'Company Name' columns
writer.sheets['Stock Market Data'].conditional_format('A2:B11', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': string_template})

#Format the 'Stock Price' column
writer.sheets['Stock Market Data'].conditional_format('C2:C11', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': dollar_template})

#Format the 'Dividend Yield' column
writer.sheets['Stock Market Data'].conditional_format('D2:D11', {'type': 'cell', 'criteria': '<>', 'value': '"None"', 'format': percent_template})

writer.save()
Let's take a look at our Excel document to see how it's looking:
So far so good! The last incremental improvement that we can make to this document is to make its columns a bit wider.
We can specify column widths by calling the set_column method on writer.sheets['Stock Market Data'].
Here's what we'll add to our Python script to do this:
#Specify all column widths
writer.sheets['Stock Market Data'].set_column('B:B', 32)
writer.sheets['Stock Market Data'].set_column('C:C', 18)
writer.sheets['Stock Market Data'].set_column('D:D', 20)
Here's the final version of the spreadsheet:
Voila! We are good to go! You can access the final version of this Python script on GitHub here. The file is named stock_market_data.py.
Step 3: Set Up an AWS EC2 Virtual Machine to Run Your Python Script
Your Python script is finalized and ready to run.
However, we do not want to simply run this on our local machine on an ad hoc basis.
Instead, we are going to set up a virtual machine using Amazon Web Services' Elastic Compute Cloud (EC2) service.
You'll need to create an AWS account first if you do not already have one. To do this, navigate to this URL and click the "Create an AWS Account" in the top-right corner:
AWS' web application will guide you through the steps to create an account.
Once your account is created, you'll need to create an EC2 instance. This is simply a virtual server for running code on AWS infrastructure.
EC2 instances come in various operating systems and sizes, ranging from very small servers that qualify for AWS' free tier to very large servers capable of running complex applications.
We will use AWS' smallest server to run the Python script that we wrote in this article. To get started, navigate to EC2 within the AWS management console. Once you've arrived within EC2, click Launch Instance:
This will bring you to a screen that contains all of the available instance types within AWS EC2. Any machine that qualifies for AWS' free tier will be sufficient.
I chose the Amazon Linux 2 AMI (HVM):
Click Select to proceed.
On the next page, AWS will ask you to select the specifications for your machine. The fields you can select include:
- Family
- Type
- vCPUs
- Memory
- Instance Storage (GB)
- EBS-Optimized
- Network Performance
- IPv6 Support
For the purpose of this tutorial, we simply want to select the single machine that is free tier eligible. It is characterized by a small green label that looks like this:
Once you have selected a free tier eligible machine, click Review and Launch at the bottom of the screen to proceed. The next screen will present the details of your new instance for you to review. Quickly review the machine's specifications, then click Launch in the bottom right-hand corner.
Clicking the Launch button will trigger a popup that asks you to Select an existing key pair or create a new key pair. A key pair is comprised of a public key that AWS holds and a private key that you must download and store within a .pem file. You must have access to that .pem file in order to access your EC2 instance (typically via SSH). You also have the option to proceed without a key pair, but this is not recommended for security reasons.
Once you have selected or created a key pair for this EC2 instance and clicked the radio button for I acknowledge that I have access to the selected private key file (data-feeds.pem), and that without this file, I won't be able to log into my instance, you can click Launch Instances to proceed.
Your instance will now begin to launch. It can take some time for these instances to boot up, but once it's ready, its Instance State will show as running in your EC2 dashboard.
Next, you will need to push your Python script into your EC2 instance. Here is a generic command state statement that allows you to move a file into an EC2 instance:
scp -i path/to/.pem_file path/to/file username@host_address.amazonaws.com:/path_to_copy
Run this statement with the necessary replacements to move stock_market_data.py into the EC2 instance.
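For example, if your key file is data-feeds.pem and you are using the default ec2-user account on Amazon Linux 2, the statement might look roughly like this (the host address is a placeholder - copy the real public DNS name from your EC2 console):

scp -i ~/data-feeds.pem stock_market_data.py ec2-user@your-instance-public-dns.amazonaws.com:/home/ec2-user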
Trying to run stock_market_data.py at this point will actually result in an error because the EC2 instance does not come with the necessary Python packages.
To fix this, you can either export a requirements.txt file and install the proper packages using pip, or you can simply run the following:
sudo yum install python3-pip
pip3 install pandas
pip3 install xlsxwriter
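If you'd rather go the requirements.txt route instead, a minimal sketch looks like this (run the first command on your local machine, copy the resulting file to the instance with scp, then run the second command on the EC2 instance):

#On your local machine: record the installed packages
pip3 freeze > requirements.txt

#On the EC2 instance, after copying requirements.txt over: install them
pip3 install -r requirements.txt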
Once this is done, you can SSH into the EC2 instance and run the Python script from the command line with the following statement:
python3 stock_market_data.py
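If you haven't already opened an SSH session, the connection command mirrors the scp statement from earlier, with the same placeholder key path and host address:

ssh -i path/to/.pem_file username@host_address.amazonaws.com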
Step 4: Create an AWS S3 Bucket to Hold the Finished Excel Document
With the work that we have completed so far, our Python script can be executed inside of our EC2 instance.
The problem with this is that the xlsx file will be saved to the AWS virtual server.
It is not accessible to anyone but us on that server, which limits its usefulness.
To fix this, we are going to create a public bucket on AWS S3 where we can save the xlsx file. Anyone who has the right URL will be able to download this file once this change is made.
To start, navigate to AWS S3 from within the AWS Management Console. Click Create bucket in the top right:
On the next screen, you will need to pick a name for your bucket and an AWS region for the bucket to be hosted in. The bucket name must be unique and cannot contain spaces or uppercase letters. The region does not matter much for the purpose of this tutorial, so I will be using the default region of US East (Ohio) us-east-2.
You will need to change the Public Access settings in the next section to match this configuration:
Click Create bucket to create your bucket and conclude this step of the tutorial!
Step 5: Modify Your Python Script to Push the xlsx File to AWS S3
Our AWS S3 bucket is now ready to hold our finalized xlsx document. We will now make a small change to our stock_market_data.py file to push the finalized document to our S3 bucket.
We will need to use the boto3 package to do this. boto3 is the AWS Software Development Kit (SDK) for Python, allowing Python developers to write software that connects to AWS services. To start, you'll need to install boto3 on your EC2 virtual machine. Run the following command line statement to do this:
pip3 install boto3
You will also need to import the library into stock_market_data.py by adding the following statement to the top of the Python script:
import boto3
We will need to add a few lines of code to the end of stock_market_data.py to push the final document to AWS S3:
s3 = boto3.resource('s3')
s3.meta.client.upload_file('stock_market_data.xlsx', 'my-S3-bucket', 'stock_market_data.xlsx', ExtraArgs={'ACL':'public-read'})
The first line of this code, s3 = boto3.resource('s3'), allows our Python script to connect to Amazon Web Services.
The second line of code calls a method from boto3 that actually uploads our file to S3. It takes four arguments:
- stock_market_data.xlsx - the name of the file on our local machine.
- my-S3-bucket - the name of the S3 bucket that we're uploading our file to.
- stock_market_data.xlsx - the desired name of the file within the S3 bucket. In most cases, this will have the same value as the first argument passed into this method.
- ExtraArgs={'ACL':'public-read'} - an optional argument that tells AWS to make the uploaded file publicly readable.
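Once the script runs with that public-read ACL, the uploaded spreadsheet is typically reachable at a virtual-hosted-style URL of the following form (the bucket name and region shown here are placeholders for your own values):

https://my-S3-bucket.s3.us-east-2.amazonaws.com/stock_market_data.xlsx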
Step 6: Schedule Your Python Script to Run Periodically Using Cron
So far, we have completed the following:
- Built our Python script
- Created an EC2 instance and deployed our code there
- Created an S3 bucket where we can push the final xlsx document
- Modified the original Python script to upload the finalized stock_market_data.xlsx file to an AWS S3 bucket
The only step that is left is to schedule the Python script to run periodically.
We can do this using a command-line utility called cron. To start, we will need to create a cron expression that tells the utility when to run the code. The crontab guru website is an excellent resource for this.
Here's how you can use crontab guru to get a cron expression that means every day at noon:
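The expression crontab guru produces for that schedule - minute 0, hour 12, every day of the month, every month, every day of the week - is:

00 12 * * *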
Now we need to instruct our EC2 instance's cron daemon to run stock_market_data.py at this time each day.
To do this, we will first create a new file in our EC2 instance called stock_market_data.cron.
Open up this file and type in our cron expression followed by the statement that should be executed at the command line at that specified time.
Our command line statement is python3 stock_market_data.py, so here is what should be contained in stock_market_data.cron:
00 12 * * * python3 stock_market_data.py
If you run an ls command in your EC2 instance, you should now see two files:
stock_market_data.py stock_market_data.cron
The last step of this tutorial is to load stock_market_data.cron into the crontab. You can think of the crontab as a file that contains commands and instructions for the cron daemon to execute. In other words, the crontab contains batches of cron jobs.
First, let's see what's in our crontab. It should be empty since we have not put anything in it! You can view the contents of your crontab with the following command:
crontab -l
To load stock_market_data.cron into the crontab, run the following statement on the command line:
crontab stock_market_data.cron
Now when you run crontab -l, you should see:
00 12 * * * python3 stock_market_data.py
Our stock_market_data.py script will now run at noon every day on our AWS EC2 virtual machine!
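One optional refinement: cron runs the command from the user's home directory, and any output is either mailed to the user or dropped depending on how the instance is configured. If you want an easy way to confirm the job actually fired, a variant of the crontab entry (assuming the script lives in /home/ec2-user) could redirect output to a log file:

00 12 * * * cd /home/ec2-user && python3 stock_market_data.py >> cron.log 2>&1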
Final Thoughts
In this article, you learned how to create automatically-updating Excel spreadsheets of financial data using Python, IEX Cloud, and Amazon Web Services.
Here are the specific steps we covered in this tutorial:
- How to create an account with IEX Cloud
- How to write a Python script that generates beautiful Excel documents using pandas and XlsxWriter
- How to launch an AWS EC2 instance and deploy code on it
- How to create an AWS S3 bucket
- How to push files to an AWS S3 bucket from within a Python script
- How to schedule code to run using the cron software utility
This article was published by Nick McCullum, who teaches people how to code on his website.
Originally published at https://www.freecodecamp.org/news/auto-updating-excel-python-aws/