How I get options data for free
An introduction to web scraping for finance
by Harry Sauers
Ever wished you could access historical options data, but got blocked by a paywall? What if you just want it for research, fun, or to develop a personal trading strategy?
In this tutorial, you’ll learn how to use Python and BeautifulSoup to scrape financial data from the Web and build your own dataset.
Getting Started
You should have at least a working knowledge of Python and Web technologies before beginning this tutorial. To build these up, I highly recommend checking out a site like codecademy to learn new skills or brush up on old ones.
First, let’s spin up your favorite IDE. Normally I use PyCharm, but for a quick script like this, Repl.it will do the job too. Add a quick print("Hello world") to ensure your environment is set up correctly.
Now we need to figure out a data source.
Unfortunately, Cboe’s awesome options chain data is pretty locked down, even for current delayed quotes. Luckily, Yahoo Finance has solid enough options data here. We’ll use it for this tutorial, as web scrapers often need some content awareness, but it is easily adaptable for any data source you want.
Dependencies
We don’t need many external dependencies. We just need the Requests and BeautifulSoup modules in Python. Add these at the top of your program:
from bs4 import BeautifulSoup
import requests
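If these aren’t installed yet, both are available from PyPI; note that the BeautifulSoup package is published as beautifulsoup4:

pip install requests beautifulsoup4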
Create a main method:

def main():
    print("Hello World!")

if __name__ == "__main__":
    main()
Scraping HTML
Now you’re ready to start scraping! Inside main(), add these lines to fetch the page’s full HTML:

data_url = "https://finance.yahoo.com/quote/SPY/options"
data_html = requests.get(data_url).content
print(data_html)
This fetches the page’s full HTML content, so we can find the data we want in it. Feel free to give it a run and observe the output.
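A note from experience rather than the original article: Yahoo sometimes serves an error page to the default requests user agent. If the HTML you get back contains no tables, try sending a browser-like User-Agent header (the value below is just an example):

headers = {"User-Agent": "Mozilla/5.0"}
data_html = requests.get(data_url, headers=headers).content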
Feel free to comment out print statements as you go — these are just there to help you understand what the program is doing at any given step.
BeautifulSoup is the perfect tool for working with HTML data in Python. Let’s narrow down the HTML to just the options pricing tables so we can better understand it:
content = BeautifulSoup(data_html, "html.parser")
# print(content)

options_tables = content.find_all("table")
print(options_tables)
That’s still quite a bit of HTML; we can’t get much out of that, and Yahoo’s code isn’t the most friendly to web scrapers. Let’s break it down into two tables, for calls and puts:
options_tables = []
tables = content.find_all("table")
for i in range(0, len(tables)):
    options_tables.append(tables[i])

print(options_tables)
Yahoo’s data contains options that are pretty deep in- and out-of-the-money, which might be great for certain purposes. I’m only interested in near-the-money options, namely the two calls and two puts closest to the current price. (For example, with SPY trading between 289 and 290, the 289-strike call is in the money and the 290-strike call is out of the money.)

Let’s find these, using BeautifulSoup and Yahoo’s differential table entries for in-the-money and out-of-the-money options:
calls = options_tables[0].find_all("tr")[1:]  # first row is header

itm_calls = []
otm_calls = []

for call_option in calls:
    if "in-the-money" in str(call_option):
        itm_calls.append(call_option)
    else:
        otm_calls.append(call_option)

itm_call = itm_calls[-1]
otm_call = otm_calls[0]
print(str(itm_call) + "\n\n" + str(otm_call))
Now, we have the table entries for the two options nearest to the money in HTML. Let’s scrape the pricing data, volume, and implied volatility from the first call option:
itm_call_data = []
for td in BeautifulSoup(str(itm_call), "html.parser").find_all("td"):
    itm_call_data.append(td.text)

print(itm_call_data)

itm_call_info = {'contract': itm_call_data[0],
                 'last_trade': itm_call_data[1][:10],
                 'strike': itm_call_data[2],
                 'last': itm_call_data[3],
                 'bid': itm_call_data[4],
                 'ask': itm_call_data[5],
                 'volume': itm_call_data[8],
                 'iv': itm_call_data[10]}

print(itm_call_info)
Adapt this code for the next call option:
# otm call
otm_call_data = []
for td in BeautifulSoup(str(otm_call), "html.parser").find_all("td"):
    otm_call_data.append(td.text)

# print(otm_call_data)

otm_call_info = {'contract': otm_call_data[0],
                 'last_trade': otm_call_data[1][:10],
                 'strike': otm_call_data[2],
                 'last': otm_call_data[3],
                 'bid': otm_call_data[4],
                 'ask': otm_call_data[5],
                 'volume': otm_call_data[8],
                 'iv': otm_call_data[10]}

print(otm_call_info)
Give your program a run!
You now have dictionaries of the two near-the-money call options. All that’s left is to scrape the same data from the table of put options:
puts = options_tables[1].find_all("tr")[1:]  # first row is header

itm_puts = []
otm_puts = []

for put_option in puts:
    if "in-the-money" in str(put_option):
        itm_puts.append(put_option)
    else:
        otm_puts.append(put_option)

itm_put = itm_puts[0]
otm_put = otm_puts[-1]

# print(str(itm_put) + " \n\n " + str(otm_put) + "\n\n")

itm_put_data = []
for td in BeautifulSoup(str(itm_put), "html.parser").find_all("td"):
    itm_put_data.append(td.text)

# print(itm_put_data)

itm_put_info = {'contract': itm_put_data[0],
                'last_trade': itm_put_data[1][:10],
                'strike': itm_put_data[2],
                'last': itm_put_data[3],
                'bid': itm_put_data[4],
                'ask': itm_put_data[5],
                'volume': itm_put_data[8],
                'iv': itm_put_data[10]}

# print(itm_put_info)

# otm put
otm_put_data = []
for td in BeautifulSoup(str(otm_put), "html.parser").find_all("td"):
    otm_put_data.append(td.text)

# print(otm_put_data)

otm_put_info = {'contract': otm_put_data[0],
                'last_trade': otm_put_data[1][:10],
                'strike': otm_put_data[2],
                'last': otm_put_data[3],
                'bid': otm_put_data[4],
                'ask': otm_put_data[5],
                'volume': otm_put_data[8],
                'iv': otm_put_data[10]}
Congratulations! You just scraped data for all near-the-money options of the S&P 500 ETF, and can view them like this:
print("\n\n")
print(itm_call_info)
print(otm_call_info)
print(itm_put_info)
print(otm_put_info)
Give your program a run; you should get data like this printed to the console:
{'contract': 'SPY190417C00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '1.46', 'bid': '1.48', 'ask': '1.50', 'volume': '4,646', 'iv': '8.94%'}
{'contract': 'SPY190417C00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.80', 'bid': '0.82', 'ask': '0.83', 'volume': '38,491', 'iv': '8.06%'}
{'contract': 'SPY190417P00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.77', 'bid': '0.75', 'ask': '0.78', 'volume': '11,310', 'iv': '7.30%'}
{'contract': 'SPY190417P00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '0.41', 'bid': '0.40', 'ask': '0.42', 'volume': '44,319', 'iv': '7.79%'}
Setting up recurring data collection
Yahoo, by default, only returns the options for the date you specify. It’s this part of the URL: https://finance.yahoo.com/quote/SPY/options?date=1555459200
This is a Unix timestamp, so we’ll need to generate or scrape one, rather than hardcoding it in our program.
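To see what that number encodes, convert it back to a calendar date. A quick sanity check, using the timestamp from the URL above:

import datetime
print(datetime.datetime.fromtimestamp(1555459200, tz=datetime.timezone.utc))
# 2019-04-17 00:00:00+00:00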
Add some dependencies:
import datetime, time
Let’s write a quick script to generate and verify a Unix timestamp for our next set of options:
def get_datestamp():
    options_url = "https://finance.yahoo.com/quote/SPY/options?date="
    today = int(time.time())
    # print(today)
    date = datetime.datetime.fromtimestamp(today)
    yy = date.year
    mm = date.month
    dd = date.day
The above code holds the base URL of the page we are scraping and generates a datetime.date object for us to use in the future.
Let’s increment this date by one day, so we don’t get options that have already expired.
dd += 1
Now, we need to convert it back into a Unix timestamp and make sure it’s a valid date for options contracts:
options_day = datetime.date(yy, mm, dd)
datestamp = int(time.mktime(options_day.timetuple()))
# print(datestamp)
# print(datetime.datetime.fromtimestamp(datestamp))

# vet timestamp, then return if valid
for i in range(0, 7):
    test_req = requests.get(options_url + str(datestamp)).content
    content = BeautifulSoup(test_req, "html.parser")
    # print(content)
    tables = content.find_all("table")

    if tables != []:
        # print(datestamp)
        return str(datestamp)
    else:
        # print("Bad datestamp!")
        # try the next day (note: this naive increment will fail at month boundaries)
        dd += 1
        options_day = datetime.date(yy, mm, dd)
        datestamp = int(time.mktime(options_day.timetuple()))

return str(-1)
Let’s adapt our fetch_options method to use a dynamic timestamp to fetch options data, rather than whatever Yahoo wants to give us as the default.
Change this line:
data_url = "https://finance.yahoo.com/quote/SPY/options"
To this:
datestamp = get_datestamp()
data_url = "https://finance.yahoo.com/quote/SPY/options?date=" + datestamp
Congratulations! You just scraped real-world options data from the web.
Now we need to do some simple file I/O and set up a timer to record this data each day after market close.
Improving the program
Rename main() to fetch_options() and add these lines to the bottom:
options_list = {'calls': {'itm': itm_call_info, 'otm': otm_call_info},
                'puts': {'itm': itm_put_info, 'otm': otm_put_info},
                'date': datetime.date.fromtimestamp(time.time()).strftime("%Y-%m-%d")}
return options_list
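With this structure, any consumer of fetch_options() can drill down to a single field. For example:

options = fetch_options()
print(options['calls']['itm']['iv'])  # implied volatility of the in-the-money call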
Create a new method called schedule(). We’ll use this to control when we scrape for options, every twenty-four hours after market close. Add this code to schedule our first job at the next market close:
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

def schedule():
    scheduler.add_job(func=run, trigger="date", run_date=datetime.datetime.now())
    scheduler.start()
In your if __name__ == "__main__": statement, delete main() and add a call to schedule() to set up your first scheduled job.
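One caveat worth flagging (my note, not part of the original article): BackgroundScheduler runs jobs on a background thread and does not block, so if the script simply falls off the end of __main__ it may exit before any job fires. A minimal way to keep the process alive, reusing the time module imported earlier:

if __name__ == "__main__":
    schedule()
    try:
        # keep the main thread alive so the background scheduler can fire
        while True:
            time.sleep(60)
    except (KeyboardInterrupt, SystemExit):
        scheduler.shutdown()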
Create another method called run(). This is where we’ll handle the bulk of our operations, including actually saving the market data. Add this to the body of run():
today = int(time.time())
date = datetime.datetime.fromtimestamp(today)
yy = date.year
mm = date.month
dd = date.day

# must use 12:30 for Unix time instead of 4:30 NY time
next_close = datetime.datetime(yy, mm, dd, 12, 30)

# do operations here
"""
This is where we'll write our last bit of code.
"""

# schedule next job
scheduler.add_job(func=run, trigger="date", run_date=next_close)

print("Job scheduled! | " + str(next_close))
This lets our code call itself in the future, so we can just put it on a server and build up our options data each day. Add this code to actually fetch data under the """ This is where we'll write our last bit of code. """ placeholder:
options = {}

# ensures option data doesn't break the program if internet is out
try:
    if next_close > datetime.datetime.now():
        print("Market is still open! Waiting until after close…")
    else:
        # ensures program was run after market hours
        if next_close < datetime.datetime.now():
            dd += 1
            next_close = datetime.datetime(yy, mm, dd, 12, 30)

        options = fetch_options()
        print(options)

        # write to file
        write_to_csv(options)
except:
    print("Check your connection and try again.")
Saving data
You may have noticed that write_to_csv isn’t implemented yet. No worries; let’s take care of that here:
def write_to_csv(options_data):
    import csv
    with open('options.csv', 'a', newline='\n') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerow([str(options_data)])
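Each row holds the string representation of one day’s options dictionary, so downstream code can recover it with ast.literal_eval. A minimal sketch of a reader (read_options_csv is my own name, not from the article):

import ast
import csv

def read_options_csv(path='options.csv'):
    records = []
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            if row:
                # each row is one dict written via str(); literal_eval parses it back
                records.append(ast.literal_eval(row[0]))
    return records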
Cleaning up
As options contracts are time-sensitive, we might want to add a field for their expiration date. This field is not included as such in the raw HTML we scraped.
Add this line of code to save and format the expiration date, towards the top of fetch_options():
expiration = datetime.datetime.fromtimestamp(int(get_datestamp())).strftime("%Y-%m-%d")
Add 'expiration': expiration to the end of each option_info dictionary like so:
itm_call_info = {'contract': itm_call_data[0],
                 'last_trade': itm_call_data[1][:10],
                 'strike': itm_call_data[2],
                 'last': itm_call_data[3],
                 'bid': itm_call_data[4],
                 'ask': itm_call_data[5],
                 'volume': itm_call_data[8],
                 'iv': itm_call_data[10],
                 'expiration': expiration}
Give your new program a run — it’ll scrape the latest options data and write it to a .csv file as a string representation of a dictionary. The .csv file will be ready to be parsed by a backtesting program or served to users through a webapp. Congratulations!
Originally published at: https://www.freecodecamp.org/news/how-i-get-options-data-for-free-fba22d395cc8/