Machine Learning for Stocks
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
Probabilistic machine learning goes hand in hand with stock trading: it uses past instances to predict the probability of certain events happening in future instances. This can be applied directly to stock trading to predict whether a stock's price will rise or fall.
The Concept:
This program will use Gaussian Naive Bayes to classify data as either an increasing or a decreasing stock price.
Because stock prices are volatile, I will not use the closing price itself as the input for the prediction, but rather the ratio between each closing price and the one before it. To understand how the program works, we must first understand the underlying algorithm at play:
What Is a Gaussian Naive Bayes Classifier?
Gaussian Naive Bayes is an algorithm that classifies data by modeling each feature with a Gaussian distribution (identical to a normal distribution) and combining these distributions using Bayes' theorem.
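In symbols, the standard Gaussian Naive Bayes decision rule picks the class $c$ that maximizes the class prior times the product of per-feature normal densities. In the notation introduced here for illustration, $x_1, \dots, x_9$ stand for the nine price ratios the program uses as features, and $\mu_{c,i}$ and $\sigma_{c,i}$ are the mean and standard deviation of ratio $i$ over the training days in class $c$. The program below compares only the product of densities, which amounts to dropping $P(c)$, i.e., assuming both classes are equally likely.

```latex
\hat{y} = \arg\max_{c \in \{\text{increase},\ \text{drop}\}} \; P(c) \prod_{i=1}^{9} \mathcal{N}\!\left(x_i \mid \mu_{c,i}, \sigma_{c,i}^2\right),
\qquad
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```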
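Advantages:
- Works on small datasets
Unlike a fully connected neural network, in which each neuron is connected to every neuron in the neighboring layers, Naive Bayes assumes that the features are independent of each other given the class. Each feature's distribution can therefore be estimated on its own, which requires far less data.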
- Not computationally intensive
Unlike the weights that power a neural network, which are updated over many training iterations, the parameters of the Naive Bayes classifier (a mean and a standard deviation per feature and class) are computed directly from the data in a single pass. This makes the algorithm much less computationally intensive.
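To make the single-pass point concrete, here is a minimal NumPy sketch of fitting Gaussian Naive Bayes. It assumes the features are stacked in an array `X` with one class label per row in `y`; the function name `fit_gaussian_nb` and this array layout are illustrative choices, since the program below stores each class's features in a dictionary instead. Each feature gets its own mean and standard deviation per class (this is where the independence assumption shows up), and nothing is iterated toward convergence.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Closed-form Gaussian Naive Bayes fit (a minimal sketch).

    X: (n_samples, n_features) array; y: one class label per row.
    Returns {class: (per-feature means, per-feature stds, class prior)}.
    Every feature is summarized independently, in a single pass.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    params = {}
    for c in np.unique(y):
        rows = X[y == c]
        # Population std (ddof=0), the same default np.std uses
        params[c] = (rows.mean(axis=0), rows.std(axis=0), len(rows) / len(X))
    return params

# Toy data: two features, two classes (values are made up)
X = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [12.0, 22.0]]
y = [0, 0, 1, 1]
params = fit_gaussian_nb(X, y)
print(params[0][0])  # per-feature means for class 0 → [2. 3.]
```

Training cost grows linearly with the number of rows, which is why this kind of model is cheap compared with gradient-based training.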
Disadvantages:
- Falls behind on big data
When there is enough data to optimize all of its parameters, the complex mapping of a neural network outperforms the simple architecture of the Naive Bayes algorithm.
The Code:
With a better understanding of how the Gaussian Naive Bayes algorithm works, let’s get to the program:
Step 1| Prerequisites:
import numpy as np
import yfinance
from scipy import stats
aapl = yfinance.download('AAPL', start='2016-01-01', end='2020-01-01')
These are the libraries that I will use for the project: yfinance is for downloading stock data, scipy is for creating Gaussian distributions, and numpy handles the means, standard deviations, and products.
I downloaded Apple stock data, from 2016 to 2020, for reproducible results.
Step 2| Converting to Gaussian Distributions:
def calculate_prereq(values):
    # The two parameters of a Gaussian: standard deviation and mean
    std = np.std(values)
    mean = np.mean(values)
    return std, mean

def calculate_distribution(mean, std):
    # Frozen scipy normal distribution with the given parameters
    norm = stats.norm(mean, std)
    return norm

def extrapolate(norm, x):
    # Probability density of x under the distribution
    return norm.pdf(x)

def values_to_norm(dicts):
    # Replace each feature's collected ratios with a fitted Gaussian
    for dictionary in dicts:
        for term in dictionary:
            std, mean = calculate_prereq(dictionary[term])
            norm = calculate_distribution(mean, std)
            dictionary[term] = norm
    return dicts
The “calculate_prereq” function calculates the standard deviation and the mean: the two parameters needed to create a Gaussian distribution.
I could have written a function to create a Gaussian distribution from scratch, but scipy's functions are highly optimized and should therefore work better on datasets with more features.
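For comparison, a from-scratch density would simply transcribe the normal formula. This sketch uses a hypothetical helper `gaussian_pdf` and made-up parameter values, and checks that it agrees with `stats.norm(mean, std).pdf`, which is what `extrapolate` calls.

```python
import math
from scipy import stats

def gaussian_pdf(x, mean, std):
    """Normal probability density written out from its formula."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Hypothetical parameters for a day-over-day price ratio
mean, std = 1.001, 0.015
x = 1.02
print(gaussian_pdf(x, mean, std))
print(stats.norm(mean, std).pdf(x))  # same value, up to floating-point rounding
```

Besides `pdf`, the frozen scipy distribution also offers `cdf`, `sf`, and `logpdf`, which is part of why reusing it is convenient.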
Gaussian distributions are a common approximation for data that clusters around an average. Take the example of IQ test scores. The scores are scaled so that the average is 100, so the peak of the Gaussian distribution is at 100. Toward both ends of the spectrum, fewer and fewer people get extremely low or extremely high scores. With a Gaussian distribution, one can estimate how likely a person is to score within a given range and therefore gain insight into the data.
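The IQ example can be reproduced with scipy, assuming the conventional IQ scaling of mean 100 and standard deviation 15. Note that `pdf` returns a probability density rather than a probability; the probability of a range of scores comes from the `cdf` (or `sf`, its complement).

```python
from scipy import stats

# Assumption: IQ scores scaled to mean 100 and standard deviation 15
iq = stats.norm(100, 15)

print(iq.pdf(100) > iq.pdf(130))  # True: the density peaks at the mean
print(iq.sf(130))                 # P(score > 130), about 0.023
print(iq.cdf(115) - iq.cdf(85))   # P(85 < score < 115), about 0.683
```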
Step 3| Compare Possibilities:
def compare_possibilities(dicts, x):
    probabilities = []
    for dictionary in dicts:
        # Naive assumption: multiply the densities of the individual ratios
        dict_probs = []
        for i in range(len(x)):
            value = x[i]
            dict_probs.append(extrapolate(dictionary[i], value))
        probabilities.append(np.prod(dict_probs))
    # Index (in dicts) of the class with the highest score
    return probabilities.index(max(probabilities))
This function runs through the dictionaries (one per class) and, for each class, multiplies together the densities of the ratios between the closing prices of the last ten days. Because the prior probability of each class is left out, this product is the Naive Bayes score under the assumption that increases and drops are equally likely. It then returns the index, in the list of dictionaries, of the class that the classifier scores highest.
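Multiplying many small densities can underflow to zero as the number of features grows. A common alternative is to sum log-densities instead, which is mathematically equivalent to multiplying them but numerically more robust. The sketch below assumes each class is stored as a dictionary of frozen scipy distributions keyed by feature index (the layout `values_to_norm` produces); the function name `most_likely_class`, the optional `priors` argument, and the toy parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def most_likely_class(class_norms, x, priors=None):
    """Index of the class with the highest log-posterior score.

    class_norms: list of dicts mapping feature index -> frozen scipy normal.
    priors: optional class probabilities; None means equal priors, which
    matches the product-of-densities comparison in compare_possibilities.
    """
    if priors is None:
        priors = [1.0 / len(class_norms)] * len(class_norms)
    scores = []
    for norms, prior in zip(class_norms, priors):
        # log(prior * prod(pdf)) = log(prior) + sum(logpdf)
        score = np.log(prior) + sum(norms[i].logpdf(v) for i, v in enumerate(x))
        scores.append(score)
    return int(np.argmax(scores))

# Toy classes with made-up parameters (not fitted to real data)
up = {0: stats.norm(1.01, 0.01), 1: stats.norm(1.01, 0.01)}
down = {0: stats.norm(0.99, 0.01), 1: stats.norm(0.99, 0.01)}
print(most_likely_class([up, down], [1.012, 1.008]))  # → 0
```

Passing unequal `priors` shows where $P(c)$ would enter if, for example, increases were more frequent than drops in the training period.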
Step 4| Run the Program:
drop = {}
increase = {}
close = aapl['Close'].squeeze()  # 1-D Series of daily closing prices
for day in range(10, len(close) - 1):
    # The 10 closes before `day` give 9 day-over-day ratios
    previous_close = close.iloc[day-10:day]
    ratios = []
    for i in range(1, len(previous_close)):
        ratios.append(previous_close.iloc[i] / previous_close.iloc[i-1])
    # File the ratios under the class of the move from `day` to `day + 1`
    if close.iloc[day+1] > close.iloc[day]:
        for i in range(len(ratios)):
            increase.setdefault(i, []).append(ratios[i])
    elif close.iloc[day+1] < close.iloc[day]:
        for i in range(len(ratios)):
            drop.setdefault(i, []).append(ratios[i])
# Same window the loop would use for the next, unseen day
new_close = close.iloc[-11:-1]
ratios = []
for i in range(1, len(new_close)):
    ratios.append(new_close.iloc[i] / new_close.iloc[i-1])
X = ratios
print(X)
dicts = [increase, drop]
dicts = values_to_norm(dicts)
# 0 = increase, 1 = drop
print(compare_possibilities(dicts, X))
This last part ties all the functions together. For every day in the data, it gathers the 9 ratios between consecutive closing prices over the previous 10 days and files them under increase or drop, depending on whether the close rose or fell from that day to the next. It then computes the same 9 ratios for the most recent window (the 10 closes before the final trading day, matching the training windows), fits the Gaussian distributions, and predicts whether the price will increase or drop. The value it returns is the index of the winning dictionary in the list dicts: if it is 0, the price is predicted to increase; if it is 1, the price is predicted to drop.
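The training loop in Step 4 can also be written in vectorized form. The following NumPy sketch, with the hypothetical name `build_dataset`, produces one row of 9 ratios per training day plus a label, using the same windows as the loop. It assumes `close` is a 1-D array of at least `window + 2` closing prices, and it labels an increase as 1 and a drop as 0 (the reverse of the index order in `dicts`).

```python
import numpy as np

def build_dataset(close, window=10):
    """Vectorized sketch of the Step 4 training loop.

    For each day d in [window, len(close) - 2], the features are the
    window - 1 ratios between consecutive closes in close[d - window:d],
    and the label is 1 if close[d + 1] > close[d], 0 if it is lower.
    Days with an unchanged close are skipped, as in the loop.
    """
    close = np.asarray(close, dtype=float)
    ratios = close[1:] / close[:-1]  # ratios[k] = close[k + 1] / close[k]
    days = np.arange(window, len(close) - 1)
    # Ratios inside close[d - window:d] are ratios[d - window : d - 1]
    X = np.stack([ratios[d - window:d - 1] for d in days])
    diff = close[days + 1] - close[days]
    keep = diff != 0
    y = (diff[keep] > 0).astype(int)
    return X[keep], y

X, y = build_dataset(np.arange(1.0, 14.0))  # 13 steadily rising closes
print(X.shape, y)  # → (2, 9) [1 1]
```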
Conclusion:
This program is just the basic framework of a Gaussian Naive Bayes algorithm. Here are a few ways that you can improve my program:
- Increase the number of features
You can include features such as trading volume and opening price to widen the scope of the data. However, too many features could make Gaussian Naive Bayes less effective, since it does not perform as well as more complex models on big data.
- Link to Alpaca API
The Alpaca API is a great platform for testing trading strategies. Try linking this program to it to make buy or sell trades based on the model's predictions!
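The first suggestion, adding volume and opening price, could look like the following NumPy sketch. The function name `extra_features` and the particular ratios (day-over-day volume, and close relative to the same day's open) are illustrative assumptions, not part of the original program; each resulting column would get its own Gaussian per class, just like the price ratios.

```python
import numpy as np

def extra_features(open_, close, volume):
    """Hypothetical per-day features beyond the close-to-close ratio.

    Returns one row per day after the first, with three columns.
    """
    open_, close, volume = (np.asarray(a, dtype=float) for a in (open_, close, volume))
    return np.column_stack([
        close[1:] / close[:-1],    # close-to-close ratio (the feature used above)
        volume[1:] / volume[:-1],  # day-over-day volume ratio
        close[1:] / open_[1:],     # intraday move: close relative to that day's open
    ])

features = extra_features([10, 10, 11], [10, 11, 12], [100, 200, 100])
print(features.shape)  # → (2, 3)
```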
Source: https://medium.com/analytics-vidhya/using-probabilistic-machine-learning-to-improve-your-stock-trading-b40782f3710d