如何擊敗騰訊
個人項目 (Personal Proyects)
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
Towards Data Science編輯的注意事項: 盡管我們允許獨立作者按照我們的 規則和指南 發表文章 ,但我們不認可每位作者的貢獻。 您不應在未征求專業意見的情況下依賴作者的作品。 有關 詳細信息, 請參見我們的 閱讀器條款 。
摘要 (Summary)
This is a personal project in which I have tried to develop a trading application using machine learning tools. Starting with data modelling along with a categorisation based on distribution and machine learning techniques, I have developed a trading strategy for beginner investors to generate low-risk profit with the help of this application.
這是一個個人項目,我嘗試使用機器學習工具開發交易應用程序。 從數據建模以及基于分布和機器學習技術的分類開始,我已經開發了一種交易策略,供初學者使用此應用程序產生低風險利潤。
介紹 (Introduction)
The market analysis is both interesting and complex as it can be seen in the following link [1]. Nevertheless, there are several works carried out with machine-learning which try to shed light on this field.
市場分析既有趣又復雜,因為可以在以下鏈接中看到[1] 。 然而,機器學習方面進行了一些工作,試圖為這一領域提供啟示。
In this piece of work, I have created an application consisting of two main points:
在本文中,我創建了一個包含兩個要點的應用程序:
A screen where stock market index may be analysed over different temporal horizons. Here it can be found a candlestick chart; a chart to analyse technical indicators [2]; a line chart which shows the percentage of price change between days, as well as a box-plot representing this last chart in order to understand that distribution.
一個可以在不同時間范圍內分析股市指數的屏幕。 在這里可以找到一個燭臺圖 ; 分析技術指標的圖表[2] ; 線形圖顯示了兩天之間價格變化的百分比,以及表示最后一個圖的箱形圖,以了解這種分布。
A screen where the analysis of the trading strategy which I have developed (Strategyone) can be done. This strategy is divided into two different parts: the first one consisting of the prediction of stock market index movements by means of machine learning, whereas the second one involves the comparison between the current data vectors prediction and what happened in the past. The chosen temporal horizons range from 7, 14, 21 to 28 days.
可以在此屏幕上分析我開發的交易策略( Strategyone )。 該策略分為兩個不同的部分:第一個部分包括通過機器學習預測股市指數運動,而第二個則包括當前數據向量預測與過去發生的情況之間的比較 。 所選的時間范圍為7、14、21至28天。
This last section is explained thoroughly in “How to beat the market” and “Trading strategy”
最后一部分在“如何戰勝市場”和“交易策略”中進行了詳細說明。
Data has been obtained through the Alpha Vantage API [3], while a list of the stock market index from the Finnhub API [4].
數據是通過Alpha Vantage API [3]獲得的 ,而股票市場指數則是從Finnhub API [4]獲得的 。
語境 (Context)
As a physicist I have been always fascinated by the complex systems world: how certain formulae can be applied to and have interesting results either to biological systems or financial ones, as well as to a group of several electrons.
作為物理學家,我一直著迷于復雜的系統世界:如何將某些公式應用于生物系統或金融系統以及一組多個電子,并對它們產生有趣的結果。
Likewise, how the individual study of an element of the system might result into a different behaviour when it is studied within the system.
同樣,當在系統中進行研究時,對系統元素的個別研究可能如何導致不同的行為。
Consequently, this project emerges from the curiosity about the stock market in addition to the software and intellectual challenge that implies to understand such a complex system as the market is.
因此,除了對理解市場這樣一個復雜系統的軟件和知識挑戰之外,該項目還來自對股票市場的好奇心。
The project has gone through three stages: the first version of this work was developed as the final thesis of the Master’s Degree in Data Science which I attended in [5], and whose aim was only the creation of classification model which could predict the future of an stock in the market using machine learning. The second version was designed externally to the Master’s and it tried to improve the first one. Finally, the third version is the one here discussed, and it offers a significant improvement, the development of a trading strategy.
該項目經歷了三個階段:這項工作的第一個版本是我在[5]中參加的數據科學碩士學位的最終論文,其目的僅僅是建立可以預測未來的分類模型。使用機器學習來分析市場中的股票 第二個版本是在碩士課程外部設計的,它試圖改進第一個版本。 最后,這里討論的是第三個版本,它提供了顯著的改進,即交易策略的發展。
如何打敗市場 (How to beat the market)
In order to use a classification model to predict market movements, I needed to categorise the data. These prediction categories have been called “Strong bull”, predictions in which the price increase is significant; “Bull”, when there is a price increase; “Keep”, the price remains the same; “Bear”, a decrease on the price, and “Strong bear”, the price decrease is significant [6].
為了使用分類模型來預測市場走勢,我需要對數據進行分類。 這些預測類別被稱為“強牛”,即價格上漲顯著的預測。 “牛”,當價格上漲時; “保持”,價格保持不變; 價格下降的“熊市”和價格下跌的“強熊市”很明顯[6] 。
How are the stock market index categories chosen?
如何選擇股市指數類別?
This have done through the distribution of percentage variation in the stock price. As our aim is predicting the future, in the registers, the percentage variation column needs the daily information about how the price varies in relation to the temporal horizon that we want to predict.
這是通過分配股票價格的百分比變化來實現的 。 由于我們的目標是預測未來,因此在價格記錄中,百分比變化列需要有關價格如何相對于我們要預測的時間范圍變化的每日信息 。
Therefore, the variation percentage to be categorised is compared to the last 4-month distribution, and one of the categories abovementioned will be selected based on the range of the percentiles in relation to that distribution.
因此,將要分類的變化百分比與最近4個月的分布進行比較,并且將基于相對于該分布的百分位數范圍選擇上述類別之一。
In this way, we could categorise all the data given a temporal horizon, and this will always be about the future.
這樣,我們可以在時間范圍內對所有數據進行分類,而這將永遠與未來有關。
Once the categorisation is done, the next step was getting to know which the best way to apply an algorithm of classification with more precision is. After a number of trials and different ideas, the selected process was scaling the data by means of the robust scaler technique and Random Forest as classification algorithm. These were the chosen ones since they provide an average higher precision upon all the categories.
分類完成后,下一步就是知道哪種方法更準確地應用分類算法。 經過大量的試驗和不同的想法,選擇的過程是通過健壯的縮放器技術和隨機森林作為分類算法來縮放數據。 選擇它們是因為它們在所有類別上均提供了平均較高的精度。
Only following these steps, we can obtain a model which is able to predict “Strong bull” with a 40 % level of accuracy.
僅按照這些步驟,我們就可以得到能夠以40%的準確度預測“強牛”的模型。
交易策略 (Trading Strategy)
The trading strategy will be based on what happened in the past and on the idea that we guess correctly provided that we win, omitting that in order to win we must also guess the right predicted category.
交易策略將基于過去發生的情況以及我們能夠正確猜出的想法(前提是我們獲勝),而忽略了為了獲勝,我們還必須猜出正確的預測類別 。
That is, if the prediction is “Bull”, we carry out a long position operation and the resulting outcome is actually “Strong bull”, our prediction will be considered as accurate. Likewise, if we predict “Strong bull” and the result is “Bull” or when the prediction is “Strong bear”, we carry out a short position movement and the outcome achieved is “Bear” and the other way round.
也就是說,如果預測為“牛”,我們執行多頭頭寸操作,而結果實際上為“強牛”,我們的預測將被認為是準確的。 同樣,如果我們預測“強牛”而結果為“牛”,或者當預測為“強熊”時,我們進行空頭頭寸移動并且獲得的結果為“熊”,反之亦然。
If none of the abovementioned cases take place, the operation will be considered as a fail.
如果上述情況均未發生,則該操作將被視為失敗。
Having this in mind, the strategy will only consist of long position operation and when the model predicts “Strong bull” given that it is the category with higher accuracy from the classification model.
考慮到這一點, 該策略將僅包括多頭頭寸操作,并且當模型預測“強牛”時 , 該策略將被認為是分類模型中具有較高準確性的類別。
How does the strategy work?
該策略如何運作?
Once the robust scaler is applied to all the registers, the category is predicted and the actual categorisation, a PCA is applied to reduce the number of dimensions to 4 maintaining the 95 % of data variability. Therefore, we have other 4 variables together with the prediction linked to the register and its actual category. How the variables are can be known when something is predicted in relation to the real category, so we arrange the prediction and the category, and we calculate the median associated to each profile curve to understand how to describe each one.
一旦將魯棒縮放器應用于所有寄存器,預測了類別并進行了實際分類, 便會應用PCA將維數減少到4,以保持95%的數據可變性。 因此,我們還有其他4個變量以及鏈接到寄存器及其實際類別的預測。 當預測與真實類別有關的某物時可以知道變量的方式 ,因此我們安排了預測和類別,并計算了與每個輪廓曲線相關的中位數以了解如何描述每個輪廓。
As a result, we will have described the variables in which “Strong bull” is predicted” and the actual outcome was “Strong bull” or any other category.
結果,我們將描述“強牛”被預測的變量,而實際結果是“強牛”或任何其他類別。
All of this will be limited to the last 6-month-data in relation to the prediction day in order to avoid the influence of an old market state on the strategy. The results obtained are summarised below:
所有這些都將僅限于與預測日相關的最后6個月的數據,以避免舊市場狀況對策略的影響。 獲得的結果總結如下:

The interpretation of this table is that in the last 6 months before the prediction of “Strong bull” and the category was guess correctly, the variables of the main components had as the median.
該表的解釋是,在“強牛”的預測出現之前的最近6個月中,該類別被正確猜出,主要成分的變量為中位數。
Consequently, in order to carry out a operation, we must apply the data of the day in which we are doing the prediction a robust scaler and a PCA,
因此,為了執行操作,我們必須應用進行預測的當天的數據,魯棒的縮放器和PCA,
If the prediction obtained is “Strong bull”, we will have reached the first step to carry out the operation. The second step is checking which profile of the previous curves is more similar to the data that is being predicted. This will be done using the cosine similarity which will allow us to observe the more similar vector to the data. If it corresponds to “Strong bull-strong bull”, we will have the key to perform a safer operation.
如果獲得的預測是“堅強的公牛”,我們將到達執行該操作的第一步。 第二步是檢查先前曲線的輪廓與正在預測的數據更相似。 這將使用余弦相似度完成,這將使我們能夠觀察到與數據更相似的向量。 如果它對應于“強牛-強牛”,我們將擁有執行更安全操作的鑰匙。
Following this trading strategy, we will obtain almost a 50 % level of accuracy, but, as it was mentioned at the beginning, guessing correctly does not imply guessing the category too.
按照這種交易策略,我們將獲得幾乎50%的準確度 ,但是,正如開頭提到的那樣,正確猜測并不意味著也猜測類別。
Guessing correctly does not imply guessing the category too
正確猜測并不意味著也猜測類別
Under our circumstances, a correct guessing will be also the prediction of “Strong bull” and obtaining “Bull” as a final result. The strategy level of accuracy reaches 58 % when this is taken into account.
在我們的情況下,正確的猜測也將是對“強牛”的預測,并最終獲得“牛”。 考慮到這一點,策略的準確性達到58%。
結論 (Conclusion)
The aim of this piece of work was the development of a strategy which allows a beginner investor to carry out to generate low-risk profit without suffering a total loss. As I have mentioned, the strategy ensures a 58 % level of accuracy under the described conditions, but, on a personal note, it is not a strategy to be implemented automatically because the error level assumed raises up to 40%.
這項工作的目的是開發一種策略,該策略允許初學者投資者開展活動以產生低風險的利潤而不會造成總損失。 正如我已經提到的那樣,該策略可確保在所描述的條件下達到58%的準確度,但是,就個人而言,由于假定的錯誤級別會上升到40%,因此這不是自動實施的策略。
However, it is interesting to see how a level of accuracy over 50 % is obtained in the performed operations, following a strategy based only on data and with a limited and minimal knowledge about the stock market.
然而,有趣的是,遵循僅基于數據且對股票市場的了解有限且很少的策略,在執行的操作中如何獲得超過50%的準確度。
All the project code can be read on: GitHub/esan94/bsm03
所有項目代碼都可以在GitHub / esan94 / bsm03上閱讀
后續步驟 (Following Steps)
The possible next steps to improve might include:
可能需要改進的后續步驟可能包括:
- The change of the data model. 數據模型的變化。
- The improvement of the classification algorithm. 分類算法的改進。
- The addition to the project of more knowledge about the stock market. 除了該項目以外,還擁有有關股票市場的更多知識。
- The assignation of value to the main components to apply the cosine similarity. 將值分配給主要成分以應用余弦相似度。
資源資源 (Resources)
[1] https://en.wikipedia.org/wiki/Efficient-market_hypothesis
[1] https://en.wikipedia.org/wiki/Efficient-market_hypothesis
[2] https://www.investopedia.com/technical-analysis-4689657
[2] https://www.investopedia.com/technical-analysis-4689657
[3] https://www.alphavantage.co/
[3] https://www.alphavantage.co/
[4] https://finnhub.io/
[4] https://finnhub.io/
[5] https://kschool.com/
[5] https://kschool.com/
[6] https://www.investopedia.com/insights/digging-deeper-bull-and-bear-markets/
[6] https://www.investopedia.com/insights/digging-deeper-bull-and-bear-markets/
You can follow me on LinkedIn, GitHub o Medium.
您可以在LinkedIn , GitHub或 o 中關注我。
Translation made by Paloma Sánchez Narváez.
翻譯由PalomaSánchezNarváez撰寫 。
翻譯自: https://towardsdatascience.com/beating-stock-market-8b33c5afb633
如何擊敗騰訊
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/392181.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/392181.shtml 英文地址,請注明出處:http://en.pswp.cn/news/392181.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!