算法 從 數中選出
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
Towards Data Science編輯的注意事項: 盡管我們允許獨立作者按照我們的 規則和指南 發表文章 ,但我們不認可每位作者的貢獻。 您不應在未征求專業意見的情況下依賴作者的作品。 有關 詳細信息, 請參見我們的 閱讀器條款 。
I enjoy basketball. It’s a fast-paced competitive game and I’ve enjoyed both playing and watching it for a long time. The NBA is famous for generating very clean data, which has long been used by enthusiasts (like myself) for data visualizations, modeling and game predictions.
我喜歡籃球。 這是一款快節奏的競技游戲,很長時間以來我都喜歡玩和觀看它。 NBA以生成非常干凈的數據而聞名,長期以來,發燒友(如我自己)一直將其用于數據可視化 , 建模和游戲預測 。
Recently, I was contacted by DraftKings regarding an interview for a potential job. As part of my preparations for the same, I started using their platform and competing in mock competitions to get acquainted with the DraftKings (DK) contest process. It was during this time period that I really started getting into the idea of using data to model and predict a winning roster.
最近, DraftKings就某項潛在工作的面試與我聯系。 作為準備工作的一部分,我開始使用他們的平臺并參加模擬競賽,以了解DraftKings(DK)競賽過程。 正是在這段時間里,我才真正開始使用數據建模和預測獲勝者名單的想法。
I built the algorithm iteratively, and from scratch- starting with a naive version 1, a more robust version 2 and currently I’m working on a winning version 3.
我迭代地構建了算法,從零開始,從樸素的版本1開始,是功能更強大的版本2,目前我正在開發獲獎的版本3。
I built the algorithm iteratively, and from scratch
我從頭開始迭代構建算法
You can follow along my algorithm design journey in the rest of the article.
您可以在本文的其余部分中繼續我的算法設計過程。
快速級別設置:評分和規則 (Quick Level Set: Scoring and Rules)
DK’s rules and scoring for their NBA classic fantasy contests are fairly intuitive, even if you have no prior basketball knowledge. In a nutshell, the objective is to:
即使您沒有籃球知識,DK的NBA經典幻想比賽規則和得分也非常直觀。 簡而言之,目標是:
Create an 8-player lineup while staying under the $50,000 salary cap.
創建一個8人游戲陣容,同時將工資保持在50,000美元以下。
Players get different points for different actions (more details below) and the draft with the most number of points, at the end of all games in a night, wins. Sounds simple enough :)
玩家在不同的操作中獲得不同的分數(更多詳細信息,請參見下文),并且在一夜內所有游戲結束時,得分最高的選秀會獲勝。 聽起來很簡單:)
The breakdown for different actions that result in positive (or negative) points can be seen below.
導致正(或負)分的不同動作的細分如下所示。

One last constraint which makes drafting slightly more complicated is player positions. According to DK: Lineups will consist of 8 players and must include players from at least 2 different NBA games.
最后一個使選秀稍微復雜一些的約束是球員位置。 根據DK: 陣容將由8名球員組成,并且必須包括至少2場不同NBA游戲中的球員。
Further, the 8 players are broken down by positions, which can be seen below.
此外,這8個玩家按位置細分,如下所示。

There you have it! A simple optimization problem with a set of constraints. Sounds like something an algorithms would excel at. Or would it?
你有它! 具有一組約束的簡單優化問題。 聽起來像算法會擅長的事情。 還是會?
算法版本1-天真 (Algorithm Version 1- Naive)
My goal with this algorithm was to build it as fast as possible, with little to no hopes of winning. Mainly because I was interested in setting up a strong foundation, without worrying about building complex logic early in the process. To do this, I downloaded a player dataset from DK and started a Jupyter notebook. If you’re interested, you can find the full raw data here and my notebook here.
我使用此算法的目標是盡可能快地構建它,幾乎沒有希望獲勝。 主要是因為我有興趣建立一個強大的基礎,而不必擔心在此過程的早期就構建復雜的邏輯。 為此,我從DK下載了播放器數據集并啟動了Jupyter筆記本。 如果你有興趣,你可以找到完整的原始數據, 在這里 ,我的筆記本電腦在這里 。
Let’s see what our data looks like.
讓我們看看我們的數據是什么樣的。

Right off the bat, we can tell that for a simple algorithm, given our requirements and constraints, we’ll find the following columns useful: ID, Salary and AvgPointsPerGame (fantasy points). This would allow us to pick the “best” players while staying under the $50,000 salary cap. Sure, without positional information we could have overlaps etc. but that’s an issue for a later version. Remember, version 1 should be the simplest implementation of your product.
馬上,我們可以說出,對于一個簡單的算法,鑒于我們的要求和約束,我們將發現以下幾欄有用:ID,Salary和AvgPointsPerGame(幻想點)。 這將使我們能夠選擇“最佳”球員,同時保持在50,000美元的薪金上限以下。 當然,如果沒有位置信息,我們可能會有重疊等,但這對于更高版本是一個問題。 請記住,版本1應該是產品的最簡單實現。
Given this data, our first pass optimization algorithm can be broken up into the following simple steps:
有了這些數據,我們的首過優化算法可以分解為以下簡單步驟:
- Randomly select 8 players from the dataset. 從數據集中隨機選擇8個玩家。
- If the sum of the salaries of the players is greater than $50,000: go back to step 1 (too expensive). 如果玩家的薪金總和超過50,000美元:請返回步驟1(太貴)。
- Otherwise, sum the AvgPointsPerGame of each of the players in the roster and compare with a master maximum value. If greater, replace maximum value and roster. 否則,對名冊中每個玩家的AvgPointsPerGame求和,然后與主最大值進行比較。 如果更大,則替換最大值和花名冊。
- Unless all possible combinations have been explored, return to step 1. Once no more combinations, return the maximum value and the roster. 除非已探究所有可能的組合,否則請返回步驟1。不再組合時,請返回最大值和花名冊。
There we have it: a simple naive algorithm that picks 8 players in random that will have the maximum expected fantasy points while staying under the $50,000 salary cap. But this algorithm has a few glaring issues:
我們有一個簡單的天真的算法,該算法隨機選擇8個玩家,這些玩家將具有最大的預期幻想積分,同時保持在50,000美元的薪金上限以下。 但是此算法存在一些明顯的問題:
- No control regarding the position of the players. Hence the algorithm could generate a roster which consists of >3 of one position (G/F), in which case the roster would be invalid. 無法控制玩家的位置。 因此,該算法可以生成由一個位置(G / F)> 3組成的花名冊,在這種情況下,該花名冊將無效。
- No check on players who are injured or not scheduled to play. This would result in a most definitive loss as all player points are important for a winning draft. 不檢查受傷或未安排比賽的球員。 這將導致最確定的損失,因為所有球員得分對獲勝選秀都很重要。
- Lastly, the algorithm is very inefficient. Considering that we need to check each possible roster: for a given number of players n and roster size r, the number of possible rosters would be- 最后,該算法效率很低。 考慮到我們需要檢查每個可能的名冊:對于給定數量的n個玩家和名冊大小r,可能的名冊數量為-
C( n , r ) = n! / (n — r)! . r!
C(n,r)= n! /(n-r)! 。 !
To get an appreciation of this complexity, take a look at the table below which shows the number of checks if the total number of available players is 100.
要了解這種復雜性,請查看下表,該表顯示了可用球員總數為100時的檢查次數。

It’s safe to assume that our algorithm will take a VERY long time to output a roster of 8 players. But, because this is a first pass algorithm, we‘re happy with what we got. You can see the algorithm in action below, picking the top 5 players for a combined salary of $35,000. Not bad.
可以肯定地說,我們的算法將花費很長時間才能輸出8名球員的花名冊。 但是,由于這是首過算法,因此我們對所獲得的結果感到滿意。 您可以在下面看到該算法的運行情況,以最高薪水35,000美元選出前5名球員。 不錯。

Because we’re on a mission to build a winning algorithm, let’s talk about version 2 optimizations.
因為我們肩負著構建成功算法的使命,所以讓我們談談版本2優化。
算法版本2-中級體育博彩者 (Algorithm Version 2- Intermediate Sports Bettor)
Now, this is where our algorithm goes from being a naive optimizer to an intermediate-level sports bettor. Based on the drawbacks of version 1, and the factorial time complexity, I decided to implement a few data and algorithm level optimizations.
現在,這是我們的算法從單純的優化器發展為中級體育博彩者的地方。 基于版本1的缺點和階乘時間復雜度,我決定實施一些數據和算法級別的優化 。
First, I cleaned the data to only include players who’re confirmed to play games. This was an easy way to decrease the total number of available players from ~100 to ~85. This might look like a small increase, but in reality, for a roster of 8 players, our number of checks drastically decreases when the total number of players decreases. The change in the number of checks can be seen below.
首先,我清除了數據,只包括經確認可以玩游戲的玩家。 這是將可用玩家總數從100個減少至85個的簡便方法。 這看似有點增加,但實際上,對于名額8人的名單,當總人數減少時,我們的支票數會急劇減少。 支票數量的變化可以在下面看到。
- C (100, 8) = 186,087,894,300 C(100,8)= 186,087,894,300
- C (85, 8) = 48,124,511,370 C(85,8)= 48,124,511,370
Our total number of operations (or checks) in the algorithm went down by ~75%!
我們在算法中的操作(或檢查)總數下降了約75%!
Next up, I modified the algorithm itself to pick specific positions. Now, instead of picking every possible roster from the total number of players available, the algorithm picks 3 guards from only all available guards, followed by 3 forwards and lastly 1 center. As you can see, the total here is only 7 players and leaves the last pick to the user. This is a quick way to save some additional time on the algorithm as the user can manually find the best remaining player (highest expected points given the salary remaining).
接下來,我修改了算法本身以選擇特定位置。 現在,該算法不再從可用球員總數中選擇所有可能的花名冊,而是僅從所有可用后衛中挑選3個后衛,然后是3個前鋒和最后1個中鋒。 如您所見,此處的總數僅為7位玩家,而最后的選擇權留給了用戶。 這是一種節省算法上額外時間的快速方法,因為用戶可以手動找到剩余的最好的球員(給定剩余的薪水,可以獲得最高的期望積分)。
This was a huge optimization because the number of guards vs the total number of players is ~40 vs 85. The number is similar for forwards and even less for centers. Note, there’s a slight overlap between the players in each category as some players play multiple positions but this was easy to deal with: I removed played who were already picked as Guards, before picking Forwards etc. The performance boost as a result of the above changes can be seen below:
這是一個巨大的優化,因為后衛人數與球員總數之比約為40比85。前鋒的人數相似,中鋒的人數更少。 請注意,每個類別中的玩家之間都有一點重疊,因為有些玩家扮演多個職位,但是這很容易解決:我刪除了已經被選為后衛的角色,然后再選擇Forwards等。由于上述原因,性能提升更改如下所示:
- C (85 , 8) = 48,124,511,370 C(85,8)= 48,124,511,370
- C (40 , 3) x C (40 , 3) x C (20, 1) = 1,952,288,000 C(40,3)x C(40,3)x C(20,1)= 1,952,288,000
This is huge. Now, the algorithm is conducting almost ~95% fewer operations and we have the best possible roster broken up by positions and under our salary cap. Let’s test our results!
這是巨大的。 現在,該算法的運算量減少了約95%,并且按職位和工資帽劃分的人員名單可能最好。 讓我們測試一下結果!
實際結果 (Real World Results)
If you’ve made it so far, congratulations. You’ve worked through the technical stuff, now it’s time for the results! I tried the algorithm’s pick over the course of three days on DK’s classic multiplier contests. Each time my entry fee was $1 and the payoff was $3 for the top 30% of the finishers. You can see the lineups generated by the algorithm and the results below.
到目前為止,如果您做到了,那就祝賀您。 您已經完成了技術性工作,現在是時候取得成果了! 我在DK的經典乘數比賽中嘗試了3天的算法選擇。 每次我的報名費是1美元,而前30%的完成者的回報是3美元。 您可以在下面查看算法生成的陣容和結果。



As you can see from the above results, the real world outcomes of the competition have been good! Out of the three days that I created lineups using the algorithm, we lost twice and won once. Our intermediate-level sports bettor algorithm has done better than I expected, but there’s still a long way to go.
從以上結果可以看出,比賽的真實結果是不錯的! 在我使用算法創建陣容的三天內,我們輸了兩次,贏了一次。 我們的中級體育博彩算法比我預期的要好,但是還有很長的路要走。
I noticed few nuances about the results, including that our algorithm (before the v2 optimization) made a mistake on day 1 where an injured player was drafted into the team (P. Beverley) which resulted in a weak draft. This was fixed in version 2 and will not be repeated again. Additionally, once cool thing has been that despite the mixed results, the algorithm has consistently created lineups which get >200 fantasy points, which is pretty high!
我注意到關于結果的細微差別,包括我們的算法(在v2優化之前)在第1天犯了一個錯誤,即一名受傷的球員被征召入隊(P. Beverley),導致選秀不力。 此問題在版本2中已修復,將不再重復。 此外,一旦出現有趣的結果是,盡管得到了混合結果,該算法仍會持續創建陣容,獲得超過200個幻想點,這是非常高的!
下一步是什么? (What’s next?)
Well, there you have it. So far, I’ve spent $3 on entree fees and made $3 on winnings, for a grand total of $0 change! I have $25 left to spend on this project before my inner alarm bells start ringing, so I clearly need to improve this algorithm. After talking to some of my friends, who know a lot more about basketball than myself, I have a few hypotheses to test out. Some of these include:
好吧,那里有。 到目前為止,我已經在主菜費用上花費了$ 3,并在獎金中賺了$ 3,總共有$ 0的找零! 在我的內部警鐘開始鳴響之前,我還有25美元可用于該項目,所以我顯然需要改進此算法。 與我的一些朋友交談后,我比我更了解籃球,我有一些假設可以檢驗。 其中一些包括:
Using additional player data over the last n games. This way the model would have more context, instead of just a snapshot value
在過去n場比賽中使用其他玩家數據。 這樣,模型將具有更多上下文,而不僅僅是快照值
- Using prior team match-up data to adjust weights placed on certain games. For example, this could help avoid picking a player in a match-up where (based on previous meets) the player has failed to perform 使用先前的球隊比賽數據來調整某些游戲的權重。 例如,這可以幫助避免在對戰中選擇一名球員(基于先前的見面)而該球員未能完成比賽
- Exploring dual optimization strategies 探索雙重優化策略
And more! If you have any ideas about how to improve this project please feel free to reach out to me on LinkedIn or over email which you can find on my Website. Additionally, all the data and code for this project can be found on my Github repository, so feel free to clone/fork it and test your own hypotheses! And, as always, any and all feedback is greatly appreciated.
和更多! 如果您對如何改善此項目有任何想法,請隨時通過LinkedIn或通過我的網站上找到的電子郵件與我聯系。 此外,該項目的所有數據和代碼都可以在我的Github存儲庫中找到,因此隨時可以克隆/分叉它并測試自己的假設! 而且,一如既往,我們非常感謝任何反饋。
Stay safe out there everyone and keep building cool stuff.
每個人都應該保持安全,并繼續制作有趣的東西。
翻譯自: https://towardsdatascience.com/can-an-algorithm-pick-a-winning-nba-fantasy-draft-c05342f130f2
算法 從 數中選出
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/389780.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/389780.shtml 英文地址,請注明出處:http://en.pswp.cn/news/389780.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!