算法 從 數中選出_算法可以選出勝出的nba幻想選秀嗎

算法 從 數中選出

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Towards Data Science編輯的注意事項: 盡管我們允許獨立作者按照我們的 規則和指南 發表文章 ,但我們不認可每位作者的貢獻。 您不應在未征求專業意見的情況下依賴作者的作品。 有關 詳細信息, 請參見我們的 閱讀器條款

I enjoy basketball. It’s a fast-paced competitive game and I’ve enjoyed both playing and watching it for a long time. The NBA is famous for generating very clean data, which has long been used by enthusiasts (like myself) for data visualizations, modeling and game predictions.

我喜歡籃球。 這是一款快節奏的競技游戲,很長時間以來我都喜歡玩和觀看它。 NBA以生成非常干凈的數據而聞名,長期以來,發燒友(如我自己)一直將其用于數據可視化 , 建模和游戲預測 。

Recently, I was contacted by DraftKings regarding an interview for a potential job. As part of my preparations for the same, I started using their platform and competing in mock competitions to get acquainted with the DraftKings (DK) contest process. It was during this time period that I really started getting into the idea of using data to model and predict a winning roster.

最近, DraftKings就某項潛在工作的面試與我聯系。 作為準備工作的一部分,我開始使用他們的平臺并參加模擬競賽,以了解DraftKings(DK)競賽過程。 正是在這段時間里,我才真正開始使用數據建模和預測獲勝者名單的想法。

I built the algorithm iteratively, and from scratch- starting with a naive version 1, a more robust version 2 and currently I’m working on a winning version 3.

我迭代地構建了算法,從零開始,從樸素的版本1開始,是功能更強大的版本2,目前我正在開發獲獎的版本3。

I built the algorithm iteratively, and from scratch

我從頭開始迭代構建算法

You can follow along my algorithm design journey in the rest of the article.

您可以在本文的其余部分中繼續我的算法設計過程。

快速級別設置:評分和規則 (Quick Level Set: Scoring and Rules)

DK’s rules and scoring for their NBA classic fantasy contests are fairly intuitive, even if you have no prior basketball knowledge. In a nutshell, the objective is to:

即使您沒有籃球知識,DK的NBA經典幻想比賽規則和得分也非常直觀。 簡而言之,目標是:

Create an 8-player lineup while staying under the $50,000 salary cap.

創建一個8人游戲陣容,同時將工資保持在50,000美元以下。

Players get different points for different actions (more details below) and the draft with the most number of points, at the end of all games in a night, wins. Sounds simple enough :)

玩家在不同的操作中獲得不同的分數(更多詳細信息,請參見下文),并且在一夜內所有游戲結束時,得分最高的選秀會獲勝。 聽起來很簡單:)

The breakdown for different actions that result in positive (or negative) points can be seen below.

導致正(或負)分的不同動作的細分如下所示。

Image for post
NBA Fantasy points breakdown- DraftKings. Photo by Author.
NBA Fantasy點數分解-DraftKings。 圖片由作者提供。

One last constraint which makes drafting slightly more complicated is player positions. According to DK: Lineups will consist of 8 players and must include players from at least 2 different NBA games.

最后一個使選秀稍微復雜一些的約束是球員位置。 根據DK: 陣容將由8名球員組成,并且必須包括至少2場不同NBA游戲中的球員。

Further, the 8 players are broken down by positions, which can be seen below.

此外,這8個玩家按位置細分,如下所示。

Image for post
NBA Fantasy player positions- DraftKings. Photo by Author.
NBA Fantasy球員位置-DraftKings。 圖片由作者提供。

There you have it! A simple optimization problem with a set of constraints. Sounds like something an algorithms would excel at. Or would it?

你有它! 具有一組約束的簡單優化問題。 聽起來像算法會擅長的事情。 還是會?

算法版本1-天真 (Algorithm Version 1- Naive)

My goal with this algorithm was to build it as fast as possible, with little to no hopes of winning. Mainly because I was interested in setting up a strong foundation, without worrying about building complex logic early in the process. To do this, I downloaded a player dataset from DK and started a Jupyter notebook. If you’re interested, you can find the full raw data here and my notebook here.

我使用此算法的目標是盡可能快地構建它,幾乎沒有希望獲勝。 主要是因為我有興趣建立一個強大的基礎,而不必擔心在此過程的早期就構建復雜的邏輯。 為此,我從DK下載了播放器數據集并啟動了Jupyter筆記本。 如果你有興趣,你可以找到完整的原始數據, 在這里 ,我的筆記本電腦在這里 。

Let’s see what our data looks like.

讓我們看看我們的數據是什么樣的。

Image for post
Players dataset- DraftKings. Photo by Author.
玩家數據集-DraftKings。 圖片由作者提供。

Right off the bat, we can tell that for a simple algorithm, given our requirements and constraints, we’ll find the following columns useful: ID, Salary and AvgPointsPerGame (fantasy points). This would allow us to pick the “best” players while staying under the $50,000 salary cap. Sure, without positional information we could have overlaps etc. but that’s an issue for a later version. Remember, version 1 should be the simplest implementation of your product.

馬上,我們可以說出,對于一個簡單的算法,鑒于我們的要求和約束,我們將發現以下幾欄有用:ID,Salary和AvgPointsPerGame(幻想點)。 這將使我們能夠選擇“最佳”球員,同時保持在50,000美元的薪金上限以下。 當然,如果沒有位置信息,我們可能會有重疊等,但這對于更高版本是一個問題。 請記住,版本1應該是產品的最簡單實現。

Given this data, our first pass optimization algorithm can be broken up into the following simple steps:

有了這些數據,我們的首過優化算法可以分解為以下簡單步驟:

  1. Randomly select 8 players from the dataset.

    從數據集中隨機選擇8個玩家。
  2. If the sum of the salaries of the players is greater than $50,000: go back to step 1 (too expensive).

    如果玩家的薪金總和超過50,000美元:請返回步驟1(太貴)。
  3. Otherwise, sum the AvgPointsPerGame of each of the players in the roster and compare with a master maximum value. If greater, replace maximum value and roster.

    否則,對名冊中每個玩家的AvgPointsPerGame求和,然后與主最大值進行比較。 如果更大,則替換最大值和花名冊。
  4. Unless all possible combinations have been explored, return to step 1. Once no more combinations, return the maximum value and the roster.

    除非已探究所有可能的組合,否則請返回步驟1。不再組合時,請返回最大值和花名冊。

There we have it: a simple naive algorithm that picks 8 players in random that will have the maximum expected fantasy points while staying under the $50,000 salary cap. But this algorithm has a few glaring issues:

我們有一個簡單的天真的算法,該算法隨機選擇8個玩家,這些玩家將具有最大的預期幻想積分,同時保持在50,000美元的薪金上限以下。 但是此算法存在一些明顯的問題:

  • No control regarding the position of the players. Hence the algorithm could generate a roster which consists of >3 of one position (G/F), in which case the roster would be invalid.

    無法控制玩家的位置。 因此,該算法可以生成由一個位置(G / F)> 3組成的花名冊,在這種情況下,該花名冊將無效。
  • No check on players who are injured or not scheduled to play. This would result in a most definitive loss as all player points are important for a winning draft.

    不檢查受傷或未安排比賽的球員。 這將導致最確定的損失,因為所有球員得分對獲勝選秀都很重要。
  • Lastly, the algorithm is very inefficient. Considering that we need to check each possible roster: for a given number of players n and roster size r, the number of possible rosters would be-

    最后,該算法效率很低。 考慮到我們需要檢查每個可能的名冊:對于給定數量的n個玩家和名冊大小r,可能的名冊數量為-

C( n , r ) = n! / (n — r)! . r!

C(n,r)= n! /(n-r)! 。 !

To get an appreciation of this complexity, take a look at the table below which shows the number of checks if the total number of available players is 100.

要了解這種復雜性,請查看下表,該表顯示了可用球員總數為100時的檢查次數。

Image for post
Time complexity magnitude for first algorithm. Photo by Author.
第一種算法的時間復雜度大小。 圖片由作者提供。

It’s safe to assume that our algorithm will take a VERY long time to output a roster of 8 players. But, because this is a first pass algorithm, we‘re happy with what we got. You can see the algorithm in action below, picking the top 5 players for a combined salary of $35,000. Not bad.

可以肯定地說,我們的算法將花費很長時間才能輸出8名球員的花名冊。 但是,由于這是首過算法,因此我們對所獲得的結果感到滿意。 您可以在下面看到該算法的運行情況,以最高薪水35,000美元選出前5名球員。 不錯。

Image for post
Output from algorithm 1- Top 5 players with the maximum expected points under $35,000 combined salary. Note: first row shows combined expected fantasy points, second row is combined salary and third are the IDs of the player, followed by the names. Photo by Author.
算法1的輸出-最高預期得分低于35,000美元的前5名球員的總工資。 注意:第一行顯示組合的預期幻想積分,第二行顯示組合的薪水,第三行顯示玩家的ID,后跟姓名。 圖片由作者提供。

Because we’re on a mission to build a winning algorithm, let’s talk about version 2 optimizations.

因為我們肩負著構建成功算法的使命,所以讓我們談談版本2優化。

算法版本2-中級體育博彩者 (Algorithm Version 2- Intermediate Sports Bettor)

Now, this is where our algorithm goes from being a naive optimizer to an intermediate-level sports bettor. Based on the drawbacks of version 1, and the factorial time complexity, I decided to implement a few data and algorithm level optimizations.

現在,這是我們的算法從單純的優化器發展為中級體育博彩者的地方。 基于版本1的缺點和階乘時間復雜度,我決定實施一些數據和算法級別的優化

First, I cleaned the data to only include players who’re confirmed to play games. This was an easy way to decrease the total number of available players from ~100 to ~85. This might look like a small increase, but in reality, for a roster of 8 players, our number of checks drastically decreases when the total number of players decreases. The change in the number of checks can be seen below.

首先,我清除了數據,只包括經確認可以玩游戲的玩家。 這是將可用玩家總數從100個減少至85個的簡便方法。 這看似有點增加,但實際上,對于名額8人的名單,當總人數減少時,我們的支票數會急劇減少。 支票數量的變化可以在下面看到。

  • C (100, 8) = 186,087,894,300

    C(100,8)= 186,087,894,300
  • C (85, 8) = 48,124,511,370

    C(85,8)= 48,124,511,370

Our total number of operations (or checks) in the algorithm went down by ~75%!

我們在算法中的操作(或檢查)總數下降了約75%!

Next up, I modified the algorithm itself to pick specific positions. Now, instead of picking every possible roster from the total number of players available, the algorithm picks 3 guards from only all available guards, followed by 3 forwards and lastly 1 center. As you can see, the total here is only 7 players and leaves the last pick to the user. This is a quick way to save some additional time on the algorithm as the user can manually find the best remaining player (highest expected points given the salary remaining).

接下來,我修改了算法本身以選擇特定位置。 現在,該算法不再從可用球員總數中選擇所有可能的花名冊,而是僅從所有可用后衛中挑選3個后衛,然后是3個前鋒和最后1個中鋒。 如您所見,此處的總數僅為7位玩家,而最后的選擇權留給了用戶。 這是一種節省算法上額外時間的快速方法,因為用戶可以手動找到剩余的最好的球員(給定剩余的薪水,可以獲得最高的期望積分)。

This was a huge optimization because the number of guards vs the total number of players is ~40 vs 85. The number is similar for forwards and even less for centers. Note, there’s a slight overlap between the players in each category as some players play multiple positions but this was easy to deal with: I removed played who were already picked as Guards, before picking Forwards etc. The performance boost as a result of the above changes can be seen below:

這是一個巨大的優化,因為后衛人數與球員總數之比約為40比85。前鋒的人數相似,中鋒的人數更少。 請注意,每個類別中的玩家之間都有一點重疊,因為有些玩家扮演多個職位,但是這很容易解決:我刪除了已經被選為后衛的角色,然后再選擇Forwards等。由于上述原因,性能提升更改如下所示:

  • C (85 , 8) = 48,124,511,370

    C(85,8)= 48,124,511,370
  • C (40 , 3) x C (40 , 3) x C (20, 1) = 1,952,288,000

    C(40,3)x C(40,3)x C(20,1)= 1,952,288,000

This is huge. Now, the algorithm is conducting almost ~95% fewer operations and we have the best possible roster broken up by positions and under our salary cap. Let’s test our results!

這是巨大的。 現在,該算法的運算量減少了約95%,并且按職位和工資帽劃分的人員名單可能最好。 讓我們測試一下結果!

實際結果 (Real World Results)

If you’ve made it so far, congratulations. You’ve worked through the technical stuff, now it’s time for the results! I tried the algorithm’s pick over the course of three days on DK’s classic multiplier contests. Each time my entry fee was $1 and the payoff was $3 for the top 30% of the finishers. You can see the lineups generated by the algorithm and the results below.

到目前為止,如果您做到了,那就祝賀您。 您已經完成了技術性工作,現在是時候取得成果了! 我在DK的經典乘數比賽中嘗試了3天的算法選擇。 每次我的報名費是1美元,而前30%的完成者的回報是3美元。 您可以在下面查看算法生成的陣容和結果。

Image for post
Day 1 — Oh no, finished at rank 26 and lost money. DK screenshot by Author.
第一天-哦,不,以第26名的成績結束并輸了錢。 作者的DK屏幕截圖。
Image for post
Day 2 — Oh no again, finished at rank 26 and lost another dollar. DK screenshot by Author.
第2天–哦,沒有了,排名第26,又損失了1美元。 作者的DK屏幕截圖。
Image for post
Day 3- Woohoo! Finished at rank 3 and made $3. DK screenshot by Author.
第3天-哇! 排名第3,并賺了$ 3。 作者的DK屏幕截圖。

As you can see from the above results, the real world outcomes of the competition have been good! Out of the three days that I created lineups using the algorithm, we lost twice and won once. Our intermediate-level sports bettor algorithm has done better than I expected, but there’s still a long way to go.

從以上結果可以看出,比賽的真實結果是不錯的! 在我使用算法創建陣容的三天內,我們輸了兩次,贏了一次。 我們的中級體育博彩算法比我預期的要好,但是還有很長的路要走。

I noticed few nuances about the results, including that our algorithm (before the v2 optimization) made a mistake on day 1 where an injured player was drafted into the team (P. Beverley) which resulted in a weak draft. This was fixed in version 2 and will not be repeated again. Additionally, once cool thing has been that despite the mixed results, the algorithm has consistently created lineups which get >200 fantasy points, which is pretty high!

我注意到關于結果的細微差別,包括我們的算法(在v2優化之前)在第1天犯了一個錯誤,即一名受傷的球員被征召入隊(P. Beverley),導致選秀不力。 此問題在版本2中已修復,將不再重復。 此外,一旦出現有趣的結果是,盡管得到了混合結果,該算法仍會持續創建陣容,獲得超過200個幻想點,這是非常高的!

下一步是什么? (What’s next?)

Well, there you have it. So far, I’ve spent $3 on entree fees and made $3 on winnings, for a grand total of $0 change! I have $25 left to spend on this project before my inner alarm bells start ringing, so I clearly need to improve this algorithm. After talking to some of my friends, who know a lot more about basketball than myself, I have a few hypotheses to test out. Some of these include:

好吧,那里有。 到目前為止,我已經在主菜費用上花費了$ 3,并在獎金中賺了$ 3,總共有$ 0的找零! 在我的內部警鐘開始鳴響之前,我還有25美元可用于該項目,所以我顯然需要改進此算法。 與我的一些朋友交談后,我比我更了解籃球,我有一些假設可以檢驗。 其中一些包括:

  • Using additional player data over the last n games. This way the model would have more context, instead of just a snapshot value

    在過去n場比賽中使用其他玩家數據。 這樣,模型將具有更多上下文,而不僅僅是快照值

  • Using prior team match-up data to adjust weights placed on certain games. For example, this could help avoid picking a player in a match-up where (based on previous meets) the player has failed to perform

    使用先前的球隊比賽數據來調整某些游戲的權重。 例如,這可以幫助避免在對戰中選擇一名球員(基于先前的見面)而該球員未能完成比賽
  • Exploring dual optimization strategies

    探索雙重優化策略

And more! If you have any ideas about how to improve this project please feel free to reach out to me on LinkedIn or over email which you can find on my Website. Additionally, all the data and code for this project can be found on my Github repository, so feel free to clone/fork it and test your own hypotheses! And, as always, any and all feedback is greatly appreciated.

和更多! 如果您對如何改善此項目有任何想法,請隨時通過LinkedIn或通過我的網站上找到的電子郵件與我聯系。 此外,該項目的所有數據和代碼都可以在我的Github存儲庫中找到,因此隨時可以克隆/分叉它并測試自己的假設! 而且,一如既往,我們非常感謝任何反饋。

Stay safe out there everyone and keep building cool stuff.

每個人都應該保持安全,并繼續制作有趣的東西。

翻譯自: https://towardsdatascience.com/can-an-algorithm-pick-a-winning-nba-fantasy-draft-c05342f130f2

算法 從 數中選出

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/389780.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/389780.shtml
英文地址,請注明出處:http://en.pswp.cn/news/389780.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

jQuery表單校驗

小小Demo&#xff1a; <script>$(function () {//給username綁定失去焦點事件$("#username").blur(function () {//得到username文本框的值var nameValue $(this).val();//每次清除數據$("table font:first").remove();//校驗username是否合法if (n…

5912. 每一個查詢的最大美麗值

5912. 每一個查詢的最大美麗值 給你一個二維整數數組 items &#xff0c;其中 items[i] [pricei, beautyi] 分別表示每一個物品的 價格 和 美麗值 。 同時給你一個下標從 0 開始的整數數組 queries 。對于每個查詢 queries[j] &#xff0c;你想求出價格小于等于 queries[j] …

django-rest-framework第一次使用使用常見問題

2019獨角獸企業重金招聘Python工程師標準>>> 記錄在第一次使用django-rest-framework框架使用時遇到的問題&#xff0c;為了便于理解在這里創建了Person和Grade這兩個model from django.db import models class Person(models.Model):SHIRT_SIZES ((S, Small),(M, …

插入腳注把腳注標注刪掉_地獄司機不應該只是英國電影歷史數據中的腳注,這說明了為什么...

插入腳注把腳注標注刪掉Cowritten by Andie Yam由安迪(Andie Yam)撰寫 Hell Drivers”, 1957地獄司機 》電影海報 Data visualization is a great way to celebrate our favorite pieces of art as well as reveal connections and ideas that were previously invisible. Mor…

vue之axios 登陸驗證及數據獲取

登陸驗證&#xff0c;獲取token methods:{callApi () {var vm thisvm.msg vm.result //驗證地址vm.loginUrl http://xxx///查詢地址vm.apiUrl http://yyy/vm.loginModel {username: 你的用戶名,password: 你的密碼,// grant_type: password,}//先獲取 tokenaxios.post(v…

5926. 買票需要的時間

5926. 買票需要的時間 有 n 個人前來排隊買票&#xff0c;其中第 0 人站在隊伍 最前方 &#xff0c;第 (n - 1) 人站在隊伍 最后方 。 給你一個下標從 0 開始的整數數組 tickets &#xff0c;數組長度為 n &#xff0c;其中第 i 人想要購買的票數為 tickets[i] 。 每個人買票…

貝葉斯統計 傳統統計_統計貝葉斯如何補充常客

貝葉斯統計 傳統統計For many years, academics have been using so-called frequentist statistics to evaluate whether experimental manipulations have significant effects.多年以來&#xff0c;學者們一直在使用所謂的常客統計學來評估實驗操作是否具有significant效果。…

吳恩達機器學習+林軒田機器學習+高等數學和線性代數等視頻領取

機器學習一直是一個熱門的領域。這次小編應大家需求&#xff0c;整理了許多相關學習視頻和書籍。本次分享包含&#xff1a;臺灣大學林軒田老師的【機器學習基石】和【機器學習技法】視頻教學、吳恩達老師的機器學習分享、徐小湛的高等數學和線性代數視頻&#xff0c;還有相關機…

saltstack二

配置管理 haproxy的安裝部署 haproxy各版本安裝包下載路徑https://www.haproxy.org/download/1.6/src/&#xff0c;跳轉地址為http&#xff0c;改為https即可 創建相關目錄 # 創建配置目錄 [rootlinux-node1 ~]# mkdir /srv/salt/prod/pkg/ [rootlinux-node1 ~]# mkdir /srv/sa…

319. 燈泡開關

319. 燈泡開關 初始時有 n 個燈泡處于關閉狀態。第一輪&#xff0c;你將會打開所有燈泡。接下來的第二輪&#xff0c;你將會每兩個燈泡關閉一個。 第三輪&#xff0c;你每三個燈泡就切換一個燈泡的開關&#xff08;即&#xff0c;打開變關閉&#xff0c;關閉變打開&#xff0…

如何生成隨機不重復的11位數字

要求 不重復隨機11位數字不占存儲我們都知道11位數字(random)對應有最大值max和最小值min99999999999和10000000000.很簡單的從最小值開始按順序分發到最大值&#xff0c;就滿足了不重復&#xff0c;不占存儲&#xff0c;11位數字的特性。那么接下來就要考慮如何生成隨機數字這…

因為你的電腦安裝了即點即用_即你所愛

因為你的電腦安裝了即點即用Data visualization is a great way to celebrate our favorite pieces of art as well as reveal connections and ideas that were previously invisible. More importantly, it’s a fun way to connect things we love — visualizing data and …

關于前端緩存問題

Cookie、localStorage、sessionStorage的異同 之前沒怎接觸過前端緩存&#xff0c;請教了前端同事之后他給我粘了幾行代碼&#xff0c;用localStorage存取信息&#xff0c;后來老大review代碼的時候發現&#xff0c;被批了一頓&#xff0c;現在好好看看這幾個前端緩存的區別&am…

2074. 反轉偶數長度組的節點

2074. 反轉偶數長度組的節點 給你一個鏈表的頭節點 head 。 鏈表中的節點 按順序 劃分成若干 非空 組&#xff0c;這些非空組的長度構成一個自然數序列&#xff08;1, 2, 3, 4, …&#xff09;。一個組的 長度 就是組中分配到的節點數目。換句話說&#xff1a; 節點 1 分配給…

阿里云云服務器硬盤分區及掛載

云服務器環境&#xff1a;CentOS 6.2 64位 客戶端環境&#xff1a;Mac OSX 遠程連接方式&#xff1a;運行 Terminal&#xff0c;輸入命令 ssh usernameip 硬盤分區及掛載操作步驟&#xff1a; 查看未掛載的硬盤&#xff08;名稱為/dev/xvdb&#xff09;fdisk -l Disk /dev/xvdb…

團隊管理新思考_需要一個新的空間來思考討論和行動

團隊管理新思考andrew wong安德魯黃 Follow跟隨 Sep 4 九月4 There is a need for a new space to think, discuss, and act. This need are being felt by the majority of AI / ML / Data Product Managers out there. They are exhausted by the ever increasing data volum…

Uva201

原題地址&#xff1a;https://uva.onlinejudge.org/index.php?optioncom_onlinejudge&Itemid9 題意&#xff1a; 就是要你輸入一系列橫邊的起始點&#xff0c;和豎邊的起始點&#xff0c;然后你去找出這些邊里面構成的所有正方形。 心得體會 一道難度適中的模擬題&#xf…

2075. 解碼斜向換位密碼

2075. 解碼斜向換位密碼 字符串 originalText 使用 斜向換位密碼 &#xff0c;經由 行數固定 為 rows 的矩陣輔助&#xff0c;加密得到一個字符串 encodedText 。 originalText 先按從左上到右下的方式放置到矩陣中。 先填充藍色單元格&#xff0c;接著是紅色單元格&#xff…

微服務實戰(六):落地微服務架構到直銷系統(事件存儲)

在CQRS架構中&#xff0c;一個比較重要的內容就是當命令處理器從命令隊列中接收到相關的命令數據后&#xff0c;通過調用領域對象邏輯&#xff0c;然后將當前事件的對象數據持久化到事件存儲中。主要的用途是能夠快速持久化對象此次的狀態&#xff0c;另外也可以通過未來最終一…

時間序列數據的多元回歸_清理和理解多元時間序列數據

時間序列數據的多元回歸No matter what kind of data science project one is assigned to, making sense of the dataset and cleaning it always critical for success. The first step is to understand the data using exploratory data analysis (EDA)as it helps us crea…