Conditional Probability Distribution
If you’re currently in the job market or looking to switch careers, you’ve probably noticed the growing popularity of data science jobs. In 2019, LinkedIn ranked “data scientist” the No. 1 most promising job in the U.S. based on job openings, salary, and career advancement opportunities, and reported a 56% rise in job openings for data scientists over the previous year. Despite its popularity, however, data science can be a difficult field to enter, let alone to learn. I know from personal experience that the amount of statistics involved made it very challenging. Probability, in particular, can be quite complicated, but it is fundamental to many machine learning models, such as decision tree learning. So the purpose of this article is to provide a rudimentary understanding of conditional probability.
How To Calculate Probability
Simply put, when every outcome is equally likely, the probability of an event is the number of outcomes in which the event happens divided by the total number of possible outcomes. For example, imagine you have a deck of cards and you want to calculate the probability that you’ll randomly pull a king from the deck. How would you calculate that? Well, since there are 4 kings in a deck of cards, there are 4 possible ways you can draw a king from the deck; and since there are 52 cards in the deck, there are 52 possible outcomes. So 4 divided by 52 is about .077, or a 7.7% chance that your card will be a king. Now say you want to figure out the probability of drawing another king. The answer will depend on how you handle replacement. Sampling with replacement means that you place the first card back into the deck, making the two events independent (the probability of drawing a king is the same on each draw). Sampling without replacement means you’re not placing the first card back, which affects the probability of drawing the second king (the total number of outcomes is now 51, and only 3 kings remain). If event A is drawing the first king and event B is drawing the second king, then we’d say the probability of both A and B occurring is equal to the probability of event A multiplied by the probability of event B given that A occurs.
Mathematical Notation
P(A and B) = P(A) x P(B|A) = 4/52 x 3/51 = 12/2652 ≈ .0045, or about .45%
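The same arithmetic can be checked in a few lines of Python. This is only a minimal sketch: the function name two_kings_probability and the constants are made up for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

KINGS = 4        # kings in a standard deck
DECK_SIZE = 52   # cards in a standard deck


def two_kings_probability(with_replacement: bool) -> Fraction:
    """Probability that two cards drawn in a row are both kings."""
    p_first = Fraction(KINGS, DECK_SIZE)  # P(A)
    if with_replacement:
        # The first card goes back, so the second draw is unchanged: P(B|A) = P(B)
        p_second = Fraction(KINGS, DECK_SIZE)
    else:
        # One king and one card are gone: P(B|A) = 3/51
        p_second = Fraction(KINGS - 1, DECK_SIZE - 1)
    return p_first * p_second  # P(A and B) = P(A) x P(B|A)


without = two_kings_probability(with_replacement=False)
with_repl = two_kings_probability(with_replacement=True)
print(without, float(without))      # 1/221, roughly 0.0045
print(with_repl, float(with_repl))  # 1/169, roughly 0.0059
```

Putting the card back makes a second king slightly more likely (1/169 versus 1/221), because the deck still holds all 4 kings on the second draw.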
Tree Diagram
Mathematics isn’t intuitive to everyone; it certainly wasn’t for me when I was just starting out in this field. Visualizations, however, can be a great tool when it comes to reinforcing complex topics. A tree diagram is one example that can help you break down a general problem into smaller components, which makes it perfect for probability problems that involve multiple events leading to a variety of outcomes. For example, take a look at the diagram I’ve created that helps answer the following question: If you have a bag of 23 marbles (5 green, 8 blue, and 10 red), what’s the probability that you’ll randomly pull out a blue marble and then a green marble, without putting the first one back? Let’s break it down.
- The probability of grabbing a blue marble is about 35%, because there are 8 ways you can get a blue marble and 23 total potential outcomes (8/23).
- Now, given that you pulled out a blue marble, the probability of grabbing a green marble from the bag is about 23%: 5 green marbles divided by 22 potential outcomes (notice how the total number of outcomes changes the second time, hence the change in probability).
- Finally, calculating the probability of both these events happening involves multiplying the two probabilities (.35 x .23 ≈ 8%).
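The whole tree for the marble example can also be built in code. The sketch below is illustrative rather than definitive: the function name draw_two_tree and the BAG dictionary are my own, and the code assumes two draws without replacement. Each branch multiplies the probability of the first draw by the conditional probability of the second. Working with exact fractions gives 20/253, or about 7.9%, which agrees with the rounded figure of 8%.

```python
from fractions import Fraction
from itertools import product

BAG = {"green": 5, "blue": 8, "red": 10}  # 23 marbles, as in the example


def draw_two_tree(bag):
    """Return {(first, second): probability} for two draws without replacement."""
    total = sum(bag.values())
    tree = {}
    for first, second in product(bag, repeat=2):
        p_first = Fraction(bag[first], total)
        # After the first draw, one marble of that colour is gone.
        remaining = bag[second] - (1 if second == first else 0)
        p_second_given_first = Fraction(remaining, total - 1)
        tree[(first, second)] = p_first * p_second_given_first
    return tree


tree = draw_two_tree(BAG)
blue_then_green = tree[("blue", "green")]
print(blue_then_green, float(blue_then_green))  # 20/253, roughly 0.079
print(sum(tree.values()))                       # 1: the branches cover every outcome
```

Checking that all nine branches sum to 1 is a quick sanity test that the tree has been drawn correctly.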
Conclusion
Hopefully this demonstration has given you a clearer mental picture of statistical probability. Even though conditional probability may seem elementary compared to the more advanced concepts in machine learning, having a solid understanding of the foundation on which data science is built is extremely important. So whenever you begin to learn something new, remember that no topic is too small and relearning is reinforcement.
Translated from: https://medium.com/swlh/conditional-probability-7f519a81655e