What is an outlier?
You are!
Actually not. This is not a text about you.
But, as Gladwell puts it in Outliers, if you find yourself being that type of outlier, you’re quite lucky. And rare.
What is actually an outlier?

According to Merriam-Webster, an outlier is:
“a statistical observation that is markedly different in value from the others of the sample”
But you’re not here for that, are you?
Let’s simply explain when a data point is considered an outlier, why that might happen, and what you can do about it.
When?
There are several ways to identify and highlight outliers, but our goal here is to keep it short and simple, so let’s discuss the easiest one. You can find other ways here.
Any observed value is considered an outlier if it falls outside the range from 1stQuartile - 1.5 x IQR to 3rdQuartile + 1.5 x IQR.

Stay here!
I promised it would be easy, and it will be. We just have to pin down what this IQR (interquartile range) means.
Let’s say you’re meeting your high school colleagues, 9 people, all arriving by car. For the purpose of this explanation, let’s imagine we collect the horsepower of every car and list the values in ascending order.
105 | 133 | 146 | 183 | 190 | 195 | 210 | 220 | 510 ← values collected
Now, if you know a bit of statistics, you will recognize what are called quartiles. If you don’t remember them, please look here and then come back.
IQR = 3rdQuartile - 1stQuartile = 215 - 139.5 = 75.5
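If you would rather not follow the link, here is a minimal Python sketch of where 139.5 and 215 come from. It assumes the median-of-halves convention: split the sorted data at the median (leaving the median itself out when the count is odd) and take the median of each half. Other conventions, including the defaults in some libraries, give slightly different quartiles. The function name `quartiles` is made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half (median excluded when the count is odd)
    upper = data[-half:]   # upper half (median excluded when the count is odd)
    return median(lower), median(data), median(upper)

horsepower = [105, 133, 146, 183, 190, 195, 210, 220, 510]
q1, q2, q3 = quartiles(horsepower)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 139.5 190 215.0 75.5
```

Here Q1 is the median of 105, 133, 146, 183, that is (133 + 146) / 2, and Q3 is the median of 195, 210, 220, 510, that is (210 + 220) / 2.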
Now, coming back to what counts as an outlier in our example, we need to calculate Q1 - 1.5 x IQR and Q3 + 1.5 x IQR.
Q1 - 1.5 x IQR = 139.5 - 1.5 x 75.5 = 139.5 - 113.25 = 26.25 (Q1: first quartile)
Q3 + 1.5 x IQR = 215 + 1.5 x 75.5 = 215 + 113.25 = 328.25 (Q3: third quartile)
We’re very close. STAY HERE!
As mentioned before starting the calculation, any observed value outside the interval [26.25; 328.25] is considered an outlier: an extreme value compared to the rest of the collected data. The question is, are there any values outside that interval in our data? That’s right, 510 is. (Let’s assume that’s you, and you have a new BMW M5.)
And here we are: that is a very easy way of finding outliers in a simple set of collected data.
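The whole check can be sketched in a few lines of Python. This is only one way to do it: the sketch takes Q1 and Q3 from the standard library's `statistics.quantiles` with `method="exclusive"`, which happens to reproduce 139.5 and 215 for this data, and the helper name `iqr_outliers` and its `k` parameter are made up for illustration. Note that the fences use the full 1.5 x IQR = 1.5 x 75.5 = 113.25.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for the k x IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr   # anything below this fence is an outlier
    upper = q3 + k * iqr   # anything above this fence is an outlier
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

horsepower = [105, 133, 146, 183, 190, 195, 210, 220, 510]
lower, upper, outliers = iqr_outliers(horsepower)
print(lower, upper, outliers)  # 26.25 328.25 [510]
```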
Why?
There are multiple reasons, both good and bad, why outliers might end up in a set of data.
Data entry errors → you meant to type 210 but typed 510, and thus the value became an outlier;
Measurement errors → you’ve measured your car’s power at a service center that is well known for inflating the numbers. That 510 is not real;
Experimental errors → one of your colleagues, the one with 105, told you the value in kW rather than in horsepower; that misunderstanding is an experimental error;
Intentional → you’re putting your colleagues to the test and telling them a value that is not real;
Natural → and that is where we are: you’re really a hustler, your M5’s power is not experimental measurement BS, and you really are an outlier.
What?
Now that you know what they are, how you find them, and what may cause them, what can be done to make use of them or get rid of them?
If you want to brag about how high the average hp in your class is, keep the values. Keep in mind that the average is then not representative, as it is pulled up by the outlier. You.
If you think your car is very different and you’re an exception among the other cars, take your value out.
If you feel there are other high school colleagues with powerful cars who did not show up, arrange another meeting and treat your group as a separate one.
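To see how much a single outlier moves the average, here is a small Python sketch comparing the mean and the median with and without the 510 hp value. The variable names are illustrative; the numbers are the ones collected earlier.

```python
from statistics import mean, median

horsepower = [105, 133, 146, 183, 190, 195, 210, 220, 510]
without_outlier = [v for v in horsepower if v != 510]

print(round(mean(horsepower), 2), median(horsepower))            # 210.22 190
print(round(mean(without_outlier), 2), median(without_outlier))  # 172.75 186.5
```

Taking the outlier out lowers the mean by about 37 hp, while the median moves by only 3.5 hp.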
That was it.

This is, as always, an oversimplified and humorous approach to explaining rather complex statistical concepts.
If you like my work, consider reading my other posts; I try to publish weekly.
Original article: https://towardsdatascience.com/what-is-an-outlier-26888fd9870d