The Saddest Equation in Data Science
Prepare a box of tissues! I’m about to drop a truth bomb about statistics and data science that’ll bring tears to your eyes.

INFERENCE = DATA + ASSUMPTIONS. In other words, statistics does not give you truth.
Common myths
Here are some standard misconceptions:
“If I find the right equations, I can know the unknown.”
“If I math at my data hard enough, I can reduce my uncertainty.”
“Statistics can transform data into truth!”
They sound like fairytales, don’t they? That’s because they are!
Painful truths
There is no magic in the world that lets you make something out of nothing, so abandon that hope now. That’s not what statistics is about. Take it from a statistician. (As a bonus, this article might save you from wasting a decade of your life studying the dark arts of statistics to chase that elusive dream.)
Unfortunately, there are plenty of charlatans out there who may try to convince you otherwise. They’ll pull a classic bullying move on you, “You don’t understand the equations I’m clobbering you with, so bow before my superiority and do what I say!”
Resist those posers.

Don’t land with a splat, Icarus!
Think of statistical inference (“statistics” for short) as an Icarus-like leap from what we know (our sample data) to what we don’t (our population parameter).
In statistics, what you know is not what you wish you knew.
Perhaps you want tomorrow’s facts, but you only have the past to inform you. (It’s so annoying when we can’t remember the future, right?) Perhaps you want to know what all your potential users think of your product, but you can only ask a hundred of them. Then you’re dealing with uncertainty!
It’s not magic, it’s assumptions
How can you possibly leap from what you know to what you don’t? You need a bridge to cross that chasm… and that bridge is assumptions. Which brings me back to the most painful equation in all of data science: DATA + ASSUMPTIONS = PREDICTION.
DATA + ASSUMPTIONS = PREDICTION
(Feel free to replace the word “prediction” with “inference” or “forecast” if you like — they’re all the same thing here: a statement about something you can’t know for sure.)
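To make the equation concrete, here is a minimal sketch of the "ask a hundred users" situation. The survey numbers (62 of 100 respondents liking the product) and the function name `proportion_interval` are hypothetical, invented purely for illustration. The data alone gives only the sample proportion; the interval around it exists only because of the assumptions listed in the comments.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a population proportion.

    ASSUMPTIONS (not facts):
      - the n respondents are a simple random sample of the population
      - responses are independent and honestly reported
      - n is large enough for the normal approximation to be reasonable
    """
    p_hat = successes / n                      # DATA: what we actually observed
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error, valid only under the assumptions
    low = max(0.0, p_hat - z * se)
    high = min(1.0, p_hat + z * se)
    return p_hat, low, high

# Hypothetical survey: 62 of 100 users say they like the product.
p_hat, low, high = proportion_interval(successes=62, n=100)
print(f"sample proportion: {p_hat:.2f}")
print(f"interval for all users (if the assumptions hold): {low:.3f} to {high:.3f}")
```

If the hundred users were hand-picked fans rather than a random sample, the arithmetic would run just the same, but the interval would say nothing trustworthy about the wider population.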
What’s an assumption?
If we knew all the facts (and we knew that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Assumptions are the ugly patches you use to bridge the gap between what you know and what you wish you knew. They’re hacks you have to use to make the math work out when you’re missing the facts.
Assumptions are ugly band-aids you put over the parts where information is missing.
Should I put it more bluntly? An assumption is not a fact, it’s some nonsense you make up precisely because you’ve got gaping holes in your knowledge. If you’re in the habit of bullying people with your overconfidence intervals, take a moment to remind yourself that it’s a stretch to refer to anything based on assumptions as truth. It’s best to start treating the whole thing as a personal decision-making tool that is imperfect but better than nothing (in specific situations).
Statistics is your attempt to do your best in an uncertain world.
There are always assumptions.
Assumptions are part of decision-making
Show me an “assumption-free” real-world decision and I’ll rattle off a host of implicit assumptions you’re not even aware you’re making.
Examples: When you read a newspaper, did you assume all the facts were checked? When you made your plans for 2020, did you assume there would be no global pandemic? If you analyzed data, did you assume the information was captured without errors? Did you assume that your random number generator is random? (They usually aren’t.) When you chose to make an online purchase, did you assume the right amount would be withdrawn from your bank account? What about the last snack you had, did you assume it wouldn’t poison you? When you took medicine, did you *know* anything about its long-term safety and efficacy… or did you assume?
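The random number generator remark is easy to check for yourself. The sketch below uses Python's standard `random` module, which is a pseudorandom generator; the seed value 2020 and the helper name `draws` are arbitrary choices made for illustration.

```python
import random

def draws(seed, k=5):
    """Roll a six-sided die k times using a generator seeded with `seed`."""
    rng = random.Random(seed)  # deterministic pseudorandom generator
    return [rng.randint(1, 6) for _ in range(k)]

first = draws(2020)
second = draws(2020)

# Same seed, same "random" rolls: the sequence is fully determined by the seed.
print(first)
print(second)
print("identical:", first == second)
```

Treating such output as random is itself an assumption, usually a perfectly reasonable one for simulation work, but still one worth writing down.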
Like it or not, assumptions are part of decision-making.
Like it or not, assumptions are always part of decision-making. A proper foray into real-world data should contain a host of written-down assumptions where the data scientist comes clean about corners they had to cut.
Even if you choose to steer clear of statistics, you’re probably using assumptions to guide your actions. To stay safe, it’s crucial that you keep track of the assumptions that your decisions are based on.
How the statistical “magic” happens
The field of statistics gives you a whole arsenal of tools for formalizing your assumptions and combining them with evidence to make reasonable decisions. (Catch my 8 minute intro to stats here.)
It’s preposterous to expect an analysis involving uncertainty and probability to be a source of truth-with-a-capital-T.
Yep, that’s how the statistical “magic” happens. You choose which assumptions you’re willing to live with, then you combine them with data to take reasonable actions on the basis of that unholy union. That’s all statistics is.

That’s why an analysis involving uncertainty and probability could never be a source of truth-with-a-capital-T. There is no secret dark art that can do that for you.
Two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions.
It’s also why two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions. Statistics gives you a tool for making decisions more thoughtfully, but there’s no single right way to use it. It’s a personal decision-making tool.
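Here is one way this can play out, sketched with a Beta-Binomial model. Everything specific in it is hypothetical and chosen only for illustration: the data (7 successes in 10 trials), the two priors, the analyst labels, and the decision rule "act if the posterior mean exceeds 0.5". The priors are where each analyst's assumptions live.

```python
def posterior_mean(successes, failures, prior_a, prior_b):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior.

    With binomial data the posterior is Beta(prior_a + successes,
    prior_b + failures), whose mean is a / (a + b).
    """
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)

# The SAME hypothetical data for both analysts: 7 successes, 3 failures.
successes, failures = 7, 3

# ASSUMPTION A: no strong prior opinion (uniform Beta(1, 1) prior).
mean_a = posterior_mean(successes, failures, prior_a=1, prior_b=1)

# ASSUMPTION B: past experience suggests success is rare (Beta(2, 20) prior).
mean_b = posterior_mean(successes, failures, prior_a=2, prior_b=20)

for name, mean in [("Analyst A", mean_a), ("Analyst B", mean_b)]:
    decision = "act" if mean > 0.5 else "hold off"
    print(f"{name}: posterior mean {mean:.3f}, decision: {decision}")
```

Neither analyst made an arithmetic mistake. They disagree because they started from different assumptions, and the data alone cannot say whose assumptions were better.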
A study is only as good as the assumptions you’ll make about it.
What about science?
What does it mean when a scientist uses statistics to come to a conclusion? Simply that they’ve formed an opinion and have made the decision to share it with the world. That’s not a bad thing — it’s a scientist’s job to form opinions reluctantly, which makes me feel better about assuming that they’re worth listening to.
It’s a scientist’s job to form opinions reluctantly.
I’m a huge fan of taking advice from those who have more expertise and information than I do, but I never let myself confuse their opinions with facts. But while many scientists are well-versed in working with probability, I’ve seen other scientists make enough statistical mess to last several lifetimes. Opinions could not (and should not) convince someone who’s not willing to make the assumption that those opinions were arrived at competently from a blend of evidence and mutually-palatable untested assumptions.
If you’d like to hear more of my musings on science and scientists, read this.
In summary
It’s best to think of statistics as the science of changing your mind under uncertainty. It’s a framework to help you make thoughtful decisions when you lack information… and there’s no single right way to use it.
And no, it doesn’t give you the facts you need; it gives you what you need to cope with not having those facts in the first place. The entire point is to help you do your best in an uncertain world.
To do that, you’ll have to start making assumptions.
Next up
In follow-up articles, I’ll write about where assumptions come from, how to pick “good” assumptions, and what it means to test an assumption. If these topics intrigue you, your retweets are my favorite motivation for writing.
In the meantime, most of the links in this article take you to my other musings. Can’t choose? Try one of these:
Translated from: https://towardsdatascience.com/the-saddest-equation-in-data-science-e60e7819b63f