The saddest equation in data science

Prepare a box of tissues! I’m about to drop a truth bomb about statistics and data science that’ll bring tears to your eyes.

INFERENCE = DATA + ASSUMPTIONS. In other words, statistics does not give you truth.

Common myths

Here are some standard misconceptions:

  • “If I find the right equations, I can know the unknown.”

  • “If I math at my data hard enough, I can reduce my uncertainty.”

  • “Statistics can transform data into truth!”

They sound like fairytales, don’t they? That’s because they are!

Painful truths

There is no magic in the world that lets you make something out of nothing, so abandon that hope now. That’s not what statistics is about. Take it from a statistician. (As a bonus, this article might save you from wasting a decade of your life studying the dark arts of statistics to chase that elusive dream.)

Unfortunately, there are plenty of charlatans out there who may try to convince you otherwise. They’ll pull a classic bullying move on you, “You don’t understand the equations I’m clobbering you with, so bow before my superiority and do what I say!”

Resist those posers.

Don’t land with a splat, Icarus!

Think of statistical inference (“statistics” for short) as an Icarus-like leap from what we know (our sample data) to what we don’t (our population parameter).

In statistics, what you know is not what you wish you knew.

Perhaps you want tomorrow’s facts, but you only have the past to inform you. (It’s so annoying when we can’t remember the future, right?) Perhaps you want to know what all your potential users think of your product, but you can only ask a hundred of them. Then you’re dealing with uncertainty!
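
To make that leap concrete, here is a minimal Python sketch (Python is our choice; the article contains no code). It uses hypothetical survey numbers, 62 of the 100 users you could ask saying they like the product, and the textbook normal-approximation (Wald) interval as one common way to reason from a sample to a population. The function name `wald_interval` is ours. The part that matters is the docstring: the interval only describes your population if the hundred respondents are a simple random sample of independent users, and that is an assumption the data cannot prove.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) ~95% interval for a population proportion.

    ASSUMPTION: the n respondents are a simple random sample of independent
    users. If that fails (e.g. only your biggest fans answered), the arithmetic
    still runs, but the interval no longer says anything about the population.
    """
    p_hat = successes / n                      # what we KNOW: the sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error under the assumption
    return p_hat - z * se, p_hat + z * se      # our guess about what we DON'T know


# Hypothetical data: 62 of the 100 users we could ask say they like the product.
low, high = wald_interval(62, 100)
print(f"Sample: 62%. Population guess: {low:.1%} to {high:.1%}")
```

With these made-up numbers the interval runs from roughly 52.5% to 71.5%. Notice that nothing in the code checks the random-sampling assumption; it simply has to be taken on board.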

It’s not magic, it’s assumptions

How can you possibly leap from what you know to what you don’t? You need a bridge to cross that chasm… and that bridge is assumptions. Which brings me back to the most painful equation in all of data science: DATA + ASSUMPTIONS = PREDICTION.

DATA + ASSUMPTIONS = PREDICTION

(Feel free to replace the word “prediction” with “inference” or “forecast” if you like — they’re all the same thing here: a statement about something you can’t know for sure.)
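
A toy illustration of DATA + ASSUMPTIONS = PREDICTION, as a minimal Python sketch with made-up numbers: the same four months of hypothetical sales data yield different forecasts depending on which assumption you attach to them. Both function names are illustrative and not taken from any library.

```python
# Hypothetical data: units sold in each of the last four months.
sales = [10.0, 12.0, 14.0, 16.0]


def forecast_assuming_stable_level(history: list[float]) -> float:
    """ASSUMPTION: next month will look like the historical average (no trend)."""
    return sum(history) / len(history)


def forecast_assuming_linear_trend(history: list[float]) -> float:
    """ASSUMPTION: the average month-to-month change continues for one more month."""
    average_step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + average_step


# Same DATA, different ASSUMPTIONS, different PREDICTIONS.
print(forecast_assuming_stable_level(sales))  # 13.0
print(forecast_assuming_linear_trend(sales))  # 18.0
```

Neither forecast is "the" answer. Each is exactly as good as the assumption behind it, and nothing in the four data points tells you which assumption to pick.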

What’s an assumption?

If we knew all the facts (and we knew that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Assumptions are the ugly patches you use to bridge the gap between what you know and what you wish you knew. They’re hacks you have to use to make the math work out when you’re missing the facts.

Assumptions are ugly band-aids you put over the parts where information is missing.

Should I put it more bluntly? An assumption is not a fact, it’s some nonsense you make up precisely because you’ve got gaping holes in your knowledge. If you’re in the habit of bullying people with your overconfidence intervals, take a moment to remind yourself that it’s a stretch to refer to anything based on assumptions as truth. It’s best to start treating the whole thing as a personal decision-making tool that is imperfect but better than nothing (in specific situations).

Statistics is your attempt to do your best in an uncertain world.

There are always assumptions.

Assumptions are part of decision-making

Show me an “assumption-free” real-world decision and I’ll rattle off a host of implicit assumptions you’re not even aware you’re making.

Examples: When you read a newspaper, did you assume all the facts were checked? When you made your plans for 2020, did you assume there would be no global pandemic? If you analyzed data, did you assume the information was captured without errors? Did you assume that your random number generator is random? (They usually aren’t.) When you chose to make an online purchase, did you assume the right amount would be withdrawn from your bank account? What about the last snack you had, did you assume it wouldn’t poison you? When you took medicine, did you *know* anything about its long-term safety and efficacy… or did you assume?
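
On the random number generator point: most software generators are pseudo-random, meaning a deterministic algorithm produces numbers that merely look random. A minimal Python illustration (our choice of language): the standard `random` module is documented to use the Mersenne Twister, so two generators started from the same seed emit identical sequences. The seed value `2020` is arbitrary.

```python
import random

# Two separate generator objects, both seeded with the same (arbitrary) value.
gen_a = random.Random(2020)
gen_b = random.Random(2020)

draws_a = [gen_a.random() for _ in range(5)]
draws_b = [gen_b.random() for _ in range(5)]

# The "random" draws are fully determined by the seed, so they match exactly.
print(draws_a == draws_b)  # True
```

That determinism is handy for reproducible analyses, but treating such output as truly random is itself an assumption.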

Like it or not, assumptions are always part of decision-making. A proper foray into real-world data should contain a host of written-down assumptions where the data scientist comes clean about corners they had to cut.

Even if you choose to steer clear of statistics, you’re probably using assumptions to guide your actions. To stay safe, it’s crucial that you keep track of the assumptions that your decisions are based on.

How the statistical “magic” happens

The field of statistics gives you a whole arsenal of tools for formalizing your assumptions and combining them with evidence to make reasonable decisions. (Catch my 8-minute intro to stats here.)

It’s preposterous to expect an analysis involving uncertainty and probability to be a source of truth-with-a-capital-T.

Yep, that’s how the statistical “magic” happens. You choose which assumptions you’re willing to live with, then you combine them with data to take reasonable actions on the basis of that unholy union. That’s all statistics is.

That’s why an analysis involving uncertainty and probability could never be a source of truth-with-a-capital-T. There is no secret dark art that can do that for you.

It’s also why two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions. Statistics gives you a tool for making decisions more thoughtfully, but there’s no single right way to use it. It’s a personal decision-making tool.
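
One way to see this in code is a minimal Bayesian sketch in Python; it is our illustration, not the author’s method. Two hypothetical analysts share the same data (7 conversions in 10 trials) and the same hypothetical decision rule (launch if the estimated rate exceeds 60%), but start from different Beta priors. Using the standard Beta-Binomial result that a Beta(a, b) prior plus s successes in n trials gives a Beta(a + s, b + n - s) posterior, each summarizes with the posterior mean, and they reach opposite decisions.

```python
def posterior_mean(successes: int, trials: int, prior_a: float, prior_b: float) -> float:
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior.

    Beta-Binomial conjugacy: the posterior is
    Beta(prior_a + successes, prior_b + trials - successes), whose mean is below.
    """
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Hypothetical shared data and decision rule.
successes, trials = 7, 10
threshold = 0.6  # launch only if the estimated rate exceeds 60%

# Different ASSUMPTIONS: each analyst's prior belief about the rate.
priors = {
    "Analyst A, flat prior Beta(1, 1)": (1, 1),
    "Analyst B, skeptical prior Beta(10, 10)": (10, 10),
}

for name, (a, b) in priors.items():
    estimate = posterior_mean(successes, trials, a, b)
    decision = "launch" if estimate > threshold else "don't launch"
    print(f"{name}: estimate {estimate:.3f} -> {decision}")
```

Analyst A gets 8/12 ≈ 0.667 and launches; Analyst B gets 17/30 ≈ 0.567 and doesn’t. Neither made an arithmetic mistake: the disagreement comes entirely from the prior each was willing to assume.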

A study is only as good as the assumptions you’ll make about it.

What about science?

What does it mean when a scientist uses statistics to come to a conclusion? Simply that they’ve formed an opinion and have made the decision to share it with the world. That’s not a bad thing — it’s a scientist’s job to form opinions reluctantly, which makes me feel better about assuming that they’re worth listening to.

I’m a huge fan of taking advice from those who have more expertise and information than I do, but I never let myself confuse their opinions with facts. And while many scientists are well-versed in working with probability, I’ve seen other scientists make enough statistical mess to last several lifetimes. Opinions cannot (and should not) convince someone who isn’t willing to assume that those opinions were arrived at competently, from a blend of evidence and mutually palatable untested assumptions.

If you’d like to hear more of my musings on science and scientists, read this.

In summary

It’s best to think of statistics as the science of changing your mind under uncertainty. It’s a framework to help you make thoughtful decisions when you lack information… and there’s no single right way to use it.

And no, it doesn’t give you the facts you need; it gives you what you need to cope with not having those facts in the first place. The entire point is to help you do your best in an uncertain world.

To do that, you’ll have to start making assumptions.

Next up

In follow-up articles, I’ll write about where assumptions come from, how to pick “good” assumptions, and what it means to test an assumption. If these topics intrigue you, your retweets are my favorite motivation for writing.

In the meantime, most of the links in this article take you to my other musings. Can’t choose? Try one of these:

Source: https://towardsdatascience.com/the-saddest-equation-in-data-science-e60e7819b63f
