Good Tales of Bad Data
Meet Julia. She’s a data engineer. Julia is responsible for ensuring that your data warehouses and lakes don’t turn into data swamps, and that, generally speaking, your data pipelines are in good working order.

Julia is happy when nothing breaks, but like any good engineer, she knows that this is nearly impossible. So she just wants to be the first to know when issues do arise, so that she can solve them.

Meet Ted. He’s a data analyst. Ted is known at his company as the “SQL King” because he’s the go-to query wrangler for the Marketing, Customer Support, and Operations teams. He’s an expert in Tableau and knows all the Excel hacks. Ted is also happy when nothing breaks and, like Julia, knows that this is nearly impossible. However, Ted doesn’t want bad data to ruin his analytics and make life miserable for him and his stakeholders (more on that later).

Meet Alex. Alex is a data consumer. She might be a data scientist, a product manager, a VP of Marketing, or even your CEO. Alex uses data to make smarter decisions, whether that’s what to name her new product or which pair of lucky socks to wear to tomorrow’s board meeting.
Alex, or anyone else at the company for that matter, can’t do their job if they can’t trust their data. We call this phenomenon data downtime. Data downtime refers to periods of time when your data is inaccurate, missing, or otherwise erroneous, and, sort of like death and taxes, it spares no one. Unlike death and taxes, however, data downtime can be easily avoided if it is caught and acted on immediately.

When raw data first enters your data pipeline, it’s abstract and meaningless on its own. Data downtime doesn’t really matter at this stage because no one is using the data yet, other than Julia, who passes it on. The problem is that she doesn’t always know when the data is broken.

As data moves through the pipeline, it becomes more concrete. Once it reaches the company’s business intelligence tools, Ted can start using it, transforming what was formerly vague and abstract into Excel spreadsheets, Tableau dashboards, and other beautiful vessels of knowledge.
Ted can then transform this data (now nearing full maturity) into actionable insights for the rest of his company. Now Alex can use this data to create marketing collateral, PDFs, and customer decks. The data is polished, concrete, and bound to save the world. Or is it?

As data errors move down the pipeline, the severity of data downtime increases. More and more Teds and Alexes use the data, and many of them have no idea whether what they’re looking at is right, wrong, or somewhere in between until it’s too late.
When is too late, you might ask?
Too late is when Julia is paged at 3 a.m. on Monday by Ted. Only a few minutes earlier, Ted got a call from Alex, his skip-level manager and the VP of Sales, about a wonky report he was supposed to present to their CEO the next morning. Too late is when you’ve wasted time, lost revenue, and eroded the precious trust of Alex and everyone else.

The more concrete the data becomes, and the further it moves from Julia’s raw tables, the more severe the impact of data downtime. We refer to this as the cone of data anxiety.
Disaster struck, and Julia had no idea that it had happened, let alone why. If only she had caught the data downtime immediately, right when it hit, instead of hearing about it from Alex and her other data consumers further down the cone of anxiety, disaster could have been avoided.
Worst of all, she was in the middle of a once-in-a-lifetime dream. Cotton candy clouds, chocolate fountain waterfalls, and no null values. The complete opposite of the reality she was facing at 3 a.m. on Monday morning.
Sound familiar? Yeah, I’m with you.
If data downtime is something you’ve experienced, we’d love to hear from you! Reach out to Barr with your own good tales of bad data.
This article was co-written by Barr Moses & Martín Alonso Lago.
Source: https://towardsdatascience.com/good-tales-of-bad-data-91eccc29cbc5