Think you need a dashboard? You should build a notebook instead.
by Mahdi Karabiben
After first establishing themselves as a key component of the standard Business Intelligence model during the first years of the millennium, dashboards were rapidly adopted by most companies as the go-to tool to present data-driven insights and indicators.
Hadoop’s release in 2006 was followed by a set of Big Data technologies that radically changed how things are done behind the scenes. They allowed parallelism on a previously unimaginable scale. For a long period, these changes were limited to data storage and data processing. Changing the way end users accessed data felt like an unnecessary step, because dashboards were still doing a fine job.
In a Big Data era that completely changed how companies process their data, dashboards managed to remain the de facto standard for making sense of the mind-boggling amounts of data being produced on a daily basis. Most companies offering dashboarding solutions rapidly adapted their products to Big Data technologies. They also offered connectors that allowed dashboards to remain the undisputed go-to tool when it comes to understanding data.
But with continuous changes and improvements to the standard Big Data technologies happening at a staggering pace, maybe it’s time to update the Big Data User Experience?
The problem with dashboards: you’re always one step behind
When they started being integrated into technology stacks at the turn of the century, dashboards answered a clear and coherent need: presenting KPIs and data-driven insights that answer established questions. They were the portal to the company’s data, and they allowed people with different roles and needs to understand what the data had to say. In essence, dashboards were first introduced to democratize data discovery.
But at the turn of the century, data flows were very structured, the data didn’t have that much to say, and the range of questions to ask it was limited.
That is no longer the case. With the data produced daily growing exponentially, the value of this new black gold keeps reaching new highs. The volumes of data available in this Big Data era don’t just answer a specific set of questions; they raise questions you haven’t yet thought of asking. This led to the rise of data exploration, with data scientists trying to extract as much value from data as possible.
Relying on dashboards to visualize and extract value from your data means that you have to use another technology (usually notebooks) to explore it and decide what becomes accessible through your dashboards. As a result, the dashboard always comes in as the second phase of extracting value from data. In an era where the amount of available data allows for a practically unlimited number of directions to explore, no dashboard is enough to extract all of the value your data offers.
Working with this two-step mechanism means that collaboration between different roles remains limited. This is because the data architectures become too complex due to the number of technologies used by the different data specialists.
This chain of people using different technologies for different needs means that, to add certain insights to a dashboard, a data analyst needs to wait for a data scientist to work on the data via a notebook. In turn, the data scientist may need to wait for a data engineer to provide the data in a certain structure through a script. And remember: throughout this whole time-consuming process, the value of the data keeps decreasing.
Multiple dashboard providers have tried to integrate data exploration capabilities into their platforms, with Tableau notably offering an impressive Spark connector that allows you to run Spark SQL jobs directly from your dashboard. Still, the capabilities remain limited and the interactivity is only partial, which leaves the end user always one step behind.
Whether you’re using Kibana, Tableau, or QlikView, your dashboard can offer valuable insights into your data. The problem with such technologies is that they were built with data discovery in mind. Because of that, they neglect one key element made possible on a massive scale in this Big Data era: data exploration.
As data flows keep growing exponentially, limiting the main portal to your data to insights alone means that you’re only reading the first page of a very interesting book.
Notebooks, and how they take interactivity to a completely new level
As mentioned above, notebooks have been the standard tool for data exploration for the past few years. Since the release of Project Jupyter in 2014, and through the functionality it added on top of what was already available via IPython, notebooks have attracted data scientists as an ideal data exploration tool, thanks mainly to one key concept: interactivity.
Thanks to kernels (within the Jupyter ecosystem) and interpreters (within Apache Zeppelin), notebooks let you explore your data through a multitude of Big Data processing technologies. They then offer immediate access to the data via built-in visualization modules and output mechanisms. Gathering both of these capabilities in the same tool is the key to using such a tool for both data discovery and exploration.
Notebooks are not only a tool that allows for direct access to data; they offer that access while maintaining complete interactivity. They blur the line that separates data scientists from data analysts and allow people in these two roles to collaborate seamlessly.
This works well thanks to the protocol that notebooks rely on and to their main building block, cells (called paragraphs in Zeppelin). By offering multiple cell types (for code and text), notebooks let narrative and executable code sit side by side, which is what makes efficient collaboration possible.
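To make the idea of cell types concrete, here is a minimal sketch of how a notebook with one text cell and one code cell might be represented on disk. It assumes the general layout of Jupyter's nbformat version 4 JSON as commonly documented; the helper names `markdown_cell` and `code_cell`, and the sample cell contents, are made up for illustration and are not part of any library.

```python
import json

# Hypothetical helpers (not a library API): each builds one cell as a plain dict,
# following the general shape of Jupyter's nbformat v4 JSON.
def markdown_cell(text):
    """A text cell: narrative that collaborators read."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell: source to run, plus slots for its results."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,  # filled in once the cell has been run
        "outputs": [],            # results are stored next to the code
        "source": source,
    }


# A notebook is essentially an ordered list of such cells plus some metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("## Daily sign-ups\nContext written for the analyst."),
        code_cell("signups = [120, 135, 150]\nsum(signups) / len(signups)"),
    ],
}

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
serialized = json.dumps(notebook, indent=2)
print(cell_types)
```

Because text, code, and outputs live in the same ordered document, everyone who opens the notebook sees the explanation and the computation together.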
To show their efficiency compared to dashboards, let’s go back to the scenario we talked about earlier. In a notebook-based architecture, when a data analyst needs certain insights within a notebook, the data engineer can add a code cell in which they manipulate the data using the appropriate data processing technology. Then the data scientist uses this data in another code cell to extract the desired information and offer the output to the data analyst. This all happens without any of these three data specialists leaving the notebook.
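The scenario above can be sketched as three consecutive notebook cells, written here as one script with a comment marking each cell. This is only an illustration under stated assumptions: `RAW_EVENTS` is made-up sample data, and the standard library stands in for whatever processing technology (Spark, for instance) a real notebook would call through its kernel or interpreter.

```python
import csv
import io
from collections import defaultdict

# --- Cell 1 (data engineer): put the raw data into a usable structure ---
# RAW_EVENTS is invented sample data standing in for a real source.
RAW_EVENTS = """country,amount
FR,10.0
DE,25.5
FR,4.5
"""
orders = [
    {"country": row["country"], "amount": float(row["amount"])}
    for row in csv.DictReader(io.StringIO(RAW_EVENTS))
]

# --- Cell 2 (data scientist): derive the metric from the engineer's output ---
revenue_by_country = defaultdict(float)
for order in orders:
    revenue_by_country[order["country"]] += order["amount"]

# --- Cell 3 (data analyst): read the result in the same notebook ---
for country, revenue in sorted(revenue_by_country.items()):
    print(f"{country}: {revenue:.2f}")
```

Each cell consumes the variables defined by the previous one, so a change made by the data engineer is visible to the other two roles as soon as the cells are re-run.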
In an era where Fast Data is the norm, extracting value from your data through a structured pipeline using different tools for each step is no longer a sustainable pattern. The data that comes through an unstructured real-time data flow may offer valuable insights when used for batch processes. But it offers even more value when it’s progressively analyzed via near-real-time processing and interactive dashboards (i.e. notebooks) that offer complete access to the raw data and sophisticated visualizations.
Translated from: https://www.freecodecamp.org/news/think-you-need-a-dashboard-you-should-build-a-notebook-instead-33104d913f95/