Building a Big Data Platform
Over the past few years, many companies have embraced data platforms as an effective way to aggregate, handle, and utilize data at scale. Despite the data platform’s rising popularity, however, little literature exists on what it actually takes to successfully build one.
Barr Moses, CEO & co-founder of Monte Carlo, and Atul Gupte, former Product Manager for Uber’s Data Platform Team, share advice for designing a data platform that will maximize the value and impact of data on your organization.
Your company likes data. A lot. Your boss requested additional headcount this year to beef up your data engineering team (Presto and Kafka and Hadoop, oh my!). Your VP of Data is constantly lurking in your company’s Eng-Team Slack channel to see “how people feel” about migrating to Snowflake. Your CEO even wants to become data-driven, whatever that means. To say that data is a priority for your company would be an understatement.
To satisfy your company’s insatiable appetite for data, you may even be building a complex, multi-layered data ecosystem: in other words, a data platform.
At its core, a data platform is a central repository for all data, handling the collection, cleansing, transformation, and application of data to generate business insights. For most organizations, building a data platform is no longer a nice-to-have but a necessity, with many businesses distinguishing themselves from the competition based on their ability to glean actionable insights from their data, whether to improve the customer experience, increase revenue, or even define their brand.
Much in the same way that many view data itself as a product, data-first companies like Uber, LinkedIn, and Facebook increasingly view data platforms as “products,” too, with dedicated engineering, product, and operational teams. Despite their ubiquity and popularity, however, data platforms are often spun up with little foresight into who is using them, how they’re being used, and what engineers and product managers can do to optimize these experiences.
Whether you’re just getting started or are in the process of scaling one, we share five best practices for avoiding these common pitfalls and building the data platform of your dreams:
Align your product’s goals with the goals of the business
For several decades, data platforms were viewed as a means to an end versus “the end,” as in, the core product you’re building. In fact, although data platforms powered many services, fueling rich insights to the applications that power our lives, they weren’t given the respect and attention they truly deserve until very recently.
When you’re building or scaling your data platform, the first question you should ask is: how does data map to your company’s goals?
To answer this question, you have to put on your data platform product manager hat. Unlike specific product managers, a data platform product manager must understand the big picture versus area-specific goals since data feeds into the needs of every other functional team, from marketing and recruiting to business development and sales.
For instance, if your business’s goal is to increase revenue (go big or go home!), how does data help you achieve that goal? As a thought experiment, consider the following questions:
- What services or products drive revenue growth?
- What data do these services or products collect?
- What do we need to do with the data before we can use it?
- Which teams need this data? What will they do with it?
- Who will have access to this data or the analytics it generates?
- How quickly do these users need access to this data?
- What, if any, compliance or governance checks does the platform need to address?
By answering these questions, you’ll have a better understanding of how to prioritize your product roadmap, as well as who you need to build for (often, the engineers) versus design for (the day-to-day platform users, including analysts). Moreover, this holistic approach to KPI development and execution strategy sets your platform up for a more scalable impact across teams.
Gain feedback and buy-in from the right stakeholders
It goes without saying that receiving both buy-in upfront and iterative feedback throughout the product development process are necessary components of the data platform journey. What isn’t as widely understood is whose voice you should care about.
Yes, you need the ultimate sign-off from your CTO or VP of Data on the finished product, but their decisions are often informed by their trusted advisors: staff engineers, technical program managers, and other day-to-day data practitioners.
While developing a new data cataloging system for her company, one product manager we spoke with at a leading transportation company spent 3 months trying to sell her VP of Engineering on her team’s idea, only to be shut down in a single email by his chief-of-staff.
Consider different tactics based on the DNA of your company. We suggest following these three concurrent steps:
- Sell leadership on the vision.
- Sell the brass tacks and day-to-day use case on your actual users.
- Apply a customer-centric approach, no matter who you’re talking to. Position the platform as a means of empowering different types of personas in your data ecosystem, including both your data team (data engineers, data scientists, analysts, and researchers) and data consumers (program managers, executives, business development, and sales, to name a few categories). A great data platform will enable technical users to do their work easily and efficiently, while also allowing less technical personas to leverage rich insights or put together visualizations based on data without much assistance from engineers and analysts.
At the end of the day, it’s important that this experience nurtures a community of data enthusiasts that build, share, and learn together. Since your platform has the potential to serve the entire company, everyone should feel invested in its success, even if that means making some compromises along the way.
Prioritize long-term growth and sustainability vs. short-term gains
Unlike other types of products, data platforms are not successful simply because they are first to market. Since data platforms are almost exclusively internal tools, we’ve found that the best ones are built with sustainability in mind rather than feature-specific wins.
Remember: your customer is your company, and your company’s success is your success. This is not to say that your roadmap won’t change several times over (it will), but when you do make changes, do it with growth and maturation in mind.
For instance, Uber’s big data platform was built over the course of five years, constantly evolving with the needs of the business; Pinterest has gone through several iterations of their core data analytics product; and leading the pack, LinkedIn has been building and iterating on its data platform since 2008!
Our suggestion: choose solutions that make sense in the context of your organization, and align your plan with these expectations and deadlines. Sometimes, quick wins as part of a larger product development strategy can help with achieving internal buy-in, as long as they aren’t shortsighted. Rome wasn’t built in a day, and neither was your data platform.
Sign off on baseline metrics for your data and how you measure it
It doesn’t matter how great your data platform is if you can’t trust your data, but data quality means different things to different stakeholders. Consequently, your data platform won’t be successful if you and your stakeholders aren’t aligned on this definition.
To address this, it’s important to set baseline expectations for your data reliability, in other words, your organization’s ability to deliver high data availability and health throughout the entire data life cycle. Setting clear Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for software application reliability is a no-brainer. Data teams should do the same for their data pipelines.
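To make the SLO/SLI idea concrete for a data pipeline, here is a minimal sketch of a freshness indicator. Everything in it is an illustrative assumption rather than something prescribed by the authors: the `FreshnessSlo` class, the six-hour staleness limit, the 75% target, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FreshnessSlo:
    """Hypothetical SLO: data must be no older than `max_staleness`
    in at least `target` (a fraction) of all checks."""
    max_staleness: timedelta
    target: float


def freshness_sli(check_times, last_loaded_times, max_staleness):
    """SLI: fraction of checks at which the data was fresh enough."""
    if not check_times:
        raise ValueError("need at least one check")
    fresh = sum(
        1
        for checked, loaded in zip(check_times, last_loaded_times)
        if checked - loaded <= max_staleness
    )
    return fresh / len(check_times)


def meets_slo(sli, slo):
    """True when the measured indicator reaches the agreed objective."""
    return sli >= slo.target


# Illustrative data: four checks of one table, with the time of its
# most recent successful load at each check.
slo = FreshnessSlo(max_staleness=timedelta(hours=6), target=0.75)
base = datetime(2024, 1, 1)
checks = [base + timedelta(hours=h) for h in (6, 12, 18, 24)]
loads = [base + timedelta(hours=h) for h in (4, 4, 16, 20)]

sli = freshness_sli(checks, loads, slo.max_staleness)
print(f"freshness SLI = {sli:.2f}, SLO met: {meets_slo(sli, slo)}")
```

The point of the sketch is the separation of concerns: the indicator is something you measure, while the objective is the threshold stakeholders sign off on.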
This isn’t to say that different stakeholders will have the same vision for what “good data” looks like; in fact, they probably won’t, and that’s OK. Instead of fitting square pegs into round holes, it’s important to create a baseline metric of data reliability and, as with building a new platform feature, gain sign-off on the lowest common denominator.
We suggest choosing a novel measurement (like this one for data downtime) that will help data practitioners across the company align on baseline quality metrics.
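The data downtime measurement referenced above is not spelled out in this article. As a rough sketch only, one plausible way to quantify it is to add up, for every data incident, the time it took to detect the problem plus the time it took to resolve it. The `DataIncident` class and the sample durations below are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataIncident:
    """Hypothetical record of one period of missing or wrong data."""
    time_to_detection: timedelta
    time_to_resolution: timedelta


def data_downtime(incidents):
    """Assumed formulation: sum of (detection + resolution) per incident."""
    total = timedelta()
    for incident in incidents:
        total += incident.time_to_detection + incident.time_to_resolution
    return total


# Illustrative incidents for one reporting period.
incidents = [
    DataIncident(timedelta(hours=5), timedelta(hours=3)),
    DataIncident(timedelta(hours=1), timedelta(hours=2)),
]
downtime = data_downtime(incidents)
print(f"data downtime this period: {downtime}")
```

A single number like this gives stakeholders with different definitions of "good data" one shared baseline to sign off on.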
Know when to build vs. buy
One of the first decisions you have to make is whether to build the platform from scratch or purchase the technology (or several supporting technologies) from a vendor.
While companies like (you guessed it) Uber, LinkedIn, and Facebook have opted to build their own data platforms, often on top of open source solutions, building doesn’t always make sense for your needs. There isn’t a magic formula that will tell you whether to build or buy, but we’ve found that there is value in buying until you’re convinced that:
- The product needs to operate using sensitive/classified information (e.g., financial or health records) that cannot be shared with external vendors for regulatory reasons
- Specific customizations are required for it to work well with other internal tools/systems
- These customizations are niche enough that a vendor may not prioritize them
- There is some other strategic value to building vs. buying (e.g., competitive advantage for the business or beneficial for hiring talent)
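The checklist above can be captured as a small record that a team fills in during planning. This is only a sketch: the field names are hypothetical, and treating any single true condition as a reason to consider building is an assumption, since the article does not say whether one condition or all of them must hold.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BuildSignals:
    """Hypothetical flags mirroring the four conditions listed above."""
    sensitive_data_cannot_leave: bool = False
    needs_custom_integration: bool = False
    customizations_too_niche_for_vendor: bool = False
    other_strategic_value: bool = False


def reasons_to_build(signals):
    """Names of the conditions that currently hold."""
    return [f.name for f in fields(signals) if getattr(signals, f.name)]


def recommendation(signals):
    """Assumption: default to buying unless at least one condition holds."""
    return "consider building" if reasons_to_build(signals) else "buy"


print(recommendation(BuildSignals()))
print(recommendation(BuildSignals(sensitive_data_cannot_leave=True)))
```

Writing the reasons down explicitly keeps the default ("buy") visible and forces any decision to build to name the condition that justifies it.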
One VP of Data Engineering at a healthcare startup we spoke with noted that if he were still in his 20s, he would have wanted to build. But now, in his late 30s, he would almost exclusively buy.
“I get the enthusiasm,” he says, “but I’ll be darned if I have the time, energy, and resources to build a data platform from scratch. I’m older and wiser now; I know better than to NOT trust the experts.”
When it comes to where you could be spending your time — and more importantly, money — it often makes more sense to buy a tried and true solution with a dedicated team to help you solve any issues that arise.
What’s next?
Building your data platform as a product will help you ensure greater consensus around data priorities, standardize on data quality and other key KPIs, foster greater collaboration, and, as a result, bring unprecedented value to your company.
In addition to serving as a vehicle for effective data management, reliability, and democratization, the benefits of building a data platform as a product include:
- Guiding sales efforts (giving you insights on where to focus your efforts based on how prospective customers are responding)
- Driving application product roadmaps
- Improving the customer experience (helping teams learn what your service pain points are, what’s working, and what’s not)
- Standardizing data governance and compliance measures across the company (GDPR, CCPA, etc.)
Building a data platform might seem overwhelming at first blush, but with the right approach, your solution has the potential to become a force multiplier for your entire organization.
Want to learn more about building a reliable data platform? Reach out to Barr Moses and the Monte Carlo Team.
This article was co-written by Barr Moses and Atul Gupte.
Translated from: https://towardsdatascience.com/how-to-build-your-data-platform-like-a-product-6677e8abe318