4 Essential Tactics for Managing a Great Distributed Data Team
COVID-19 has forced nearly every organization to adapt to a new workforce reality: distributed teams. We share four key tactics for turning your remote data team into a force multiplier for your entire company.
It’s month 6 (or is it 72? It’s hard to tell) of the global pandemic, and despite the short commute from your bedroom to the kitchen table, you’re still adjusting to this new normal.
Your team is responsible for all the same tasks (handling ad-hoc queries, fixing broken pipelines, implementing new rules and logic, etc.), but troubleshooting broken data has only gotten harder. It’s difficult enough to identify the root cause of a data downtime incident when you’re all 5 feet away from each other; it’s 10 times harder when you’re working in different time zones.
Distributed teams aren’t novel; in fact, they’ve become increasingly common over the last few decades. Working during a pandemic, however, is new for everyone. While this shift widens the geographic talent pool, collaborating at this scale brings unforeseen hurdles, particularly when it comes to working with real-time data.
Your daily standup only gets you so far.
Here are 4 essential steps to managing a great distributed data team:
Document all the things
Information about which tables and columns are “good or bad” breaks down when teams are distributed. One data scientist we spoke with at a leading e-commerce company told us that it takes 9 months of working on a team to develop a spidey-sense for what data lives where, which tables are the ‘right’ ones, and which columns are healthy vs. experimental.
The answer? Consider investing in a data catalog or lineage solution. Such technologies provide one source of truth about a team’s data assets, and make it easy to understand formatting and style guidelines for data input. Data catalogs become particularly important when data governance and compliance come into play, which is top of mind for data teams in financial services, healthcare, and many other industries.
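To make the idea concrete, here is a minimal sketch of what a single catalog entry might record. It is not any particular catalog product's schema: the `TableEntry` and `ColumnEntry` classes, the `Status` values, and the `analytics.orders` table are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical health labels a team might agree on."""
    HEALTHY = "healthy"
    EXPERIMENTAL = "experimental"


@dataclass
class ColumnEntry:
    name: str
    description: str
    status: Status = Status.HEALTHY


@dataclass
class TableEntry:
    name: str
    owner: str
    description: str
    columns: list[ColumnEntry] = field(default_factory=list)

    def columns_with_status(self, status: Status) -> list[str]:
        """Return the names of the columns carrying the given label."""
        return [c.name for c in self.columns if c.status is status]


# Illustrative entry; the table, owner, and columns are made up.
orders = TableEntry(
    name="analytics.orders",
    owner="data-eng@example.com",
    description="One row per completed order; the 'right' orders table.",
    columns=[
        ColumnEntry("order_id", "Primary key."),
        ColumnEntry("total_usd", "Order total in US dollars."),
        ColumnEntry("predicted_ltv", "Model output, still being validated.",
                    Status.EXPERIMENTAL),
    ],
)

print(orders.columns_with_status(Status.EXPERIMENTAL))
```

Even a record this small writes down the things a new teammate would otherwise need months to absorb: who owns the table, whether it is the canonical one, and which columns are still experimental.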
Set SLAs and SLOs for data
It’s important to ensure alignment not just among data team members but also with data consumers (e.g., marketing, executives, or operations teams). To do so, we suggest taking a page out of the site reliability engineering book and setting, then aligning on, clear service level agreements (SLAs) and service level objectives (SLOs) for data. SLAs that capture expectations around data freshness, volume, and distribution, as well as the other pillars of data observability, will be crucial here.
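As a rough illustration of a freshness SLO, the sketch below measures how often a table's newest data was within an agreed age at check time. The 6-hour limit, the 95% target, and the sample load times are assumptions made up for this example, not recommended values.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def staleness(last_loaded_at: datetime, checked_at: datetime) -> timedelta:
    """Age of the newest data at check time (both timestamps in UTC)."""
    return checked_at - last_loaded_at


def slo_attainment(samples: list[timedelta], max_staleness: timedelta) -> float:
    """Fraction of checks in which the data was fresh enough."""
    if not samples:
        raise ValueError("need at least one staleness sample")
    within = sum(1 for s in samples if s <= max_staleness)
    return within / len(samples)


# Hypothetical objective: data is at most 6 hours old in 95% of checks.
MAX_STALENESS = timedelta(hours=6)
TARGET = 0.95

# Made-up observations: the data was 1, 3, 5, and 8 hours old when checked.
checked_at = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)
last_loads = [checked_at - timedelta(hours=h) for h in (1, 3, 5, 8)]
samples = [staleness(loaded, checked_at) for loaded in last_loads]

attainment = slo_attainment(samples, MAX_STALENESS)
print(f"attainment={attainment:.2f}, SLO met: {attainment >= TARGET}")
```

Keeping every timestamp timezone-aware and in UTC avoids off-by-hours mistakes when the people reading the numbers sit in different time zones.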
Katie Bauer, a Data Science Manager at Reddit, suggests distributed data teams maintain a central document with expected delivery dates for important projects, and review that document weekly.
“Instead of pinging my team for updates throughout the week when questions arise from stakeholders, I can easily visit this document for answers,” she said. “This keeps us focused on delivering our work and avoids unnecessary diversions.”
Invest in self-serve tooling
Investing in self-serve data tools (including cloud warehouses like Snowflake and Redshift, as well as data analytics solutions, like Mode, Tableau, and Looker) will streamline data democratization no matter the location or persona of the data user.
Similarly, self-serve version control systems help everyone stay on the same page when collaborating on larger workflows, which becomes extremely important when leveraging real-time data across time zones.
Prioritize data reliability
Industries that are responsible for managing PII and other sensitive customer information, like healthcare and financial services, have a low tolerance for mistakes. Data teams need confidence that data is secure and accurate across their pipeline, from consumption to output. The right processes and procedures around data reliability can prevent data downtime incidents and restore trust in your data.
For many years, data quality monitoring was the primary way in which data teams caught broken data, but this isn’t cutting it anymore, particularly when real-time data and distributed teams are the norm. Our remote-first world calls for a more comprehensive solution that can seamlessly track the five pillars of data observability and other important data health metrics tailored to the needs of your organization.
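As one small example of a data health metric, the sketch below flags a day's row count when it falls far outside the recent norm. It covers only volume, uses a simple z-score rule that we chose for illustration, and the row counts and threshold are invented; a full observability solution would track much more than this.

```python
from statistics import mean, stdev


def volume_is_anomalous(history, today, threshold=3.0):
    """Flag today's row count if it sits more than `threshold` sample
    standard deviations away from the mean of recent daily counts."""
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Made-up daily row counts for a hypothetical table.
recent_counts = [1000, 1020, 980, 1010, 990]

print(volume_is_anomalous(recent_counts, 1005))  # an ordinary day
print(volume_is_anomalous(recent_counts, 400))   # a partial load
```

A check like this can run after every load and alert whoever is on call, so a broken pipeline surfaces without waiting for a teammate in another time zone to notice.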
Remember: it’s OK to not be OK
We hope these tips help you accept and even embrace the data world’s new normal.
On top of this more tactical advice, however, it never hurts to remember that it’s OK to not be OK. Emilie Schario, GitLab’s first data analyst who is now an internal strategy consultant, put it best: “This is not normal remote work. What it takes to be successful during a period of forced remote work in a global pandemic is different from what it means to be remote-as-usual.”
We’d love to hear your advice for leading distributed teams! Reach out to Barr Moses with your words of wisdom.
This article was written by Will Robins & Barr Moses.
Source: https://towardsdatascience.com/4-essential-tactics-for-managing-a-great-distributed-data-team-e7df9f85e6fa