Why Scrum Is Awful for Data Science

Scrum is a popular project management (PM) methodology in software engineering, and the trend has recently carried over to data science. While the utility of Scrum in standard software engineering may remain up for debate, here I will detail why it unquestionably has no place in data science (or in data engineering, for that matter). This is not to say that “Agile” as a whole is bad for data science, but rather that the specific principles of Scrum (sprints, a single product owner, a Scrum master, daily stand-ups, and the litany of other meetings) fit poorly with data science teams and ultimately result in poorer products.

The Sprint/Estimation

Scrum prioritizes creating “deliverables,” often in two-week sprints. While this might arguably work well for certain areas of software engineering, it fails spectacularly in the data science world. Data science is by its very nature a scientific process that involves research, experimentation, and analysis. Data science projects are very difficult to estimate because they often ask the team to do something that hasn’t been done before. Data scientists may well have designed similar models before, but they likely haven’t worked with this particular dataset or used the specific technique required. This means there is a lot of uncertainty in the process. Poorer-than-expected data quality, problems with hyper-parameter tuning, or a technique that simply doesn’t work can all cause a failure to “deliver” by the end of the sprint. As a result, story-point estimates are often worse than useless, because they are based on prior projects that frequently bear little resemblance to the current one.
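
To make the estimation problem concrete, here is a small Monte Carlo sketch. It is my own illustration and not from the original article. It rests on hypothetical assumptions: a task's actual duration is its point estimate multiplied by a log-normal noise factor with median 1.0, and one person works the tasks in sequence within a 10-working-day (two-week) sprint. The function name and every parameter are made up for illustration.

```python
import random


def sprint_completion_probability(point_estimates_days, sprint_days=10.0,
                                  sigma=0.8, trials=10_000, seed=0):
    """Estimate the chance that all tasks finish within one sprint.

    Hypothetical model: each task's actual duration is its point estimate
    multiplied by a log-normal noise factor whose median is 1.0. A larger
    `sigma` stands in for more uncertainty (messy data, tuning problems,
    techniques that turn out not to work). Tasks are assumed to be worked
    sequentially by one person.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    successes = 0
    for _ in range(trials):
        actual_total = sum(
            estimate * rng.lognormvariate(0.0, sigma)
            for estimate in point_estimates_days
        )
        if actual_total <= sprint_days:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    estimates = [3, 3, 2]  # hypothetical point estimates, in working days
    for sigma in (0.1, 0.8):
        p = sprint_completion_probability(estimates, sigma=sigma)
        print(f"sigma={sigma}: P(finished within the sprint) = {p:.2f}")
```

Both runs use the same point estimates, totalling 8 of the sprint's 10 working days. Only the assumed spread changes. With the wider spread, the chance of finishing on time drops substantially. The exact numbers depend entirely on the made-up noise model.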

Moreover, even when data scientists do “deliver” the required items for a sprint, in many cases they have likely sacrificed code quality, model robustness, or documentation to meet the arbitrary end of the sprint. I have often heard management speak positively about the benefits of “more urgency” in a two-week sprint. But remember that this urgency also has drawbacks, chiefly that data scientists are more likely to make mistakes and overlook things.

At the opposite end of the spectrum, I’ve seen data scientists who finished their work early hesitate to pull in new stories for fear of not being able to complete them by the end of the sprint. As a result, they just sit idle for several days until the next sprint starts.

But couldn’t we break up these big tasks? Proponents of Scrum will argue that the issue here is not Scrum but simply the need to break up big tasks better (likely with additional time-consuming grooming sessions). However, even breaking up big tasks does not remove the uncertainty inherent in data science. For instance, a task like “train an XGBoost model and report results” might take much longer than a single sprint because of missing values in the data that need to be encoded, or because needed data is not present at all. Yes, this could be addressed with a prior story to “explore the dataset and fill missing values,” but as I will describe shortly, most POs lack the expertise to prioritize these types of stories, since they don’t fulfill an immediate deliverable.
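
The sketch below is a rough, standard-library-only illustration of what such a preliminary “explore the dataset and fill missing values” story might cover. The function names, the rule for what counts as missing, the choice of median imputation, and the missing-value indicator column are all my assumptions. They are not a prescribed method and not the author's code.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None, empty strings and float NaN as missing (an assumption)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def missing_report(rows):
    """Fraction of missing values per column for a list of dict rows."""
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }


def fill_numeric_with_median(rows, column):
    """Return new rows with missing values in `column` replaced by its median.

    Also adds a hypothetical `<column>_was_missing` indicator so a model can
    still learn from the fact that the value was absent. Input rows are not
    modified.
    """
    present = [row[column] for row in rows if not is_missing(row.get(column))]
    if not present:
        raise ValueError(f"column {column!r} has no observed values to impute from")
    fill_value = median(present)
    filled = []
    for row in rows:
        new_row = dict(row)
        was_missing = is_missing(row.get(column))
        new_row[f"{column}_was_missing"] = int(was_missing)
        if was_missing:
            new_row[column] = fill_value
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    # Hypothetical toy rows; real data would come from your own source.
    rows = [
        {"age": 34, "plan": "pro"},
        {"age": None, "plan": "free"},
        {"age": 51, "plan": ""},
        {"age": float("nan"), "plan": "pro"},
    ]
    print(missing_report(rows))
    print(fill_numeric_with_median(rows, "age"))
```

Even a toy step like this can reveal that a column is mostly empty or needs a different encoding. Discovering that is exactly the kind of surprise that pushes a “train and report” story past the end of a sprint.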

Constant Pivoting

Related to the above point, Scrum often results in constant pivoting from one project to another. True, many consider pivoting a “desirable” trait; however, this constant change in direction often means nothing gets done, and promising projects are shelved simply because they aren’t producing immediate deliverables. This is particularly true in data science, where many projects require a long-term investment of both employee time and resources. I have seen many promising projects discontinued because they didn’t deliver performance improvements fast enough, or because the product owner simply saw something flashier they wanted to focus on.

Lack of cross-team pollination

Scrum often creates a horribly narrow focus on one’s own team’s sprint tickets to the exclusion of everything else. It discourages data scientists (or really anyone) from contributing to other initiatives around the company where their skills could be of help. It also tends to push off important issues that could affect other teams unless those teams are in direct contact with the product owner.

The role of the product owner (PO)

Another key problem with Scrum is that it places too much power in the hands of the PO. The PO is generally in charge of the backlog and determines which issues are prioritized. However, product owners generally have a poor understanding of the technical nuances of data science projects. Therefore, necessary work such as refactoring code or further analyzing model performance often gets pushed to the back. Additionally, a lack of immediate “progress” might lead the product owner to move away from a project entirely. This isn’t to say that data scientists shouldn’t regularly communicate with stakeholders to determine the priority of tickets. Rather, having a dedicated product owner attend every meeting and decide the priority of tasks is counterproductive, both for the team and, in the long term, for the product itself.

Daily Standups, grooming and other wastes of time

I’ve seen very few teams, if any, that need to meet daily. Communication between teammates is important; however, two or three meetings per week are usually more than sufficient. Likewise, teammates should be encouraged to reach out whenever they get blocked or need help. A daily stand-up, on the other hand, often does nothing but micromanage employees.

Grooming (or refinement) is a meeting of the Scrum team in which the product backlog items are discussed, and the next sprint planning is prepared.

Grooming is another session that needlessly wastes time. As I mentioned above, the technical complexity of data science often means that sprint goals go unmet or are met with subpar results. This in turn is often used to justify even more grooming meetings (or “pre-grooming,” as we used to call them) in order to “break down those big issues.” In a never-ending cycle, these meetings eat up more and more of data scientists’ time.

The retro is one of the few Scrum meetings that I like; however, suggestions made at these meetings are often not taken seriously. For instance, at several sprint retros I’ve attended during my career, the majority of teammates recommended dropping the daily stand-up. The Scrum master and management dismissed these suggestions because “that would not be Scrum.” In contrast, suggestions to add more grooming sessions are almost always enacted without question.

The Scrum Master

Another role that is essentially useless is the Scrum master. The formal definition of a Scrum master is:

“The scrum master is the team role responsible for ensuring the team lives agile values and principles and follows the processes and practices that the team agreed they would use.” - Agile Alliance

What…? In practice, the Scrum master acts as a non-technical busybody who coerces team members into attending the aforementioned pointless meetings and drags JIRA cards around, all while preaching the canon of how Scrum will lead your team to salvation (e.g., more points delivered per sprint).

False Dichotomy and the no true Scrum argument

Finally, proponents of Scrum often set up a straw man by comparing Scrum to waterfall and other older project management methods, as if those were the only alternatives. Moreover, in many companies, management takes an all-or-nothing approach. It is possible to adopt aspects of Scrum, Agile, or other forms of project management without adhering to them completely. For instance, you could use ideas like stories and epics without a product owner, sprints, or a Scrum master.

Another common trend I see frequently is people saying, “What you experienced was not true Scrum; blah blah, that’s actually waterfall. If only you had a better product owner…” What this fails to recognize is that Scrum as a system breeds these types of problems. Concentrating prioritization in a single product owner role is bound to cause problems. Sure, you could have an exceptionally good PO who has years of DS experience or understands the team, but that likely won’t be the case. Moreover, the sprint at its core encourages a frantic rush at the end of whatever arbitrarily decided duration in order to meet the “commitment.” It also fundamentally assumes that all work can be concretely estimated. Scrum could possibly work in areas of software engineering with very well-defined problems that are only slight variations of one another (though even then you have the issues with POs and Scrum masters). However, when there is any real uncertainty (as there is in data science, data engineering, and DevOps), Scrum breaks down and results in wasted time and resources.

What should you use instead?

This leads to a central question: how should you manage a data science team? There is no single answer. What I’ve found to work well is a Kanban-based approach with no product owner, but with regular discussions with stakeholders (weekly or every other week). Additionally, work-in-progress (WIP) limits seem to help streamline the process. As I mentioned above, meeting twice per week (e.g., Tuesday and Friday) or on some similar schedule often works well.
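
WIP limits may be unfamiliar, so here is a minimal Python sketch of the rule they enforce. A column cannot hold more cards than its limit, so new work is pulled as soon as capacity frees up rather than at a sprint boundary. The class, the column names, the limits, and the story names are hypothetical illustrations. They do not reflect the configuration of any particular Kanban tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class WipLimitExceeded(Exception):
    """Raised when a move would push a column past its WIP limit."""


@dataclass
class KanbanBoard:
    # Column name -> maximum number of cards; None means "no limit".
    wip_limits: Dict[str, Optional[int]]
    columns: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self):
        # Create an empty list for every column that has a limit entry.
        for name in self.wip_limits:
            self.columns.setdefault(name, [])

    def _check_capacity(self, column: str) -> None:
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            raise WipLimitExceeded(f"{column!r} is at its WIP limit of {limit}")

    def add(self, card: str, column: str) -> None:
        self._check_capacity(column)
        self.columns[column].append(card)

    def move(self, card: str, src: str, dst: str) -> None:
        if card not in self.columns[src]:
            raise KeyError(f"{card!r} is not in column {src!r}")
        # Check before removing, so a rejected move leaves the board unchanged.
        self._check_capacity(dst)
        self.columns[src].remove(card)
        self.columns[dst].append(card)


if __name__ == "__main__":
    # Hypothetical columns, limits, and story names.
    board = KanbanBoard({"backlog": None, "in_progress": 2, "done": None})
    for story in ["profile churn data", "baseline model", "tune model"]:
        board.add(story, "backlog")
    board.move("profile churn data", "backlog", "in_progress")
    board.move("baseline model", "backlog", "in_progress")
    try:
        board.move("tune model", "backlog", "in_progress")
    except WipLimitExceeded as err:
        print("blocked:", err)
    # Finishing one card frees capacity, so the next one can be pulled
    # immediately instead of waiting for a sprint boundary.
    board.move("profile churn data", "in_progress", "done")
    board.move("tune model", "backlog", "in_progress")
    print(board.columns)
```

The design choice here is to reject a move up front rather than let a column overflow. That keeps the team focused on finishing in-progress work, while still letting someone who finishes early pull the next card right away.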

However, this approach may not work well for all teams. That is why, particularly for data science, I’d recommend trying out several different approaches to determine what works well for your team. The key is to find a system that works well for your team and stakeholders, rather than one that merely placates upper management’s ideas of how a data science team should operate.

Source: https://towardsdatascience.com/why-scrum-is-awful-for-data-science-db3e5c1bb3b4
