by Dror Berel
How to avoid scope creep, and other software design lessons learned the hard way
From a data-science perspective.
You’ve got a fresh new project on your desk: some exciting data, a challenging Kaggle competition, or a new client you wish to impress, and you are fully motivated. At first, the problem seems well defined, and you even feel comfortable with the task at hand. You have just completed a similar task, and this new one should not be much different. Maybe it will take just a few copy/pastes with some modifications at the edges.
But then it comes… The client / collaborator / boss has just one simple additional request… It usually goes like this:
‘Hmmmm, I wonder what the results would look like if, instead of x, we made only a minor change and did y, or… you know what, let’s try both and see how it affects the results.’
Can the initial tool/solution you chose handle such an adjustment? It may be easy to copy/paste it with a couple of alterations, but what if you have to do it again and again? For how long are you going to stick to your initial plan?
Within the context of machine learning, some examples are:
Tuning: ‘let’s see how a different model parameter affects it’
Benchmarking: ‘let’s see how various models affect it’
Ensemble: ‘let’s try combining the best models together’
Resampling / cross-validation: ‘we must inspect for over-fitting’
Imagine adding on top of that some complex, messy, multi-layer, high-throughput genomics data that can easily go into a very fine resolution level (gene expression / mutation / sequence, …),… AND THEN adding multiple layers of various multi-genomic data on top of each other, …AND THEN doing it for multiple cohorts / studies in a meta-analysis level … you may end up with a VERY … BIG … UGLY … MESS!
Sound familiar? Unfortunately, I have been in this situation more than once. As motivated as I was to please my collaborators, my tools at the time were… limited, and not sufficient to deliver the broader scope. Back then, I might not even have been aware that a higher level of scope was relevant.
A lot has been written about scope creep in the context of project management. But what would a scientist, who was mostly trained to care about the rightness of the analysis / tools, rather than the ‘management’ of the whole project, have to say about it?
The good news, my friend, is that it is never too late to learn from someone else’s mistakes. Here are a couple of lessons, learned the hard way. (No worries, this is not another blog post about reproducible research.)
Lesson #1: Begin at the end! Define what your scope is. Do you need to extend it?
Make sure you understand what the highest expected resolution is! Brainstorm the craziest possible outcomes of your project, and then agree on reasonable expectations within your timeframe and budget.
Have a very detailed, clear definition of the project scope. For example, is your solution going to handle just one data set, or more? How are you going to validate your results? There will always be more methods/data sets for that, but what would be just sufficient?
The tricky challenge with scope creep is that the client doesn’t really care or think in terms of “scope”. Their goal is to get a solution that solves a hypothesis, or a business need. Whether their request is within or outside scope is entirely your problem! DEAL WITH IT!
In the context of machine learning, back in the day, I used ad-hoc R packages that each fit just one multivariate model. They did the work well, but they were too specific to their developers’ domain, and they lacked the higher resolution needed to compare the model with other models, aggregate several models, or resample. Only later did I learn to use machine learning meta/aggregator packages such as mlr, caret (and its successor, tidymodels), or SuperLearner to extend my scope. Read more about it here.
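To make the idea of a wider scope concrete, here is a minimal sketch of what an aggregator layer adds on top of single-model code: every model is scored on the same resampling folds, so the results are comparable. It is written in Python with only the standard library, and it is not the API of mlr, caret, tidymodels, or SuperLearner. The two toy learners, the function names, and the data are all assumptions made up for illustration.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Two deliberately simple, hypothetical learners. Each takes a list of
# (x, y) training pairs and returns a prediction function.
def mean_learner(train):
    mu = statistics.fmean([y for _, y in train])
    return lambda x: mu

def nearest_neighbour_learner(train):
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def cross_validated_mse(learner, data, k=5):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        predict = learner(train)
        errors = [(predict(data[i][0]) - data[i][1]) ** 2 for i in fold]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)

def benchmark(learners, data, k=5):
    """Score every learner on identical folds (same seed) for a fair comparison."""
    return {name: cross_validated_mse(fn, data, k) for name, fn in learners.items()}

# Toy data following y = 2x exactly.
data = [(float(x), 2.0 * x) for x in range(20)]
scores = benchmark({"mean": mean_learner, "1-nn": nearest_neighbour_learner}, data)
best = min(scores, key=scores.get)
```

Adding a third model, or a grid of tuning parameters, then becomes one more entry in the dictionary rather than another round of copy/paste.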
Lesson #2: Do not reinvent the wheel! There are other experts that know how to do it better than you!
In a role where you are expected to be multidisciplinary, and where new tools/methods that anyone can use pop up daily, exploring every new approach can be a slippery fall into a very deep rabbit hole. And guess what, nobody wants you to waste their time/money on that.
How to bet on the right tool? Ask yourself, what do the experts in that domain use? How mature is the tool they developed? Is it going to be maintained, or deprecated? They of course had their own learning curve, and over time, have perfected their tools to overcome the common pitfalls you are about to discover.
For me, with genomics data, it was Bioconductor’s object-oriented S4 classes. Read more here about why that was the best tool for my needs. Sure, it wasn’t trivial to learn, but I felt comfortable betting on it when I saw how it is used at top academic and industry organizations. I also knew that it was not just another open source resource that might die. Instead, it is a government- and academia-funded project, powered by the best experts in the domain, open and free for all of us to use.
Lesson #3: Found a gap? Be creative, but keep it simple!
But what if something in the analytical pipeline is still not in place? What if there is a missing link, nowhere to be found, that would bridge the gap and better fit your specific need?
Here you might need to get some dirty work done, and stop depending on others to provide you the solution. Another potentially slippery scope creep rabbit hole? Maybe… if you are not careful enough!
How to avoid it? Very easy: Keep it Simple!
Here is a very simple example. Suppose you have to solve an unsupervised problem. There is definitely more than one way to do it. Which one should you choose? Would the simplest one, say hierarchical clustering, be good enough to begin with? Implement it, see how it works with the rest of your analytical components (data, scalability, reproducibility), and later on, after things have worked out as planned, relax that simplification into a more complex method. Do it very carefully and gradually.
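As a rough illustration of how small the "simplest thing first" version can be, here is a naive single-linkage agglomerative clustering sketch in Python, standard library only. The function name and the toy points are assumptions for illustration. It is quadratic and unoptimized on purpose: the point is to have something that plugs into the rest of the pipeline before investing in a more sophisticated method.

```python
import math

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # start with one cluster per point

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) with the smallest gap.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Toy data: two visually obvious groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
groups = single_linkage(points, n_clusters=2)
```

Once this works end to end with the real data, swapping in a different linkage or a different clustering method is a change to one function.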
More examples to follow next.
Lesson #4: Do not be afraid to refactor!
Tired of patching and debugging poorly cohesive, poorly designed code that someone else, maybe even your boss, wrote a long time ago, before better tools became available? You ask yourself: GRRRRR, this is such an ugly workaround, why not simply use that new approach that was designed specifically for this task? (See lesson #2.)
Yes, it is risky to begin everything from scratch, and sometimes you may not have the resources to do it, but perhaps it is time for a reality check.
But what if the refactored solution gives us different results from the ones our collaborators are already counting on? Well, if there was indeed a past error/bug/mistake, it is better to face it and acknowledge it now, before even more damage is done. But also remember lesson #3: if you stick to simple solutions at the core, refactoring them under a broader wrapping solution should help produce similar results.
Lesson #5: Go to lesson #1.
Case studies:
Here are two case studies from my own experience working with multi-genomic data. (These could easily expand to other types of data, but perhaps that is a topic for a future post.)
Case Study #1: Bioc2mlr: A utility function to transform Bioconductor’s S4 omic classes into mlr’s task and CPOs.
https://drorberel.github.io/Bioc2mlr/
I love using Bioconductor data containers for genomic data, but I also love machine learning meta-aggregator toolkits for analysis at higher level scope. The only problem was that they were not necessarily compatible with each other.
The S4 object-oriented classes have multiple dimensions (slots), tied together by complex constraints that were intentionally designed to serve a purpose. But the machine learning approach was designed for a simplified, flat, two-dimensional, matrix-like input structure: columns for the features/variables, and rows for the subjects/observations.
I needed some way of breaking the S4 constraints and flattening the object. Unfortunately, to the best of my knowledge, I couldn’t find an existing way to do so. What should I have done?
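The flattening step can be sketched in a few lines. The snippet below is a Python illustration with only the standard library, not the actual Bioc2mlr code and not a real Bioconductor class: `OmicContainer`, its slot names, and `flatten` are hypothetical stand-ins. It assumes the common genomics layout of one row per feature and one column per sample, so flattening amounts to a transpose plus attaching the per-sample target.

```python
from dataclasses import dataclass

@dataclass
class OmicContainer:
    """Hypothetical stand-in for a multi-slot container (not a real Bioconductor class)."""
    assay: list          # list of rows: one row per feature, one column per sample
    feature_names: list
    sample_names: list
    sample_data: dict    # per-sample annotations, e.g. {"outcome": [...]}

    def __post_init__(self):
        # Cross-slot constraints, similar in spirit to an S4 validity check.
        if len(self.assay) != len(self.feature_names):
            raise ValueError("one assay row is required per feature")
        if any(len(row) != len(self.sample_names) for row in self.assay):
            raise ValueError("one assay column is required per sample")
        if any(len(v) != len(self.sample_names) for v in self.sample_data.values()):
            raise ValueError("one annotation value is required per sample")

def flatten(container, target):
    """Return one flat row per sample: every feature as a column, plus the target."""
    rows = []
    for s, sample in enumerate(container.sample_names):
        row = {"sample": sample}
        for f, feature in enumerate(container.feature_names):
            row[feature] = container.assay[f][s]  # transpose: features become columns
        row[target] = container.sample_data[target][s]
        rows.append(row)
    return rows

container = OmicContainer(
    assay=[[1.2, 3.4, 5.6], [0.1, 0.2, 0.3]],
    feature_names=["gene_a", "gene_b"],
    sample_names=["s1", "s2", "s3"],
    sample_data={"outcome": ["case", "control", "case"]},
)
flat = flatten(container, target="outcome")
```

The constraints stay with the container, and the machine learning side only ever sees the flat rows.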
Remember lesson #3: Should I spend my time on this task? Well… yes, why not? I felt comfortable enough with both approaches, had already experienced their ins and outs and their soft underbellies, and definitely appreciated the tremendous value of each, separately and jointly. In fact, creating this adapter package, Bioc2mlr, did not take much effort, and if you look at the code itself, you will see relatively simple steps.
Conclusion of case 1: When you have a couple of good tools, but they are not compatible, create a simple new adapter to link them.
Case Study #2: Meta-analysis
But that wasn’t enough for me…(see lesson #5).
My scope extension required me to provide a solution for an even higher level of analysis: meta-analysis of multiple studies/cohorts, each with a multi-omic data cube, each with a downstream machine learning analytics pipeline, implementing resampling, and all that jazz, across all studies, and at scale. Phewww!
Quite a challenge! How should I address it by applying the lessons above?
Lesson #1: I began at the end. My ‘observation unit’, the row in a tidy fashion, is not the subject, nor the gene, nor just one of the omics. It is the entire study/cohort (that is, a whole data cube), compressed into a single object in R. More than one cohort? Not a problem at all. Add as many rows as you need for more cohorts.
Lesson #2: I didn’t have to invent a new tool. The experts in our field have already figured it out for us. They might not have had this implementation in mind when they did so, but if I can do it, so can you. Just give it a try.
Lesson #3: I found a simple solution. Should I invent/extend a new S4 object-oriented class for this type of multi-cohort, multi-omic data? Of course not. There must be a simple solution. My simple solution: a tidy / nested data structure, with non-atomic objects in each cell. Read more about it here.
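The nested structure can be pictured as a table with one row per cohort, where a cell holds a whole object rather than a single number. In R this idea is usually expressed with list-columns; the sketch below shows the same shape in plain Python, with made-up cohort names, values, and a hypothetical `summarise` step standing in for a real per-cohort pipeline.

```python
import statistics

# One row per cohort. The "data" cell is non-atomic: it holds a whole object.
cohorts = [
    {"cohort": "study_A", "data": {"expression": [2.0, 4.0, 6.0], "outcome": [0, 1, 1]}},
    {"cohort": "study_B", "data": {"expression": [1.0, 3.0], "outcome": [0, 1]}},
]

def summarise(data):
    """Hypothetical per-cohort step; a real pipeline would fit and resample a model."""
    return {
        "n": len(data["outcome"]),
        "mean_expression": statistics.fmean(data["expression"]),
    }

# Apply the same step to every row and keep the result in a new column.
table = [{**row, "summary": summarise(row["data"])} for row in cohorts]

# Extending the scope to another cohort is just one more row.
cohorts.append({"cohort": "study_C", "data": {"expression": [5.0], "outcome": [1]}})
```

Because each step maps over rows, the code for one cohort and the code for many cohorts is the same code.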
Lesson #4: Refactor? Well, maybe I am not there yet, since so far my (current) scope can handle all my wildest dreams. But if you show me a better approach, perhaps a data.table one (I know), or even one in Python (god forbid), I would not hesitate to give it a try, even if it is beyond my comfort zone.
Lesson #5: Meta-meta analysis? (Not a typo). Who knows. Maybe one day.
Conclusion of case 2: tidy everything! Even non-atomic objects.
One last piece of advice: Get an expert’s opinion, at least until you become one yourself.
‘If only I had known that before. That could have saved me so much time and effort…’
To the expert, your current challenges are yesterday’s solved problems. They had already figured them out when we were still in kindergarten, and they have spent their entire careers on just that. Shoot them an email and ask a very clear question, with dependency-free proof-of-concept examples or case studies to demonstrate your challenge. My experience is that they will be happy to assist if you respect their time and authority.
Final words
When you figure out what type of tool/solution you are passionate about, make it happen! Don’t fool yourself with excuses about why it is not a good time to create your new tool. Just do it!
Don’t give up. Focus. Decide what you want to achieve. Do not be afraid to extend your scope, but do it with simple solutions! Refactor. It will be worth your time. Maybe not immediately, but in days to come. Be creative!
And last but not least, don’t be shy. Tell everyone about it. Share it with your community. Make the universe a better place with your solution. You may even earn an extra buck on the side. Who knows?
p.s.
This post is dedicated with love to all of my former anxious collaborators / clients / bosses. I appreciate your patience, and wish I had known the above earlier. You were there to assist and support me while I learned these lessons the hard way, for better and for worse. Let me make it up to you. Shoot me an email and I will redo my old work in just a few lines of code, reflecting my current level of scope.
Check more related topics here: https://drorberel.github.io/
Consultant: currently accepting new projects!
Useful reference:
Clean Coder Blog: On the Diminished Capacity to Discuss Things Rationally (blog.cleancoder.com)
Scope Creep in Project Management: Definition, Causes & Solutions (www.workamajig.com)
Translated from: https://www.freecodecamp.org/news/scope-creep-and-other-software-design-lessons-learned-the-hard-way-edacf021965b/