Why You Should Avoid Commercial Data Science Platforms
Let’s start with an analogy
Stick with me, I promise it’s relevant.
If you’re selling vegetables in a grocery store, your business value lies in your loyal customers and in your position on a high street with heavy footfall. You probably don’t have a fancy shop front; it’s just boxes of veg. It’s those boxes and your quality sales staff that sell the veg to passers-by.
One day a salesman from High Tech Veg Retail Solutions Inc comes into your shop. He tells you “cardboard boxes are inefficient and unmanageable”. He has a product that will keep your veg in a locked fridge at the back of the shop, yet passers-by can simply ask for a cauliflower and it will be whizzed to them at top speed via conveyor belt.
It does almost everything. The only downside is that, due to the complexity of the machine, you will only be able to stock half your current range of veg. And by the way, all the veg will still be stored in cardboard boxes inside the fridge.
On the upside, you can get rid of your quality staff and employ cheaper staff with fewer skills.
I’m sure you would send him on his way to find another victim.
Your business value is Intellectual Property
If you’re reading this article, then you are either considering AI and ML, or you are already using them and have heard that there is a much better commercial data science platform available.
In the remainder of this article, I’m going to explain why investing in a commercial data science solution would be a big mistake.
Open source cardboard boxes
Those free cardboard boxes on the shop front are your open source AI and ML toolsets: freely available and easily accessible.
They don’t hide anything. You can see everything you put in and you can stand by the output, even for safety-critical applications, because you can describe exactly how you got your results.
Every option for squeezing out that last 20% of your model, the part that produces 80% of its value, is available to you.
Any training you need is free, or at least very low cost, and is accessible 24 hours a day on many different websites.
The most common language adopted by open source tools is Python, a language taught at high school, college, and university.
Expensive cardboard boxes with a shiny sticker
This is what commercial AI and ML platforms offer.
Under the hood, they employ the same open source tools you can access for free. Yes, they have a fancy wrapper around them, a built-in conveyor belt, and a shiny sticker to boot.
The only way to access those free tools, though, is through the interface the platform provides. It’s a really pretty interface, but it only gives you access to a fraction of what the underlying open source tools are capable of.
I can’t think of any commercial data science platform that does not have open source tools at its heart.
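To make the "fraction of the capability" point concrete, here is a minimal Python sketch. Everything in it is hypothetical: `train_model` stands in for an open source training function, `platform_train` stands in for a platform interface wrapped around it, and the parameter names and presets are invented for illustration. It only shows how a wrapper can shrink the set of options a data scientist is able to adjust.

```python
import inspect


def train_model(data, learning_rate=0.1, max_depth=3, n_rounds=100,
                subsample=1.0, regularization=0.0):
    """Hypothetical stand-in for an open source training function.

    It trains nothing; it just echoes the settings it was called with.
    """
    return {
        "rows": len(data),
        "learning_rate": learning_rate,
        "max_depth": max_depth,
        "n_rounds": n_rounds,
        "subsample": subsample,
        "regularization": regularization,
    }


def platform_train(data, speed="fast"):
    """Hypothetical platform wrapper: one dropdown mapped onto two presets."""
    presets = {"fast": {"n_rounds": 50}, "accurate": {"n_rounds": 500}}
    return train_model(data, **presets[speed])


def tunable(func):
    """Return the names of the parameters of func that have default values."""
    return [name for name, param in inspect.signature(func).parameters.items()
            if param.default is not inspect.Parameter.empty]


open_source_options = tunable(train_model)
platform_options = tunable(platform_train)

print("open source:", open_source_options)
print("platform:   ", platform_options)
```

In this sketch the wrapper still calls the same function underneath, but settings such as `learning_rate` or `max_depth` can no longer be reached from the platform side, whichever preset is chosen.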
The 80/20 rule
The data scientists who could get that last 20% out of a model for you are now reduced to dragging, dropping, and clicking a mouse, and you’re losing 80% of your business value. I hear you say, “but the results are much faster on this vendor’s platform”. OK, so you’re losing 80% of your business value faster!
Also, ask yourself why this vendor’s platform is faster. It’s because it skips the last 20% that delivers 80% of the value, and that 20% is not low-hanging fruit. It’s complex. It’s why data scientists dedicate their careers to the subject, and it’s why they are invaluable as data scientists and not as mouse clickers.
Where is your business value now?
Let’s assume that this commercial platform, by some miracle, could deliver 100% of the value you can get from unrestricted open source tools. Where is your business value now? It’s locked into this vendor’s platform, a platform you’re spending a huge amount of money on.
You can’t extract your IP; it’s been converted into a proprietary format. Even if you could reverse engineer their generated code (see you in court), the best you would get is a result that is missing that last 20%. And how long did the reverse engineering take you?
The tail wagging the dog
AI and ML are improving all the time. Every few months a new feature comes out that wows the community and offers your business even more potential revenue.
Your vendor’s commercial application and UI are so tightly integrated with older versions of the open source software that you won’t see that update for another 6 to 12 months. Forget it. Six months is a lifetime in AI and ML; you just missed that opportunity.
Recruitment, retention, and training
Every data scientist you recruit will, for the most part, come fully trained on the open source tools they have been working with for years. Those who are just out of university will be full of enthusiasm and fresh ideas. The one thing they all have in common is that they are experts in the open source toolsets that let them turn that enthusiasm and those ideas into reality.
Of course, you’re going to tell them in the interview to forget all the knowledge they have worked hard to accrue, because you have just invested a lot of money in a proprietary system that has half the data science capability they are used to and that they have never heard of before.
The long and short of it is that you will find it hard to recruit staff and impossible to recruit talented staff. Any talented staff you currently have will soon be leaving as well.
Trust the grassroots
You will very rarely hear a data scientist raving about a commercial data science platform. For that reason, most of the vendors offering these products don’t target the grassroots. They go directly to senior managers, and even the CEO, looking for a top-down decision. Most CEOs understand the value of data science, but the details are complex and overwhelming. So when a well-trained salesman scares the living shit out of them with horror stories of open source woes, they tend to believe him.
Talk to your own loyal staff before forcing something on them. Find out which open source tools they currently use and what could be done better if a small investment were made, or if they were given the time to design and implement a more suitable stack. After all, they work in your business and they know your requirements, and I guarantee the costs will be orders of magnitude lower than paying for a commercial platform.
In summary
If you have a data science requirement and money to invest, invest it wisely. Invest in talented individuals. Look at how a small investment in infrastructure can get a big payback from the tools they already use. Your skilled staff will make your company more valuable, and you will retain 100% of your business IP. You don’t need a high-tech cardboard box; the free open source ones you already have are the best you can get.
Translated from: https://medium.com/swlh/why-you-should-avoid-commercial-data-science-platforms-6e9c4b5f3596