Why You Should Avoid Commercial Data Science Platforms

Let’s start with an analogy

Stick with me, I promise it’s relevant.

If you’re selling vegetables in a grocery store, your business value lies in your loyal customers and your position on a high street with heavy footfall. You probably don’t have a fancy shop front; it’s just boxes of veg. It’s those boxes, and your quality sales staff, that sell the veg to passers-by.

One day a salesman from High Tech Veg Retail Solutions Inc comes into your shop. He tells you “cardboard boxes are inefficient and unmanageable”. He has a product that will keep your veg in a locked fridge at the back of the shop, but passers-by can simply ask for a cauliflower and it will be whizzed to them at top speed by conveyor belt.

It does almost everything. The only downside is that, due to the complexity of the machine, you will only be able to stock half your current range of veg. And by the way, all the veg will still be stored in cardboard boxes inside the fridge.

On the upside, you can get rid of your quality staff and employ cheaper staff with fewer skills.

I’m sure you would send him on his way to find another victim.

Your business value is Intellectual Property

If you’re reading this article, then you are either considering AI and ML, or you are already using them and have heard that a much better commercial data science platform is available.

In the remainder of this article, I’m going to explain why you would be making a big mistake investing in a commercial data science solution.

Open source cardboard boxes

Those free cardboard boxes on the shop front are your open source AI and ML toolsets: freely available and easily accessible.

They don’t hide anything. You can see everything you put in, and you can stand by the output, even for safety-critical applications, because you can describe how you got your results.

Every option for squeezing the last 20% out of your model, the 20% that produces 80% of its value, is available to you.

Any training you need is free, or at least very low cost, and is accessible 24 hours a day on many different websites.

The most common language adopted by open source tools is Python, a language taught at high school, college, and university.

Expensive cardboard boxes with a shiny sticker

This is what commercial AI and ML platforms offer.

Under the hood, they are employing the same open source tools you can access for free. Yes, they have a fancy wrapper around them, a built-in conveyor belt, and a shiny sticker to boot.

The only way to access those free tools, though, is through the interface the platform provides. It’s a really pretty interface, but it only gives you access to a fraction of what the underlying open source tools are capable of.

I can’t think of any commercial data science platform that does not have open source tools at its heart.

The 80/20 rule

The data scientists who could get that last 20% out of a model for you are now reduced to dragging, dropping, and clicking a mouse, and you’re losing 80% of your business value. I hear you say, “but the results are much faster on this vendor’s platform”. OK, so you’re losing 80% of your business value faster!

Also, ask yourself why this vendor’s platform is faster. It’s because that last 20%, the part that gets 80% of the value, is not the low-hanging fruit, and the platform leaves it unpicked. It’s complex. It’s why data scientists dedicate their careers to the subject, and it’s why they are invaluable as data scientists and not as mouse clickers.

Where is your business value now?

Let’s assume that this commercial platform, by some miracle, could get 100% of the value you can get from unrestricted open source tools. Where is your business value now? It’s locked into this vendor’s platform, a platform you’re spending a huge amount of money on.

You can’t extract your IP; it’s been converted into a proprietary format. Even if you could reverse engineer their generated code (see you in court), the best you would get is a result that is missing that last 20%, and how long did the reverse engineering take you?

The tail wagging the dog

AI and ML are improving all the time. Every few months a new feature comes out that wows the community and offers your business even more potential revenue.

Your vendor’s commercial application and UI are so tightly integrated with older versions of the open source software that you won’t see that update for another 6 to 12 months. Forget it. Six months is a lifetime in AI and ML; you just missed that opportunity.

Recruitment, retention, and training

Every data scientist you recruit will, for the most part, come fully trained on the open source tools they have been working with for years. Those just out of university will be full of enthusiasm and fresh ideas. The one thing they all have in common is that they are experts in the open source toolsets that let them turn their enthusiasm and ideas into reality.

Of course, you’re going to tell them in the interview to forget all that knowledge they have worked hard to accrue, because you have just invested a lot of money in a proprietary system that has half the data science capability they are used to and that they have never heard of before.

The long and short of it is that you will find it hard to recruit staff and impossible to recruit talented staff. Any talented staff you currently have will soon be leaving as well.

Trust the grassroots

You will very rarely hear a data scientist raving about a commercial data science platform. For that reason, most of the vendors offering these products don’t target the grassroots. They go directly to senior managers, and even the CEO, looking for a top-down decision. Most CEOs understand the value of data science, but the details are complex and overwhelming. So when a well-trained salesman scares the living shit out of them with horror stories of open source woes, they tend to believe him.

Talk to your own loyal staff before forcing something on them. Find out which open source tools they currently use, and what could be done better if a small investment were made or if they were given the time to design and implement a more suitable stack. After all, they work in your business and they know your requirements, and I guarantee the costs will be orders of magnitude less than paying for a commercial platform.

In summary

If you have a data science requirement and money to invest, invest it wisely. Invest in talented individuals. Look at how you can make a small investment in infrastructure to get a big payback from the tools they already use. Your skilled staff will make your company more valuable, and you will retain 100% of your business IP. You don’t need a high-tech cardboard box; the free open source ones you already have are the best you can get.

Translated from: https://medium.com/swlh/why-you-should-avoid-commercial-data-science-platforms-6e9c4b5f3596
