Data Science or Computer Science
Opinion
Table of Contents
- Introduction
- Examples
- When You Should Use Data Science
- Summary
Introduction
Both Data Science and Machine Learning are useful fields that apply several tools to predict, suggest, classify, and ultimately solve common business problems. You can create highly accurate models that automate previously manual tasks. Data Science can be powerful, saving companies money and time. However, you will find that you do not necessarily need Data Science to solve every problem you encounter. There are certain situations where human intervention is more important, or the situation does not allow for a generalized model.
I will describe five examples of when not to use Data Science. As a Data Scientist, I have slowly learned through experience when Data Science and Machine Learning are not necessary. I hope I can offer some insight and intuition for your own future situations.
Examples
There are several examples of when to use Data Science and when not to use Data Science. Here are some situations that come to mind where a Data Science model is not necessary, and possibly could make the situation worse:
Classifying some health implications
Depending on how severe an incorrect prediction is, using Data Science in some parts of the healthcare field can be extremely costly. A harmless incorrect prediction would be classifying a t-shirt as a sweater, or vice versa. A consumer seeing that suggestion on an e-commerce site would be unfortunate, but nobody would be harmed. Now imagine you created a model to classify cancer. If the model classifies someone as not having cancer when they actually do, the result can be extremely harmful. Perhaps human intervention is the best route here, or human-in-the-loop (a combination of Data Science and human effort), rather than Data Science alone. A good rule of thumb is to ask:
If this model prediction is incorrect, what will be the consequences?
However, Data Science, Machine Learning, and AI are constantly evolving, and you can expect new technologies and improvements in model accuracy to arrive quickly.
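The human-in-the-loop idea in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `route_prediction`, the `high_stakes` flag, and the review band of 0.05 to 0.95 are all hypothetical choices, not something prescribed by any library or by the article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Decision:
    label: str
    needs_human_review: bool


def route_prediction(probability_positive: float,
                     high_stakes: bool,
                     review_band: Tuple[float, float] = (0.05, 0.95)) -> Decision:
    """Turn a model probability into a decision, deferring to a human when unsure.

    review_band is an assumed, hand-picked range: any probability strictly
    inside it counts as "not confident enough" for a high-stakes decision.
    """
    if not 0.0 <= probability_positive <= 1.0:
        raise ValueError("probability_positive must be between 0 and 1")

    label = "positive" if probability_positive >= 0.5 else "negative"
    low, high = review_band
    uncertain = low < probability_positive < high

    # Low-stakes mistakes (t-shirt vs. sweater) are accepted automatically;
    # high-stakes, uncertain predictions are sent to a person.
    return Decision(label=label, needs_human_review=high_stakes and uncertain)


if __name__ == "__main__":
    print(route_prediction(0.30, high_stakes=False))  # clothing category
    print(route_prediction(0.30, high_stakes=True))   # medical screening
```

The same probability produces the same label in both calls, but only the high-stakes case is flagged for review, which is the point of asking what the consequences of a wrong prediction would be.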
When you don't have enough data
This example is more common. When you build a model, you want to make sure you have sufficient data. Bad data in means a bad model out, and too little data can produce a bad model just the same. The model may even seem to perform well yet fail to generalize to new situations. You could be overfitting, or simply not exposing the model to enough varied instances of training data. Before you spend development time and resources on a model, check whether you have enough data.
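A quick sanity check along these lines might look like the sketch below. The thresholds (`min_per_class=50`, `max_gap=0.10`) are arbitrary assumptions chosen for illustration, and both function names are hypothetical; the right values depend entirely on your problem.

```python
from collections import Counter


def data_sufficiency_report(labels, min_per_class=50):
    """Count examples per class and flag classes below an assumed minimum."""
    counts = Counter(labels)
    too_small = {cls: n for cls, n in counts.items() if n < min_per_class}
    return {
        "n_rows": len(labels),
        "class_counts": dict(counts),
        "under_represented": too_small,
        "enough_data": bool(counts) and not too_small,
    }


def overfitting_gap(train_score, validation_score, max_gap=0.10):
    """Return the train/validation gap and whether it exceeds an assumed limit.

    A model that looks great on training data but much worse on held-out
    data is probably not generalizing.
    """
    gap = train_score - validation_score
    return gap, gap > max_gap


if __name__ == "__main__":
    labels = ["churn"] * 12 + ["stay"] * 400
    print(data_sufficiency_report(labels))
    print(overfitting_gap(train_score=0.99, validation_score=0.71))
```

Neither check proves you have enough data, but both are cheap to run before committing to model development.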
When it’s a one-off task
This example is a little more dependent on the specific situation. You may be asked by a non-technical stakeholder or leader in your company to build a Data Science model, and you should perhaps ask yourself whether Data Science is necessary. If you are not outputting results daily, weekly, or even monthly, you may not want to spend the time creating a complex model that includes scheduled ingestion of new data.
You could apply similar skills to answer the business problem and suggest to the stakeholder that, since you only need one output CSV file, for example, a simple Python function can answer the question. You may not need to go into depth with your stakeholders about why you are not using a Data Science model, as some stakeholders just want a result and do not care how you got it. You may just need a small function that manually mimics the themes of a Data Science model. If you know the situation well, you could create bins or weights yourself, apply them to features or columns, and come up with your own score. Here is an example of what I am describing:
Example: 0.50 * (feature_1) + 0.20 * (feature_2) + 0.30 * (feature_3) = score (scaled)
While this might not be the most ‘accurate’, if you need a quick way to organize data, a function like this or something similar could be sufficient.
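The weighted score above could be written as a small Python function along the following lines. This is a sketch, not the author's code: it assumes "scaled" means each feature is min-max scaled to the range 0 to 1 before weighting, that the input is a non-empty list of dictionaries with numeric values, and the names `score_rows` and `write_scores` are made up for illustration.

```python
import csv

# Hand-picked weights from the example; they sum to 1.0.
WEIGHTS = {"feature_1": 0.50, "feature_2": 0.20, "feature_3": 0.30}


def min_max_scale(values):
    """Scale a non-empty list of numbers to [0, 1]; constant columns become 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_rows(rows, weights=WEIGHTS):
    """Return one weighted score per row, using min-max scaled features."""
    scaled = {name: min_max_scale([row[name] for row in rows]) for name in weights}
    return [
        sum(weights[name] * scaled[name][i] for name in weights)
        for i in range(len(rows))
    ]


def write_scores(rows, path, weights=WEIGHTS):
    """Write the original features plus the score to a one-off CSV file."""
    scores = score_rows(rows, weights)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[*weights, "score"])
        writer.writeheader()
        for row, score in zip(rows, scores):
            writer.writerow({**{k: row[k] for k in weights}, "score": round(score, 4)})


if __name__ == "__main__":
    data = [
        {"feature_1": 0, "feature_2": 10, "feature_3": 5},
        {"feature_1": 10, "feature_2": 0, "feature_3": 5},
    ]
    print(score_rows(data))
```

Because the weights are set by hand, the score only reflects your own judgment about which columns matter, which is often all a one-off request needs.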
When you don’t have labeled data
Sometimes you may encounter a situation where you want to classify thousands of observations, but you have too much unlabeled data in your dataset. There are ways around this problem like labeling software or unsupervised techniques to create new labels. However, if you find that either using human effort or other software services to label takes up too much time and money, then you may want to reassess the situation. Perhaps you need to perform more data engineering techniques like accessing an API before you implement a Data Science model.
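To judge whether labeling "takes up too much time and money", a back-of-the-envelope estimate is often enough. The sketch below is plain arithmetic; the function name and the example figures (30 seconds per label, an hourly rate of 20) are assumptions for illustration, not real benchmarks.

```python
def labeling_cost(n_unlabeled, seconds_per_label, hourly_rate):
    """Estimate the hours and money needed to label a dataset by hand."""
    hours = n_unlabeled * seconds_per_label / 3600
    return hours, hours * hourly_rate


if __name__ == "__main__":
    hours, cost = labeling_cost(n_unlabeled=10_000, seconds_per_label=30, hourly_rate=20.0)
    print(f"{hours:.1f} hours, cost {cost:.2f}")
```

If the estimate is larger than the value of the model, that is a signal to reassess before building anything.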
When your budget is tight
Depending on how much data you are ingesting and predicting on, training a model can be expensive. Your company may not have enough resources yet, and an expensive Data Science model may not be feasible.
This point goes along with ‘when you do not have enough time’ as well. You may have a deadline that is fast approaching, and methods other than Data Science, such as plain Python functions and rules, can serve you well.
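As an illustration of "Python functions and rules", here is a minimal rule-based classifier. The scenario (sorting support tickets), the keyword list, and the function name `classify_ticket` are all hypothetical; the point is only that a few explicit rules cost nothing to train.

```python
# Assumed keywords for illustration; in practice these come from domain knowledge.
URGENT_KEYWORDS = ("outage", "refund", "cannot log in")


def classify_ticket(text: str) -> str:
    """Assign a coarse category to a support ticket using hand-written rules."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in URGENT_KEYWORDS):
        return "urgent"
    if "?" in lowered:
        return "question"
    return "other"


if __name__ == "__main__":
    for ticket in ["Site OUTAGE since 9am", "How do I export my data?", "Thanks!"]:
        print(ticket, "->", classify_ticket(ticket))
```

Rules like these are easy to explain to stakeholders and can later serve as a baseline if a trained model becomes affordable.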
When You Should Use Data Science
There are countless situations where you should use Data Science and Machine Learning. Essentially, you could flip the examples above, or ask whether your problem fits an unsupervised, supervised, time-series, or similar setting.
You can also revisit the examples above and combine Data Science techniques with manual processes. Human-in-the-loop is becoming more common as a good bridge between the two practices.
Some specific examples of when to use Data Science include, but are not limited to:
- Recommending movies to users
- Forecasting sales for a company
- Analyzing sentiment of reviews
- Predicting temperature for a given month
The examples of ‘when not to use Data Science’ are not meant to discourage you from using Data Science, but to stress that ‘just because you can does not mean you should’. Ultimately, it depends on your specific situation and what the output will affect. For that reason, each example could also be argued to be a valid use case for Data Science in the right environment.
Summary

There are caveats to all of these examples, and you may end up using Data Science in these situations. Data Science is evolving and new facets are emerging. Keep in mind that this article is opinion-oriented, and these points or examples can change quickly. Feel free to comment below on when you think you should or should not use Data Science in a given situation. To summarize, here are the five examples of when you should not use Data Science:
- Classifying some health implications
- When you don’t have enough data
- When it’s a one-off task
- When you don’t have labeled data
- When your budget is tight
I hope you enjoyed my article. Thank you for reading!
Translated from: https://towardsdatascience.com/when-not-to-use-data-science-f2e42a3a77d3