Data Science or Computer Science
What is data science?
Well, if you have just woken up from a 10-year coma and have no idea what data science is, don’t worry, there’s still time. Many years ago, statisticians had some pretty good ideas for analysing data and getting insights from it, but they lacked the computational power to do it, so their hands were tied. Then one day computers caught up with those guys and made all their dreams come true. All of a sudden, we not only had more data available than ever before, but we also had powerful machines to perform heavy calculations on this data, allowing statisticians to try out all these new algorithms. Data science is the hip daughter born from this marriage between statistics and computer science. In other words, it is the science of extracting useful patterns from data sets by means of computing power.
What is it used for?
One of the reasons data science is so popular nowadays is the number of applications that keep emerging.
Marketing and sales
A typical use case for data science in marketing is product recommendation. When you check out a product on Amazon and they tell you there’s another product you might like, there is an algorithm behind that recommendation. It predicts that you will like that other product based on what customers who viewed the same product actually bought.
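To make the recommendation idea a bit more concrete, here is a minimal sketch of a "bought together" counter. It is only an illustration, not how Amazon or any real store does it: the `baskets` data, the product names and the `recommend` function are all made up for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: one set of product names per customer.
baskets = [
    {"tent", "sleeping bag", "headlamp"},
    {"tent", "sleeping bag"},
    {"tent", "camp stove"},
    {"headlamp", "batteries"},
]

# For every product, count how often each other product was bought with it.
bought_with = defaultdict(Counter)
for basket in baskets:
    for viewed, other in permutations(basket, 2):
        bought_with[viewed][other] += 1


def recommend(product, k=2):
    """Return up to k products most often bought together with `product`."""
    return [name for name, _ in bought_with[product].most_common(k)]


print(recommend("tent", k=1))  # ['sleeping bag']
```

Real recommenders weigh many more signals (views, ratings, recency), but the core intuition of counting what similar customers did is the same.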
Finance
The most common way that banks use data science methods is for credit risk analysis. Back in the day, when someone asked for a loan, the banker usually took a good look at their financial record to decide whether to grant it or not. Nowadays, there are sophisticated statistical models that are constantly updated and give a good estimate of the probability of default, making the whole process a lot faster and more reliable.
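As a rough sketch of what "estimating a probability of default" can look like, the snippet below fits a one-feature logistic regression with plain gradient descent. Everything here is an assumption for illustration: the `history` data is invented, a single debt-to-income ratio stands in for a full financial record, and real credit models are far richer than this.

```python
import math

# Hypothetical past loans: (debt-to-income ratio, defaulted? 1 = yes, 0 = no).
history = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1),
           (0.5, 0), (0.6, 1), (0.7, 1), (0.8, 1)]


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


# Fit weight w and bias b by gradient descent on the log-loss.
w, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in history:
        error = sigmoid(w * x + b) - y  # predicted probability minus outcome
        grad_w += error * x
        grad_b += error
    w -= learning_rate * grad_w / len(history)
    b -= learning_rate * grad_b / len(history)


def probability_of_default(debt_ratio):
    """Estimated probability that a borrower with this ratio defaults."""
    return sigmoid(w * debt_ratio + b)


print(round(probability_of_default(0.1), 2), round(probability_of_default(0.8), 2))
```

The model returns a number between 0 and 1 rather than a yes/no answer, which is what lets a bank set its own risk threshold.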
Healthcare
Healthcare is one of the most promising industries when it comes to data science. Connected wearables such as smartwatches generate a lot of data, including calories burned, miles walked and heart rate. One possible application is tracking variables that can help explain some diseases, or even reminding you to go see a doctor if your data shows a pattern that might indicate a health issue.
What questions does it answer?
We can split data science tasks into two main groups: supervised and unsupervised learning.

Supervised learning
Supervised learning comprises all tasks for which we have a target variable, that is, some feature in our data that we already know we want to predict. Examples include explaining house prices based on their characteristics (such as number of rooms and floors) or predicting the likelihood that a customer will stop using our services.
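The house price example can be sketched in a few lines. This is a toy illustration under stated assumptions: the `houses` data is invented, only the number of rooms is used as a feature, and the model is the simplest possible one (a straight line fitted by ordinary least squares).

```python
# Hypothetical training data: (number of rooms, sale price in thousands).
# The price is the target variable we want to predict.
houses = [(2, 150.0), (3, 200.0), (4, 250.0), (5, 300.0)]

n = len(houses)
mean_rooms = sum(rooms for rooms, _ in houses) / n
mean_price = sum(price for _, price in houses) / n

# Ordinary least squares with one feature: slope = covariance / variance.
slope = (
    sum((rooms - mean_rooms) * (price - mean_price) for rooms, price in houses)
    / sum((rooms - mean_rooms) ** 2 for rooms, _ in houses)
)
intercept = mean_price - slope * mean_rooms


def predict_price(rooms):
    """Predicted price (in thousands) for a house with this many rooms."""
    return intercept + slope * rooms


print(predict_price(6))  # 350.0
```

The key point is that the prices in the training data are known, so the algorithm has something to learn from and to be checked against.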
Unsupervised learning
These are the tasks for which there is no target variable, because we are not sure of the exact question we are asking. A typical case is clustering, where we just want to find patterns in the data that are not necessarily related to one specific variable (customer segmentation, for instance).
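Customer segmentation can be illustrated with a tiny version of k-means, one common clustering algorithm. This is a sketch under assumptions: the `spend` figures are invented, there is only one feature, and the starting centers are picked deterministically to keep the example reproducible (real implementations usually start from random points).

```python
# Hypothetical annual spend per customer; note there is no target column.
spend = [12.0, 15.0, 14.0, 90.0, 95.0, 88.0, 300.0, 310.0]


def k_means_1d(values, k, iterations=20):
    """Tiny k-means for a single feature. Returns (centers, groups)."""
    ordered = sorted(values)
    # Deterministic start: take the middle value of k equal slices of the data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Step 1: assign every value to its nearest center.
        groups = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Step 2: move each center to the mean of its group.
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


centers, segments = k_means_1d(spend, k=3)
for center, segment in zip(centers, segments):
    print(round(center, 1), sorted(segment))
```

Nobody told the algorithm what a "low", "medium" or "high" spender is; the three segments come out of the data alone.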
Who does it?
Besides the knowledge required in statistics and computer science, data science also calls for business awareness: no matter how good your algorithms are, they will be useless if they are not applicable in that domain. People who work with data usually fall into three categories, depending on which of those three areas of expertise they focus on most:
Data analyst
Sometimes also called a business analyst, this person knows how to talk to people who don’t work directly with data. They are usually in charge of translating business needs into data requirements (and data insights into business recommendations). They have an overall understanding of the main data science algorithms, and usually have really good skills in data visualization.
Data engineer
This is the person who makes sure that the data is collected from all its sources, that it is integrated almost seamlessly into the company’s tech environment and that all the algorithms developed run well and fast. They almost always come from a tech background, and sometimes have to create dedicated tools to display the data processes, especially if they are to be shared with other stakeholders in the company.
Data scientist
As you can guess from the name, this person has a deeper understanding of the way most algorithms operate, and of which ones are best for each situation. They probably know more about statistics than the data analyst and the data engineer, but less about the ins and outs of the business or of process industrialisation. Some companies prefer to hire PhDs for this position, but that is not always the case.
Where is it going?
In the next few years, we will see much progress in many different domains. By using data, cities will be able to better manage their traffic, their energy consumption and even the allocation of their police units. With wearables, we’ll be able to exercise, eat and sleep better. And there might be many other possibilities that we haven’t even thought of.
However, we will also find out that not everything can be improved with data, and we will soon learn where this limit lies. There will always be an important random component in every human activity or natural phenomenon that no machine learning algorithm will ever capture, no matter how sophisticated it is.
This data-driven culture might also cause some important behavioural changes. People are starting to realize how much of their personal lives is being tracked by big companies and the government, and most do not seem to enjoy it. This might lead people to voluntarily downgrade their tech devices, use tools to prevent data collection, and even reduce their overall technology usage. Governments are already aware of these concerns, and regulation is getting stricter all over the world when it comes to people’s privacy. Let’s see in the years to come how this will shape society (the Black Mirror series offers interesting insights into these possibilities).
How to do it?
If you want to learn more about it, I recommend the MIT Press Essential Knowledge series book “Data Science”, by John D. Kelleher and Brendan Tierney. It is a very good introduction to the subject that does not get too technical, and it will help you see if data science is really for you.
Next in line is “Data Science for Business” by Foster Provost and Tom Fawcett. This one is more focused on business applications and it goes deeper into the details of the algorithms. It will give you a really good grasp of all the possibilities enabled by data-driven decision making.
Then, once you have the basics covered, it’s time to study for real: you will almost certainly need to learn to code (if you don’t know how already). The main languages you should focus on are SQL and either R or Python. The first one is used for querying databases to extract the data you need, in the right shape. The other two are used for applying the algorithms and creating plots. R was created with a focus on statistics, whereas Python is a more general programming language. To start with, just choose one of the two to concentrate your efforts on and, if needed, learn the other one later on.
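To give a feel for how SQL and Python divide the work, here is a small sketch using Python’s built-in `sqlite3` module with an in-memory database. The `orders` table, its columns and its rows are made up for this example; in practice you would connect to your company’s actual database.

```python
import sqlite3

# Build a throwaway in-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ana", 35.0), ("bob", 10.0)],
)

# SQL extracts the data in the right shape: one row per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

# Python takes over for the analysis.
average_total = sum(total for _, total in rows) / len(rows)
print(rows)           # [('ana', 55.0), ('bob', 10.0)]
print(average_total)  # 32.5
```

The query does the filtering and aggregation close to the data, and Python only receives the small, tidy result it needs.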
A good way to start practicing your skills is Kaggle.com, where you can play with toy datasets and take part in real competitions. It will help you put your knowledge to the test and also build a portfolio of your own. However, keep in mind that eventually you will need to work with real-life cases, which are a different beast.
Conclusion
Now that you know some of the data science lingo, you are able to go out there and do your own research. The amount of available resources is pretty much endless, and there’s new information coming out every day, so make sure you stay up to date on the new methods and possibilities.
Translated from: https://towardsdatascience.com/data-science-101-99e34bea86c