Fortune-Telling Data
Real Estate Sale Prices, Regression, and Classification: Data Science is the Future of Fortune Telling
As we all know, I am unusually blessed with totally-real psychic abilities.
My background as a psychic extends way back to my childhood. On my sixth birthday, my mother got me a full astrological prediction printed out for the next year of my life. I, of course, was disappointed. Not because I was too young for uncanny predictions of the future, but because I already had the psychic abilities needed to predict my fate. Each morning, I would read the patterns of Cheerio residue left over in my breakfast cereal bowl. Obviously, I had a system for making sure my future stayed bright!
In all seriousness though, as a 20-year-old Data Scientist, I keep discovering more similarities between the skills of a fortune teller and those of a data scientist. Finally, I’ll be able to put my years of seemingly useless, arcane knowledge to good use. You don’t believe me?
Well, algorithms and machine learning are a perfect example of modern fortune telling in practice. Nowadays, the experience of finding invasive Amazon ads personally customized to your own interests is near universal.

Machine learning is the process of teaching a computer to predict future data points from the data it has already seen. The main form of machine learning I focused on in my data science project, “Predicting Real Estate Sale Prices with the Ames, Iowa Housing Dataset,” is linear regression. This model fits a line of best fit to the dataset in order to predict the sale price of a house from its features (say, 20,000 sq. ft., a finished garage, no fence, etc.).
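To make the "line of best fit" idea concrete, here is a minimal sketch of one-feature ordinary least squares in plain Python. It is not the project's actual code: the function names `fit_line` and `predict_price` are hypothetical, and the four data points are invented for illustration rather than taken from the Ames dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(slope, intercept, living_area):
    """Read a predicted sale price off the fitted line."""
    return slope * living_area + intercept


# Invented toy data (NOT from the Ames dataset): living area in sq. ft. and sale price in dollars.
living_area = [1000, 1500, 2000, 2500]
sale_price = [100_000, 150_000, 200_000, 250_000]

slope, intercept = fit_line(living_area, sale_price)
print(predict_price(slope, intercept, 1800))
```

A real model for house prices would use many features at once (garage, fence, lot size, and so on), but the principle is the same: the output is a predicted price, not a probability.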
The following plot, for example, represents my analysis of the relationship between Real Estate Sale Price (the X-axis) and Gross Living Area (the Y-axis). Outliers have been removed from this particular set of data, which helps preserve the quality of my linear regression predictor. This relationship between Sale Price and Gross Living Area, along with the many other factors that correlate highly with Sale Price, becomes my tool for predicting how a house with a given set of characteristics will be priced.
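As a rough illustration of why removing outliers matters, the sketch below measures the Pearson correlation between living area and sale price before and after dropping one unusually large home. The cutoff rule (`max_area`), the helper names, and the data are all assumptions made up for this example; the article does not say how outliers were identified in the actual project.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def drop_large_homes(rows, max_area):
    """Hypothetical outlier rule: keep only homes at or below max_area sq. ft."""
    return [(area, price) for area, price in rows if area <= max_area]


# Invented toy data: (gross living area in sq. ft., sale price in dollars).
# The last home is very large but sold cheaply, so it acts as an outlier.
homes = [
    (1000, 110_000),
    (1500, 160_000),
    (2000, 190_000),
    (2500, 260_000),
    (5500, 180_000),
]

cleaned = drop_large_homes(homes, max_area=4000)
r_before = pearson_r(*zip(*homes))
r_after = pearson_r(*zip(*cleaned))
print(f"correlation with outlier: {r_before:.2f}, without: {r_after:.2f}")
```

In this toy data a single unusual home is enough to weaken the measured relationship noticeably, which is the kind of distortion a best-fit line inherits when outliers are left in.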

Ultimately, my linear regression model was able to predict house prices with a Root Mean Squared Error (RMSE) of only 27,000. This means that for any given prediction my model makes, the house’s actual (non-predicted) Sale Price will typically be about $27,000 away from my prediction. RMSE weights large misses more heavily than a plain average of the errors, so this is a rough guide and not an exact average. Given that the majority of houses sell for at least $50,000, this amount of error is relatively acceptable. However, my fortune-telling wizard powers now extend even further than just “Linear Regression.”
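For readers who want to see what the RMSE figure measures, here is a small sketch that computes it alongside the mean absolute error (MAE) in plain Python. The three actual and predicted prices are invented for illustration and are unrelated to the project's results.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: square each error, average, then take the root."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the plain average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented toy prices in dollars (not results from the project).
actual_prices = [200_000, 150_000, 300_000]
predicted_prices = [210_000, 140_000, 260_000]

error_rmse = rmse(actual_prices, predicted_prices)
error_mae = mae(actual_prices, predicted_prices)
print(f"RMSE: {error_rmse:,.0f}  MAE: {error_mae:,.0f}")
```

Because the errors are squared before averaging, the single $40,000 miss pulls the RMSE above the MAE. The two only agree when every error has the same size.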
I can also use “logistic regression” and “K-Nearest-Neighbors” classifiers to classify data, predicting which camp each of my data points will fall into. For instance, in my data science project “Tinder Problems or Relationship Advice?,” I scraped posts from the “Tinder” and “Relationship Advice” subreddits on Reddit. Using a variety of Natural Language Processing techniques, I built a model that can predict whether a given post originates from “Tinder” or “Relationship Advice.”
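The sketch below shows one simple way a K-Nearest-Neighbors text classifier can work: turn each post into word counts, find the most similar labeled posts by cosine similarity, and let them vote. This is a deliberately simplified stand-in, not the project's pipeline. The posts are invented (not scraped from Reddit), the helper names are hypothetical, and the article does not specify which Natural Language Processing techniques were actually used.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(labeled_posts, text, k=3):
    """Label a post by majority vote among its k most similar labeled posts."""
    query = bag_of_words(text)
    ranked = sorted(
        labeled_posts,
        key=lambda item: cosine_similarity(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented example posts (NOT real Reddit data).
labeled_posts = [
    ("my match stopped replying after our first date", "Tinder"),
    ("what should my bio say to get more matches", "Tinder"),
    ("she swiped right on my profile but never replies", "Tinder"),
    ("my husband and i keep arguing about money", "Relationship Advice"),
    ("how do i tell my girlfriend i need more space", "Relationship Advice"),
    ("my boyfriend forgot our anniversary again", "Relationship Advice"),
]

print(knn_predict(labeled_posts, "nobody swiped right on my new profile bio"))
print(knn_predict(labeled_posts, "my girlfriend and i keep arguing about our anniversary"))
```

With only six posts and raw word counts, common words like "my" carry far too much weight. A realistic version would need much more data and better text features, but the voting logic stays the same.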
Now, do I actually have the psychic ability to predict the future with ritual sacrifice? The world may never know. But, thankfully, I can just predict the future with Data Science skills like machine learning. I can build regressions to make numerical predictions and classifiers to predict categorical outcomes, and I don’t even need to pull out my crystal ball.
And even better, unlike arcane sorcery, Data Science grounds all of its predictions in facts and previously gathered data. If anything, that’s the real magic of Data Science. I can take any amount of information in any field and, with enough time and effort, predict the future. What’s more magical than that?
Translated from: https://medium.com/@jjp2196/data-scientist-or-fortune-telling-psychic-wizard-from-the-future-5e7a93025fe5