The Role of Statistics in the Industry

DATA SCIENCE AND MACHINE LEARNING

Statistics are everywhere, and most industries rely on statistics and statistical thinking to support their business. A solid grasp of statistics is also required to become a successful data scientist, and you need to demonstrate a keen interest in the discipline.

What is statistics?

Statistics is the subject that covers every aspect of learning from data. As a methodology, it comprises the means and methods that allow us to work with data and to understand it. Statisticians apply and develop data analysis methods, and they keep studying those methods to understand their properties.

When will those tools provide insight? When are they possibly misleading?

Researchers across academic fields and workers in many industries are applying and extending statistical methodology, and they are contributing new approaches and techniques for data analysis. One piece of terminology needs clarifying up front: the difference between a statistic and the field of statistics.

Every day we encounter numerical or graphical summaries of collected data: for instance, the average score of all students on a final exam, the proportions of employed and unemployed workers in a country, or the fluctuation of stock prices over a day. These are statistics.

The field of statistics, by contrast, is an academic discipline focused on research methodology. The essential work of statisticians is developing new statistical tools, calculating statistics from data, and collaborating with subject-matter specialists to interpret the results properly.

Statistics is undoubtedly an evolving and continuously growing field, and it offers both challenges and opportunities.

In data science, numerous statistical methods are under continual study to understand how to use them properly. Many new application areas are emerging, and they create the need for innovative analytical methods. For example, new ideas about how to measure things, and new measurement methods, produce new kinds of data that need analysis. We also rely on advances in computing, which enable not only data analysis but also more sophisticated analysis of the large volumes of data being collected.

Statistics is a significant discipline, especially for data scientists, and there are numerous schools of thought about the field. It keeps absorbing brand-new ideas from theory, practice, and related fields.

Viewpoints on the field of statistics include:
* The ability to summarize data
* The idea of uncertainty
* The idea of decisions
* The idea of variation
* The art of forecasting
* The approach of measurement
* The principle of data collection

The Ability to Summarize Data

Raw data can be daunting, because understanding it generally requires reducing and summarizing it. The main goal of data reduction is to make a dataset comprehensible to a human observer. Statisticians have a range of techniques for summarizing data so that it becomes meaningful, and a statistician is trained to choose appropriate, precise, and effective ones.
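As a minimal sketch of what "reducing and summarizing" can look like in practice, the snippet below collapses a small list of numbers into a handful of descriptive statistics using Python's standard `statistics` module. The exam scores and the `summarize` helper are invented for illustration and do not come from the article.

```python
import statistics

# Hypothetical final-exam scores, invented for illustration only.
scores = [55, 62, 68, 70, 71, 74, 78, 81, 85, 96]

def summarize(values):
    """Reduce a list of numbers to a few descriptive statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Six numbers will not capture everything about ten scores, but they are far easier for a reader to take in, which is the point of data reduction.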

The Idea of Uncertainty

Data can be misleading. A primary purpose of developing the field of statistics is to provide a structure and framework for evaluating data. Insights from data are generally not 100% accurate, but remarkably, we have ways to quantify how far reported findings may be from the truth. Some survey reports, for example, come with a margin of error. The margin of error indicates how large the gap may be between the published figure and the actual state of public opinion.
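One common way to compute a margin of error for a poll result is the normal approximation for a sample proportion, sketched below. The poll figures (52% support among 1,000 respondents) and the function name `margin_of_error` are assumptions made for illustration. The approximation also presumes a simple random sample, which real polls only approximate.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z=1.96 corresponds to roughly
    95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 52% support among 1,000 respondents.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%} (roughly {low:.1%} to {high:.1%})")
```

Under these assumptions the interval straddles 50%, so the poll alone would not settle which side holds a majority.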

The Idea of Decisions

Understanding data is critical, and it leads to the need to act on what we have discovered. In some domains of statistics, decision-making is the ultimate goal of any statistical analysis. In both personal and professional life, we make decisions in the face of uncertainty, and we have to weigh the costs and benefits of the different options.

For example, if a person finds that they might be at higher-than-average risk for a specific type of cancer, should they undergo a preventative procedure? Statistics can help in the decision-making process.

The Idea of Variation

When we summarize data, our primary focus is commonly on a typical or central value. From a statistical perspective, however, we must place equal emphasis on understanding the variation in the data. For instance, if you know that Americans have on average around $8,000 in credit card bills each month, you have a good idea of the central value of the credit card debt distribution. If you are also told the amount that about 10 per cent of people exceed, that percentile gives you more information about the variability in credit card debt.
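To illustrate why a central value alone is not enough, the sketch below compares two made-up sets of balances that share the same mean but differ sharply in spread. The data and the `describe` helper are hypothetical. The 90th percentile is taken as the last decile cut point from `statistics.quantiles` with `method="inclusive"`.

```python
import statistics

# Two hypothetical sets of balances (in dollars) with the same mean.
tight = [7000, 7500, 8000, 8500, 9000]
spread = [500, 2000, 5000, 12500, 20000]

def describe(data):
    """Return (mean, sample standard deviation, 90th percentile)."""
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    return statistics.mean(data), statistics.stdev(data), deciles[-1]

for name, data in (("tight", tight), ("spread", spread)):
    mean, stdev, p90 = describe(data)
    print(f"{name}: mean={mean}, stdev={stdev:.0f}, p90={p90}")
```

Both groups average $8,000, yet the standard deviation and the 90th percentile show that the second group contains far larger balances.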

The Art of Forecasting

A fundamental task in statistics is forecasting, or prediction. We cannot know the future with absolute certainty. Still, effective use of the available data can sometimes yield reasonably accurate predictions, as in weather forecasting, stock price forecasting, and flood risk prediction. Other examples include estimating future demand for a new product being brought to market and predicting the outcome of an election.

The Approach to Measurement

Suppose you are collecting lots of data. Some variables, such as a person's age or height, can be measured with fairly high accuracy. Others are more challenging: blood pressure, for instance, varies from minute to minute, so it is harder to pin down. Then there are constructs such as mood, personality, and political ideology, which are much more difficult to define and quantify. Statistics plays a significant role in constructing and evaluating approaches for measuring these hard-to-define concepts and in assessing the quality of the various methods.

The Principle of Data Collection

Finally, statistics is the basis for principled data collection. Data can be costly and painful to collect, and resources restrict how much can be obtained. With too little data, the findings will be less informative than they could be. Statistics provides a principled way to manage this trade-off, getting as much information as possible while acknowledging and working within those resource limits.
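One simple instance of this trade-off is deciding how many survey responses to collect for a target margin of error. The sketch below inverts the usual normal-approximation formula for a proportion. The function name `required_sample_size`, the worst-case default `p=0.5`, and the 95% confidence level are assumptions for illustration, and the calculation presumes a simple random sample.

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= margin.

    p=0.5 is the most conservative choice; z=1.96 is roughly 95% confidence.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Tighter margins demand disproportionately more data.
for margin in (0.10, 0.05, 0.03):
    print(f"margin {margin:.0%}: n >= {required_sample_size(margin)}")
```

Because the required sample size grows with the inverse square of the margin, halving the margin roughly quadruples the cost of collection, which is why the precision target is worth choosing deliberately.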

Photo by Jeremy Zero on Unsplash

Since ancient times, civilizations have gathered data on harvests and population sizes. Today, randomness and variation can be defined with far more mathematical precision. Modern statistics developed in the 19th century out of work on topics such as genetics and econometrics, and statistical theory advanced in the 20th century alongside many new application areas in science and industry. Later came the ability to have computers do the data analysis, followed by the rise of big data, data science, and machine learning.

Statistics certainly has many intersections with its allied fields.

Computer science provides the algorithms, the structures for working with data, and the programming languages for manipulating that data. Mathematics provides the language and notation for expressing statistical concepts concisely, and the tools to evaluate and interpret the properties of analytical methods.

One branch of mathematics is probability theory, a critical part of the foundation of statistics that lets us express the ideas of randomness and uncertainty.

Then there is data science, which gives us database management, machine learning, and the infrastructure needed to carry out data analysis.

Conclusion

Statistics has evolved from a small field into a significant ally of research and industry. Its many applications include computer vision, self-driving cars, facial recognition, and recommender systems for online search and online purchasing.

Other applications include predictive analytics and precision medicine in the health domain, fraud detection, risk assessment for the environment and infrastructure, and social and government services such as job training and behavioural therapy. Statistics and statistical thinking help us understand the data and information that surround us.

About the Author

Wie Kiang is a researcher who is responsible for collecting, organizing, and analyzing opinions and data to solve problems, explore issues, and predict trends.

He works in almost every sector of machine learning and deep learning, carrying out experiments and investigations in a range of areas, including convolutional neural networks, natural language processing, and recurrent neural networks.

Translated from: https://towardsdatascience.com/the-role-of-statistics-in-the-industry-d360f3056e4b
