Data Visualization and Its Importance: Python

Data visualization is an important skill to possess for anyone trying to extract and communicate insights from data. In the field of machine learning, visualization plays a key role throughout the entire process of analysis.

Why do we need to visualize the data?

Let’s say we have a data set of car sales across four continents for the first 11 months of the year.

Car Sales from Jan to Nov

Analyzing each column separately and drawing conclusions from the data above is cumbersome. So what we generally do is summarize the data and deduce insights from the summary. To see how sales in each continent compare with the others, we’ll calculate the average Discount and Sales for each continent:

Average of Discount and Sales

It looks like Sales have been roughly equal across the continents for the first 11 months. Let’s inspect the data further by also looking at the standard deviation of each column:

Standard Deviation across the continents

Based on these numbers, we might infer that sales performance has been the same across all four continents. This is exactly where summary statistics can mislead.
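To make the point concrete, here is a minimal sketch using only Python's standard library. The article's own numbers appear only in the images, so this assumes the original Anscombe's quartet values as a stand-in (eleven points per set, matching January to November), reads x as the discount rate and y as sales, and uses hypothetical continent labels A to D.

```python
from statistics import mean, stdev

# Hypothetical stand-in for the article's car-sales table: Anscombe's original
# quartet, reading x as the monthly discount rate and y as monthly sales (Jan-Nov).
# The continent names are placeholders.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "Continent A": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Continent B": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Continent C": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Continent D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(rows):
    """Map each continent to (discount mean, discount std, sales mean, sales std).

    Standard deviations are sample standard deviations; every value is rounded
    to two decimal places, as a summary table would show it.
    """
    return {
        name: (round(mean(discount), 2), round(stdev(discount), 2),
               round(mean(sales), 2), round(stdev(sales), 2))
        for name, (discount, sales) in rows.items()
    }


stats = summarize(data)
for name, (d_mean, d_std, s_mean, s_std) in stats.items():
    print(f"{name}: discount {d_mean} ± {d_std}, sales {s_mean} ± {s_std}")
```

All four continents come out with the same rounded summary (a discount mean of 9 with a standard deviation of about 3.32, and a sales mean of 7.5 with a standard deviation of about 2.03), even though the underlying data behave very differently.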

If we use Python to draw a scatter plot of Sales against Discount rate for each continent, we get the following graphs.

Scatter Plot
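A minimal Matplotlib sketch of this kind of figure is shown below. The article's plotting code is not shown, so this assumes the original Anscombe's quartet values stand in for the car-sales data (x as discount rate, y as sales) and uses hypothetical continent labels.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt

# Hypothetical stand-in for the car-sales table: Anscombe's original quartet,
# reading x as the monthly discount rate and y as monthly sales (Jan-Nov).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "Continent A": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Continent B": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Continent C": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Continent D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# One panel per continent, on shared axes so the shapes are directly comparable
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (discount, sales)) in zip(axes.flat, data.items()):
    ax.scatter(discount, sales)
    ax.set_title(name)
for ax in axes[1, :]:
    ax.set_xlabel("Discount rate")
for ax in axes[:, 0]:
    ax.set_ylabel("Sales")
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("scatter.png") to write a file
```

Sharing both axes is a deliberate choice here: with identical scales, the straight line, the curve, the outlier and the vertical cluster stand out immediately.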

Each continent employed a different strategy linking its discount rate to its sales, and the resulting patterns look very different from one another. It is difficult to recognize these patterns or strategies from the numbers alone. That’s why it is important to visualize the data instead of drawing conclusions based only on numbers.

The data set above is a modified version of Anscombe’s quartet, four data sets constructed in 1973 by the statistician Francis Anscombe to counter the impression among statisticians that “numerical calculations are exact, but graphs are rough.”

You can find more about Anscombe’s quartet here.

So now comes the million-dollar question:

Which Python library should we use for data visualization?

Python offers many data visualization tools, including highly interactive ones. The most basic plot types are shared between multiple libraries, but others are only available in certain libraries.

Three of the most widely used data visualization libraries are:

  1. Matplotlib
  2. Seaborn
  3. Plotly

1. Matplotlib

Matplotlib is the most popular data visualization library for Python. It is used to generate simple yet powerful visualizations, and it is the most widely used plotting library for everyone from beginners to seasoned data science professionals.

Advantages:

  1. Matplotlib supports many types of graphical representations, such as bar graphs, histograms, line graphs, scatter plots, and stem plots.
  2. Matplotlib can be used in multiple ways, including Python scripts, the Python and IPython shells, and Jupyter notebooks.
  3. Matplotlib is primarily a 2-D plotting library, but it ships with extensions such as the mplot3d toolkit that we can use to create more advanced visualizations like 3-D plots.

3D representation using matplotlib
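The sketch below draws each of the 2-D plot types listed above, plus a 3-D surface through the mplot3d toolkit bundled with Matplotlib. The data are random or synthetic placeholders, and rendering to an in-memory PNG with the Agg backend is just one option; the same plotting calls work in a script, an IPython shell, or a Jupyter notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove it and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older versions)

rng = np.random.default_rng(0)
x = np.arange(1, 12)                  # e.g. months 1..11
y = rng.integers(50, 100, size=11)    # hypothetical monthly values

fig = plt.figure(figsize=(12, 7))

ax = fig.add_subplot(2, 3, 1)
ax.bar(x, y)
ax.set_title("Bar graph")

ax = fig.add_subplot(2, 3, 2)
ax.hist(rng.normal(size=500), bins=20)
ax.set_title("Histogram")

ax = fig.add_subplot(2, 3, 3)
ax.plot(x, y, marker="o")
ax.set_title("Line graph")

ax = fig.add_subplot(2, 3, 4)
ax.scatter(x, y)
ax.set_title("Scatter plot")

ax = fig.add_subplot(2, 3, 5)
ax.stem(x, y)
ax.set_title("Stem plot")

# 3-D surface drawn on an mplot3d axes
ax = fig.add_subplot(2, 3, 6, projection="3d")
X, Y = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
ax.plot_surface(X, Y, np.sin(np.sqrt(X**2 + Y**2)), cmap="viridis")
ax.set_title("3-D surface")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("plot_types.png")
```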

2. Seaborn

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a variety of visualization patterns and integrates more closely with pandas DataFrames than Matplotlib does. Seaborn is widely used for statistical visualization because many common statistical tasks are built in.
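As a minimal sketch of that DataFrame integration, assuming pandas and seaborn are installed and using a small hypothetical sales table: `sns.barplot` takes the DataFrame and column names directly, averages the sales for each continent, and adds error bars (by default, a bootstrapped confidence interval) without any extra code.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove it and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-format table: one row per continent and month
df = pd.DataFrame({
    "continent": ["A"] * 4 + ["B"] * 4,
    "month": [1, 2, 3, 4] * 2,
    "sales": [8.0, 7.0, 9.5, 8.5, 6.0, 6.5, 7.0, 5.5],
})

fig, ax = plt.subplots(figsize=(5, 4))
# Pass the DataFrame and column names; seaborn groups the rows by continent,
# draws the mean sales of each group and adds error bars around the means
sns.barplot(data=df, x="continent", y="sales", ax=ax)
ax.set_title("Average sales per continent")
```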

Advantages:

  1. Seaborn's syntax is more concise, so we write less code to achieve high-quality visualizations.
  2. Compared with Matplotlib, Seaborn graphs are much more visually appealing by default.

Matplotlib vs. Seaborn using the same dataset

  3. Seaborn works with the dataset as a whole, whereas Matplotlib deals with the individual arrays and DataFrame columns passed to it.
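To illustrate point 3, here is a side-by-side sketch on a small hypothetical DataFrame. With Matplotlib we split the data and pass arrays ourselves; with Seaborn we hand over the whole DataFrame and name the columns.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove it and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: discount and sales for two continents
df = pd.DataFrame({
    "continent": ["A", "A", "A", "B", "B", "B"],
    "discount": [4, 8, 12, 4, 8, 12],
    "sales": [5.0, 7.0, 9.0, 9.0, 7.5, 6.0],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Matplotlib: split the data ourselves and pass plain arrays for each group
for name, group in df.groupby("continent"):
    ax_mpl.scatter(group["discount"], group["sales"], label=name)
ax_mpl.set_xlabel("discount")
ax_mpl.set_ylabel("sales")
ax_mpl.legend(title="continent")
ax_mpl.set_title("Matplotlib")

# Seaborn: pass the whole DataFrame and name the columns; grouping, colors,
# axis labels and the legend are handled for us
sns.scatterplot(data=df, x="discount", y="sales", hue="continent", ax=ax_sns)
ax_sns.set_title("Seaborn")
```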

3. Plotly

Plotly produces interactive plots that are easy to read, even for an audience without much experience reading charts. Plotly is mostly used for handling geographical, scientific, statistical, and financial data.

Advantages:

  1. Plotly works well with Jupyter Notebook and web browsers, which makes it easy to share graphs with end users.
  2. A notable advantage is that Plotly offers interactive contour plots out of the box.

Basic Contour Plot using Plotly

  3. When we hover the mouse over a Plotly graph, it shows the axis values at that particular point.

Python has other data visualization libraries too, such as Bokeh, Altair, and ggplot. But the ones mentioned above are among the most common and widely used.

Conclusion

In this article, we first learned why it is important to visualize data instead of drawing inferences solely from data sheets. We then looked at three popular data visualization libraries in Python. Python offers many more visualization tools beyond the ones discussed here, so it is worth familiarizing yourself with the libraries before committing to a particular approach.

Thank you for reading, and happy coding!

Check out my previous articles about Python:

  • Pandas: Python

  • Matplotlib: Python

  • NumPy: Python

  • Time Complexity and Its Importance in Python

  • Python Recursion or Recursive Function in Python

  • Python Programs to check for Armstrong Number (n digit) and Fenced Matrix

  • Python: Problems for Basics Reference — Swapping, Factorial, Reverse Digits, Pattern Print

Translated from: https://levelup.gitconnected.com/data-visualization-and-its-importance-python-7599c1092a09
