Data visualization is an important skill for anyone trying to extract and communicate insights from data. In machine learning, visualization plays a key role throughout the entire analysis process.
Why do we need to visualize the data?
Let's say we have a data set of car sales across four continents over the first 11 months.

Analyzing each column separately and drawing conclusions from the raw data above is cumbersome. What we generally do instead is summarize the data and deduce insights from the summary. To see how sales in each continent compare with the others, we calculate the average Discount and Sales for each continent:

Sales look nearly equal across the continents for the first 11 months. Let's inspect the data further and look at the standard deviation of each column:
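The summary step can be sketched with Python's standard library alone. The article's table is not reproduced here, so this sketch uses the original Anscombe's quartet values as stand-in data; the continent labels and the mapping of x to Discount and y to Sales are assumptions made for illustration.

```python
from statistics import mean, stdev

# Stand-in data: the original Anscombe's quartet (the article uses a modified
# version). Continent labels and column names are hypothetical.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "Continent A": {
        "Discount": X_COMMON,
        "Sales": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    },
    "Continent B": {
        "Discount": X_COMMON,
        "Sales": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    },
    "Continent C": {
        "Discount": X_COMMON,
        "Sales": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    },
    "Continent D": {
        "Discount": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        "Sales": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    },
}


def summarize(data):
    """Return {continent: {column: (mean, sample standard deviation)}}."""
    return {
        continent: {col: (mean(vals), stdev(vals)) for col, vals in cols.items()}
        for continent, cols in data.items()
    }


summary = summarize(DATA)
for continent, cols in summary.items():
    for col, (avg, sd) in cols.items():
        print(f"{continent:12} {col:9} mean={avg:.2f} std={sd:.2f}")
```

With these stand-in values, every continent reports a mean Discount of 9.00 and a mean Sales of about 7.50, with standard deviations of about 3.32 and 2.03, so the four groups are indistinguishable by these statistics.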

From these figures, we might infer that sales performance has been the same across the continents. This is where summary statistics tend to mislead.
If we plot Sales against the Discount rate in Python, with one scatter plot per continent, we get the following graphs.
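One way to draw such scatter plots is with Matplotlib, one panel per continent. This is a sketch under assumptions: the data are the original Anscombe's quartet values standing in for the article's modified table, and the continent labels and the helper name `scatter_by_continent` are hypothetical.

```python
import matplotlib.pyplot as plt

# Stand-in data: original Anscombe's quartet as (Discount, Sales) pairs.
# Continent labels are hypothetical.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "Continent A": (X_COMMON,
                    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Continent B": (X_COMMON,
                    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Continent C": (X_COMMON,
                    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Continent D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def scatter_by_continent(data):
    """Draw one Discount-vs-Sales scatter panel per continent on a 2x2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    for ax, (continent, (discount, sales)) in zip(axes.flat, data.items()):
        ax.scatter(discount, sales)
        ax.set_title(continent)
        ax.set_xlabel("Discount")
        ax.set_ylabel("Sales")
    fig.tight_layout()
    return fig, axes


fig, axes = scatter_by_continent(DATA)
```

Call `plt.show()` in an interactive session, or `fig.savefig(...)` in a script, to view the result. Sharing the axes across panels keeps the four shapes directly comparable.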

Each continent employed a different strategy for its discount rate to boost sales, and the resulting sales numbers were quite different from one another. The pattern or strategy of each continent is difficult to see from the numbers alone. That is why it is important to visualize the data instead of drawing conclusions from numbers only.
The above data set is a modified version of Anscombe's quartet, which was constructed in 1973 by the statistician Francis Anscombe to counter the impression among statisticians that “numerical calculations are exact, but graphs are rough.”
You can find more about Anscombe’s quartet here.
So now comes the million-dollar question:
Which Python Library should we use for Data Visualization?
Python has a rich set of data visualization tools, several of them interactive. The most basic plot types are shared between multiple libraries, but others are only available in certain libraries.
Three of the main data visualization libraries used by data scientists are:
- Matplotlib
- Seaborn
- Plotly
1. Matplotlib
Matplotlib is the most popular data visualization library for Python. It generates simple yet powerful visualizations, and it is the most widely used plotting library among everyone from beginners to seasoned data science professionals.
Advantages:
- Matplotlib supports many types of graphical representation, such as bar graphs, histograms, line graphs, scatter plots, and stem plots.
- Matplotlib can be used in multiple ways, including Python scripts, the Python and IPython shells, and Jupyter Notebooks.
- Matplotlib is a 2-D plotting library, but extensions are available for creating advanced visualizations such as 3-dimensional plots.
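The plot types listed above can be sketched in a single figure. The values below are illustrative only and are not taken from the article's data set; the 3-D panel relies on the mplot3d toolkit that ships with Matplotlib, and its z-axis values are arbitrary.

```python
import matplotlib.pyplot as plt

# Illustrative values only; not the article's data.
months = list(range(1, 12))
sales = [8.0, 7.0, 7.6, 8.8, 8.3, 10.0, 7.2, 4.3, 10.8, 4.8, 5.7]

fig = plt.figure(figsize=(10, 6))

# Bar graph: one bar per month.
ax_bar = fig.add_subplot(2, 2, 1)
ax_bar.bar(months, sales)
ax_bar.set_title("Bar graph")

# Histogram: distribution of the sales values.
ax_hist = fig.add_subplot(2, 2, 2)
ax_hist.hist(sales, bins=5)
ax_hist.set_title("Histogram")

# Line graph: sales over time.
ax_line = fig.add_subplot(2, 2, 3)
ax_line.plot(months, sales, marker="o")
ax_line.set_title("Line graph")

# 3-D scatter via the bundled mplot3d toolkit; z is an arbitrary derived value.
ax_3d = fig.add_subplot(2, 2, 4, projection="3d")
ax_3d.scatter(months, sales, [m * s for m, s in zip(months, sales)])
ax_3d.set_title("3-D scatter")

fig.tight_layout()
```

Passing `projection="3d"` to `add_subplot` is enough on recent Matplotlib releases; older releases may additionally need `from mpl_toolkits.mplot3d import Axes3D` before the 3-D axes can be created.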

2. Seaborn
Seaborn is a Python data visualization library built on Matplotlib. It provides a variety of visualization patterns and integrates more closely with Pandas DataFrames than Matplotlib does. Seaborn is widely used for statistical visualization because many statistical routines are built in.
Advantages:
- Seaborn has a more concise syntax, so we write less code to achieve high-grade visualizations.
- Compared to Matplotlib, Seaborn graphs are more visually appealing by default.
- Seaborn's plotting functions operate on the dataset as a whole (DataFrames and arrays containing entire datasets), whereas Matplotlib typically works with individual arrays.
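A short sketch of the dataset-oriented style: a single `lmplot` call takes a whole Pandas DataFrame, splits it into panels by a column, and adds a fitted regression line to each. The data are the first two sets of the original Anscombe's quartet used as stand-ins, and the column names and continent labels are hypothetical.

```python
import pandas as pd
import seaborn as sns

# Stand-in data: sets I and II of the original Anscombe's quartet.
# Column names and continent labels are hypothetical.
discount = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
df = pd.DataFrame({
    "Continent": ["Continent A"] * 11 + ["Continent B"] * 11,
    "Discount": discount * 2,
    "Sales": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
             + [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
})

# One call: scatter points plus a linear regression line, one panel per continent.
# ci=None turns off the bootstrapped confidence band.
g = sns.lmplot(data=df, x="Discount", y="Sales", col="Continent", ci=None)
```

The returned object is a `FacetGrid`; `g.savefig(...)` writes the figure to disk. The two panels show nearly identical regression lines over clearly different point patterns, which is the same lesson as the summary statistics earlier in the article.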
3. Plotly
Plotly provides interactive plots that are easy to read even for an audience with little experience in reading plots. It is mostly used for handling geographical, scientific, statistical, and financial data.
Advantages:
- Plotly is highly compatible with Jupyter Notebook and web browsers, which makes it easy to share graphs with end users.
- Plotly offers interactive contour plots, which not every plotting library provides.
- When we hover the mouse over a Plotly graph, it shows the axis values at that particular point.
Python has more data visualization libraries, such as Bokeh, Altair, and ggplot, but the ones described above are the most common and widely used.
Conclusion
In this article, we first learned why it is important to visualize data instead of drawing inferences from tables of numbers alone. We then looked at three of the data visualization libraries available in Python. Many other data visualization tools exist in Python besides the ones discussed and mentioned above. It is important to familiarize yourself with a library before committing to a particular approach.
Thank you for reading and Happy Coding!!!
Check out my previous articles about Python here
Pandas: Python
Matplotlib: Python
NumPy: Python
Time Complexity and Its Importance in Python
Python Recursion or Recursive Function in Python
Python Programs to check for Armstrong Number (n digit) and Fenced Matrix
Python: Problems for Basics Reference — Swapping, Factorial, Reverse Digits, Pattern Print
Translated from: https://levelup.gitconnected.com/data-visualization-and-its-importance-python-7599c1092a09