數據可視化工具_數據可視化

數據可視化工具

Visualizations are a great way to show the story that data wants to tell. However, not all visualizations are built the same. My rule of thumb is stick to simple, easy to understand, and well labeled graphs. Line graphs, bar charts, and histograms always work best. The most recognized libraries for visualizations are matplotlib and seaborn. Seaborn is built on top of matplotlib, so it is worth looking at matplotlib first, but in this article we’ll look at matplotlib only. Let’s get started. First, we will import all the libraries we will be working with.

可視化是顯示數據要講述的故事的好方法。 但是,并非所有可視化文件的構建都是相同的。 我的經驗法則是堅持簡單,易于理解且標簽清晰的圖形。 折線圖,條形圖和直方圖總是最有效。 最受認可的可視化庫是matplotlib和seaborn。 Seaborn是建立在matplotlib之上的,因此值得首先看一下matplotlib,但是在本文中,我們將只看一下matplotlib。 讓我們開始吧。 首先,我們將導入將要使用的所有庫。

import numpy as npimport matplotlib.pyplot as plt%matplotlib inline

We imported numpy, so we can generate random data. From matplotlib, we imported pyplot. If you are working on visualizations in jupyter notebook, you can call the %matplotlib inline command. This will allow jupyter notebook to display your visualizations directly under the code that was ran. If you’d like an interactive chart, you can call the %matplotlib command. This will allow you to manipulate your visualizations such as zoom in, zoom out, and move them around their axis.

我們導入了numpy,因此我們可以生成隨機數據。 從matplotlib中,我們導入了pyplot。 如果要在jupyter Notebook中進行可視化,則可以調用%matplotlib內聯命令。 這將使jupyter Notebook在運行的代碼下直接顯示可視化效果。 如果需要交互式圖表,可以調用%matplotlib命令。 這將允許您操縱可視化效果,例如放大,縮小并圍繞其軸移動它們。

直方圖 (Histograms)

Fist, lets take a look at histograms on matplotlib. We will look at a normal distribution. Let’s build it with numpy and visualize it with matplotlib.

拳頭,讓我們看看matplotlib上的直方圖。 我們將看一個正態分布。 讓我們使用numpy構建它,并使用matplotlib對其進行可視化。

normal_distribution = np.random.normal(0,1,10000)plt.hist(normal_distribution)plt.show()
Image for post

Great! We had a histogram. But, what did we just do? First, we took 10,000 random sample from a distribution of mean 0 and standard deviation of 1. Then, we called the method hist() from matplotlib. Last, we called the show() method to display our figure. However, our histogram looks kind of…squared. Fear not! You can modify the width of each bin with the bins argument. Matplotlib defaults to 10 if an argument isn’t given. There are multiple ways to calculate bins, but I prefer to set it to ‘auto’.

大! 我們有一個直方圖。 但是,我們只是做什么? 首先,我們從均值為0且標準差為1的分布中抽取了10,000個隨機樣本。然后,從matplotlib中調用了hist()方法。 最后,我們調用了show()方法來顯示我們的圖形。 但是,我們的直方圖看起來有點...平方。 不要怕! 您可以使用bins參數修改每個垃圾箱的寬度。 如果未提供參數,則Matplotlib默認為10。 有多種計算垃圾箱的方法,但我更喜歡將其設置為“自動”。

plt.hist(normal_distribution,bins='auto')plt.show()

!

Image for post

Much better. You know what would be twice as fun? If we visualize another distribution, but if we visualize it on the same histogram, then that would be three times as fun. That’s exactly what we are going to do.

好多了。 您知道會帶來兩倍的樂趣嗎? 如果我們可視化另一個分布,但是如果我們在相同的直方圖中可視化它,那將是三倍的樂趣。 這正是我們要做的。

modified_distribution = np.random.normal(3,1.5,10000)plt.hist(normal_distribution,bins='auto',color='purple',label='Purple')plt.hist(modified_distribution,bins='auto',color='green',alpha=.75,label='Green')plt.legend(['Purple','Green'])plt.show()
Image for post

Wow! Now that’s a fun histogram. What just happen? We went from one blue histogram to one purple and one green. Let’s go over what we added. First, we created another distribution named modified_distribution. Then, we changed the color of each distribution with the color argument, passed the labelargument to name each distribution, and we passed the alpha argument to make the green distribution see through. Last, we passed the name of each distribution to the legend() method. When you have more than one set of data on a single chart, it is required to label the data to be able to tell the data apart. In this example, the data can be told apart easy, but in the real world each data can represent things that cannot be identified by color. For example, green can represent height of male college students, and purple the height of female college students. Of course, if that was the case the X axis would be on a different scale.

哇! 現在,這是一個有趣的直方圖。 剛剛發生什么事 我們從一個藍色直方圖變為一個紫色和一個綠色。 讓我們來看看添加的內容。 首先,我們創建了另一個名為modified_distribution的發行版。 然后,我們使用color參數更改每個分布的顏色 ,傳遞label參數以命名每個分布,然后傳遞alpha參數使綠色分布透明。 最后,我們將每個發行版的名稱傳遞給legend()方法。 如果單個圖表上有多個數據集,則需要標記數據以便能夠區分數據。 在此示例中,可以輕松區分數據,但在現實世界中,每個數據都可以代表無法用顏色標識的事物。 例如,綠色可以代表男性大學生的身高,而紫色可以代表女性大學生的身高。 當然,如果是這種情況,則X軸將處于不同的比例。

條形圖 (Bar Charts)

Let’s continue the fun. Now we are going to look at bar charts. This kind of charts are really useful when trying to visualize quantities, so let’s look at an example. In this case we will visualize at how people voted when asked about what type of pet they have or would like to have. Let’s randomly generate data.

讓我們繼續樂趣。 現在我們來看看條形圖。 當試圖可視化數量時,這種圖表非常有用,因此讓我們看一個示例。 在這種情況下,我們將可視化當人們問起他們擁有或想要擁有哪種類型的寵物時人們如何投票。 讓我們隨機生成數據。

options = ['Cats','Dogs','Parrots','Hamsters']
votes = [np.random.randint(10,100) for i in range(len(options)]
votes.sort(reverse=True)

Perfect, we have a list of pets and a list of randomly generated numbers. Notice, we sorted the list in descending order. I like to order list this way because it is easier to see which category is the largest and smallest. Of course, in this example we just ordered the votes without ordering the options that match up to it. In reality, we would have to order both. I found that the easiest way to go about this is to make a dictionary and order the dictionary by values. Click herefor a helpful guide on stackoverflow on how to order dictionaries by values. Now, let’s visualize our data.

完美,我們有一個寵物清單和一個隨機生成的數字清單。 注意,我們以降序對列表進行排序。 我喜歡以此方式訂購商品,因為這樣可以更輕松地查看最大和最小的類別。 當然,在此示例中,我們只是對投票進行了排序,而沒有對與之匹配的選項進行排序。 實際上,我們將必須同時訂購兩者。 我發現最簡單的方法是制作字典并按值對字典進行排序。 單擊此處以獲取有關如何按值對字典進行排序的stackoverflow的有用指南。 現在,讓我們可視化我們的數據。

plt.bar(options,votes)plt.title('Favorite Pet Survey')plt.xlabel('Options')plt.ylabel('Votes')plt.show()
Image for post

Great! We have an amazing looking graph. Notice, we can easily tell cats got the most votes, and hamsters got the least votes. Let’s look at the code. After we defined our X and height, we called the bar() method to build a bar chart. We passed options as X and votes as height. Then, we labeled the title, X axis, and y axis with methods title(), xlabel(), and ylabel() respectively. Easy enough! However, this bar chart looks a bit boring. Let’s make it look fun.

大! 我們有一個驚人的外觀圖。 注意,我們可以很容易地看出貓的得票最多,而倉鼠的得票最少。 讓我們看一下代碼。 定義X和高度后,我們調用bar()方法來構建條形圖。 我們將選項作為X傳遞,將投票作為高度。 然后,我們分別使用方法title(),xlabel()和ylabel()標記標題,X軸和y軸。 很簡單! 但是,此條形圖看起來有些無聊。 讓它看起來有趣。

with plt.style.context('ggplot'):
plt.bar(options,votes)
plt.title('Favorite Pet Survey')
plt.xlabel('Options')
plt.ylabel('Votes')
plt.show()
Image for post

This graph is so much fun. How did we do this? Notice, all our code looks mostly the same, but there is important code we added, and we changed the format. We added the with keyword and the context() method from plt.style to change our chart style. Really cool thing is that it only changes it for everything that’s directly under it and indented. It is important to indent the code after the first line. We used the ggplot style to make our graph more fun. Click here to view all the styles available in matplotlib. If we want to compare two datasets with the same options, it is a little harder than in histograms, but it is equally as fun. Let’s say we want to visualize male vs female vote on each category.

該圖非常有趣。 我們是如何做到的? 注意,我們所有的代碼看起來幾乎相同,但是添加了重要的代碼,并且更改了格式。 我們從plt.style添加了with關鍵字和context()方法來更改圖表樣式。 真正很酷的事情是,它僅針對直接在其下方并縮進的所有內容進行更改。 在第一行之后縮進代碼很重要。 我們使用了ggplot樣式使我們的圖表更加有趣。 單擊此處查看matplotlib中可用的所有樣式。 如果我們想比較兩個具有相同選項的數據集,這比直方圖要難一些,但同樣有趣。 假設我們要形象化每個類別的男性和女性投票。

votes_male = votes
votes_female = [np.random.randint(10,100) for i in range(len(options))]import pandas as pdwith plt.style.context('ggplot'):
pd.DataFrame({'male':votes_male,'female':votes_female,index=options).plot(kind='bar')
plt.title('Favorite Pet Survey (Male vs Female)')
plt.xticks(rotation=0)
plt.ylabel('votes')
plt.show()
Image for post
Image for post

Lots going on here, but you have seen most of it already. Let’s start from the top. First, we renamed the votes data to votes_male, and we generated new data for votes_female. Then, we imported pandas which is a library to work with data frames. We created a data frame for our data with male and female as our columns and pet options as our index. After, we called the plot() method from the data frame and passed bar for the kind arguement, so we can plot a bar chart. With the data frame plot method, it adds X labels for you, but they are at a 90-degree angle. To fix this, you can call the xticks() method from pyplot and pass the argument rotation 0. This will make the text like the graph above.

這里有很多事情,但是您已經看到了大部分。 讓我們從頭開始。 首先,我們將選票數據重命名為voices_male,并為votes_female生成了新數據。 然后,我們導入了pandas,這是一個用于處理數據框的庫。 我們為數據創建了一個數據框,其中男性和女性為列,寵物選項為索引。 之后,我們從數據框中調用plot()方法,并通過柱形圖進行類型爭論,因此可以繪制條形圖。 使用數據框繪圖方法時,它會為您添加X標簽,但它們成90度角。 要解決此問題,您可以從pyplot調用xticks()方法并傳遞參數rotation0。這將使文本像上面的圖形一樣。

線形圖 (Line Graph)

Now, let’s look at line graphs. These graphs are great to visualize how Y changes as X changes. Most commonly, they are used to visualize time series data. In this example, we will visualize how much water a new town uses as their population grows.

現在,讓我們看一下折線圖。 這些圖表非常適合可視化Y隨著X的變化。 最常見的是,它們用于可視化時間序列數據。 在此示例中,我們將可視化一個新城鎮隨著人口增長而消耗的水量。

town_population = np.linspace(0,10,10)
town_water_usage = [i*5 for i in town_population]with plt.style.context('seaborn'):
plt.plot(town_population,town_water_usage)
plt.title('Water Usage of Cool Town by Population')
plt.xlabel('Population (in thousands)')
plt.ylabel('Water usage (in thousand gallons)')
plt.show()
Image for post

What a nice graph! As you can see, we used everything we learned so far to create this graph. The only difference is the method we called is not as intuitive as the other ones. In this case we called the plot() method. We passed our X and Y, labeled our chart, and we visualized it with our show() method. Let’s add more data. This time, we are going to add the water usage of a nearby town.

多么漂亮的圖! 如您所見,我們使用到目前為止所學的所有知識來創建該圖。 唯一的區別是我們調用的方法不像其他方法那樣直觀。 在這種情況下,我們稱為plot()方法。 我們傳遞了X和Y,標記了圖表,然后使用show()方法將其可視化。 讓我們添加更多數據。 這次,我們將增加附近城鎮的用水量。

nearby_town_water_usage = [i*.85 for i in town_water_usage]with plt.style.context('seaborn'):
plt.plot(town_population,town_water_usage,label='Cool Town')
plt.plot(town_population,nearby_town_water_usage,label='Lame Town')
plt.title('Water Usage of Cool Town and Lame Town')
plt.xlabel('Population (in thousands)')
plt.ylabel('Water usage (in thousand gallons)')
plt.legend(['Cool Town','Lame Town'])
plt.show()
Image for post

As you can see we just added another plot(), labeled, each line, updated the title, and we showed a legend of the graph. For the most part is the same process as other graphs. From the graph we can see that Lame Town is actually using less water than Cool town. I guess Lame Town isn’t so lame after all.

如您所見,我們只是添加了另一個plot(),標記為每行,更新了標題,并顯示了圖例。 在大多數情況下,該過程與其他圖形相同。 從圖中可以看出,me腳鎮實際上比涼爽鎮使用的水少。 我猜La子鎮畢竟不是那么la子。

結論 (Conclusion)

We covered some of the basics of visualizing data. We even went into how to generate random data! As you can see these are very versatile and efficient ways of showing data. Nothing too crazy, just old school ways of showing the story that the data tells.

我們介紹了可視化數據的一些基礎知識。 我們甚至研究了如何生成隨機數據! 如您所見,這些是顯示數據的非常通用和有效的方法。 沒什么太瘋狂的了,只是老式的方式來顯示數據所講述的故事。

翻譯自: https://medium.com/@a.colocho/data-visualization-9e151698a921

數據可視化工具

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/389274.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/389274.shtml
英文地址,請注明出處:http://en.pswp.cn/news/389274.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

Android Studio調試時遇見Install Repository and sync project的問題

我們可以看到,報的錯是“Failed to resolve: com.android.support:appcompat-v7:16.”,也就是我們在build.gradle中最后一段中的compile項內容。 AS自動生成的“com.android.support:appcompat-v7:16.”實際上是根據我們的最低版本16來選擇16.x.x及以上編…

Apache Ignite 學習筆記(二): Ignite Java Thin Client

前一篇文章,我們介紹了如何安裝部署Ignite集群,并且嘗試了用REST和SQL客戶端連接集群進行了緩存和數據庫的操作。現在我們就來寫點代碼,用Ignite的Java thin client來連接集群。 在開始介紹具體代碼之前,讓我們先簡單的了解一下Ig…

VGAE(Variational graph auto-encoders)論文及代碼解讀

一,論文來源 論文pdf Variational graph auto-encoders 論文代碼 github代碼 二,論文解讀 理論部分參考: Variational Graph Auto-Encoders(VGAE)理論參考和源碼解析 VGAE(Variational graph auto-en…

IIS7設置

IIS 7.0和IIS 6.0相比改變很大誰都知道,而且在IIS 7.0中用VS2005來調試Web項目也不是什么新鮮的話題,但是我還是第一次運用這個東東,所以在此記下我的一些過程,希望能給更多的后來者帶了一點參考。其實我寫這篇文章時也參考了其他…

tableau大屏bi_Excel,Tableau,Power BI ...您應該使用什么?

tableau大屏biAfter publishing my previous article on data visualization with Power BI, I received quite a few questions about the abilities of Power BI as opposed to those of Tableau or Excel. Data, when used correctly, can turn into digital gold. So what …

python 可視化工具_最佳的python可視化工具

python 可視化工具Disclaimer: I work for Datapane免責聲明:我為Datapane工作 動機 (Motivation) There are amazing articles on data visualization on Medium every day. Although this comes at the cost of information overload, it shouldn’t prevent you …

網絡編程 socket介紹

Socket介紹 Socket是應用層與TCP/IP協議族通信的中間軟件抽象層,它是一組接口。在設計模式中,Socket其實就是一個門面模式,它把復雜的TCP/IP協議族隱藏在Socket接口后面,對用戶來說,一組簡單的接口就是全部。 Socket通…

猿課python 第三天

字典 字典是python中唯一的映射類型,字典對象是可變的,但是字典的鍵是不可變對象,字典中可以使用不同的鍵值字典功能> dict.clear()          -->清空字典 dict.keys()          -->獲取所有key dict.values()      …

在C#中使用代理的方式觸發事件

事件(event)是一個非常重要的概念,我們的程序時刻都在觸發和接收著各種事件:鼠標點擊事件,鍵盤事件,以及處理操作系統的各種事件。所謂事件就是由某個對象發出的消息。比如用戶按下了某個按鈕,某…

BP神經網絡反向傳播手動推導

BP神經網絡過程: 基本思想 BP算法是一個迭代算法,它的基本思想如下: 將訓練集數據輸入到神經網絡的輸入層,經過隱藏層,最后達到輸出層并輸出結果,這就是前向傳播過程。由于神經網絡的輸出結果與實際結果…

使用python和pandas進行同類群組分析

背景故事 (Backstory) I stumbled upon an interesting task while doing a data exercise for a company. It was about cohort analysis based on user activity data, I got really interested so thought of writing this post.在為公司進行數據練習時,我偶然發…

3.Contructor(構造器)模式—精讀《JavaScript 設計模式》Addy Osmani著

同系列友情鏈接: 1.設計模式之初體驗—精讀《JavaScript 設計模式》Addy Osmani著 2.設計模式的分類—精讀《JavaScript 設計模式》Addy Osmani著 Construct(構造器)模式 在經典的面向對象編程語言中,Construtor是一種在內存已分配給該對象的…

BZOJ 3653: 談笑風生(離線, 長鏈剖分, 后綴和)

題意 給你一顆有 \(n\) 個點并且以 \(1\) 為根的樹。共有 \(q\) 次詢問,每次詢問兩個參數 \(p, k\) 。詢問有多少對點 \((p, a, b)\) 滿足 \(p,a,b\) 為三個不同的點,\(p, a\) 都為 \(b\) 的祖先,且 \(p\) 到 \(a\) 的距離不能超過 \(k\) 。 …

搜索引擎優化學習原理_如何使用數據科學原理來改善您的搜索引擎優化工作

搜索引擎優化學習原理Search Engine Optimisation (SEO) is the discipline of using knowledge gained around how search engines work to build websites and publish content that can be found on search engines by the right people at the right time.搜索引擎優化(SEO…

Siamese網絡(孿生神經網絡)詳解

SiameseFCSiamese網絡(孿生神經網絡)本文參考文章:Siamese背景Siamese網絡解決的問題要解決什么問題?用了什么方法解決?應用的場景:Siamese的創新Siamese的理論Siamese的損失函數——Contrastive Loss損失函…

Dubbo 源碼分析 - 服務引用

1. 簡介 在上一篇文章中,我詳細的分析了服務導出的原理。本篇文章我們趁熱打鐵,繼續分析服務引用的原理。在 Dubbo 中,我們可以通過兩種方式引用遠程服務。第一種是使用服務直聯的方式引用服務,第二種方式是基于注冊中心進行引用。…

期權價格的上限和下限

期權按照買方權利性質分為:看漲期權和看跌期權 1、首先,看漲期權的上限和下限 看漲期權價格上限為其標的資產價格。 看漲期權是給予買方一個在未來買入標的資產的權利,如果該權利的價格高于標的資產的價格,那么投資者不如直接購買…

一件登錄facebook_我從Facebook的R教學中學到的6件事

一件登錄facebookBetween 2018 to 2019, I worked at Facebook as a data scientist — during that time I was involved in developing and teaching a class for R beginners. This was a two-day course that was taught about once a month to a group of roughly 15–20 …

SiameseFC超詳解

SiameseFC前言論文來源參考文章論文原理解讀首先要知道什么是SOT?(Siamese要做什么)SiameseFC要解決什么問題?SiameseFC用了什么方法解決?SiameseFC網絡效果如何?SiameseFC基本框架結構SiameseFC網絡結構Si…