熊貓tv新功能介紹_您應該知道的4種熊貓繪圖功能

熊貓tv新功能介紹

Pandas is a powerful package for data scientists. There are many reasons we use Pandas, e.g. Data wrangling, Data cleaning, and Data manipulation. Although, there is a method that rarely talks about regarding Pandas package and that is the Data plotting.

Pandas是數據科學家的強大工具包。 我們使用Pandas的原因很多,例如數據整理,數據清理和數據操作。 雖然,有一種方法很少談論有關Pandas軟件包的問題,??那就是Data plotting

Data plotting, just like the name implies, is a process to plot the data into some graph or chart to visualise the data. While we have much fancier visualisation package out there, some method is just available in the pandas plotting API.

顧名思義,數據繪制是將數據繪制到某些圖形或圖表中以可視化數據的過程。 雖然我們有很多更好的可視化程序包,但熊貓繪圖API中僅提供了一些方法。

Let’s see a few selected method I choose.

讓我們看看我選擇的一些選定方法。

1.拉德維茲 (1. radviz)

RadViz is a method to visualise N-dimensional data set into a 2D plot. The problem where we have more than 3-dimensional (features) data or more is that we could not visualise it, but RadViz allows it to happen.

RadViz是一種將N維數據集可視化為2D圖的方法。 我們擁有超過3維(特征)數據或更多數據的問題是我們無法可視化它,但是RadViz允許它發生。

According to Pandas, radviz allows us to project an N-dimensional data set into a 2D space where the influence of each dimension can be interpreted as a balance between the importance of all dimensions. In a simpler term, it means we could project a multi-dimensional data into a 2D space in a primitive way.

根據Pandas的說法,radviz允許我們將N維數據集投影到2D空間中,其中每個維的影響可以解釋為所有維的重要性之間的平衡。 簡單來說,這意味著我們可以以原始方式將多維數據投影到2D空間中

Let’s try to use the function in a sample dataset.

讓我們嘗試在樣本數據集中使用該函數。

#RadViz example
import pandas as pd
import seaborn as sns#To use the pd.plotting.radviz, you need a multidimensional data set with all numerical columns but one as the class column (should be categorical).mpg = sns.load_dataset('mpg')pd.plotting.radviz(mpg.drop(['name'], axis =1), 'origin')
Image for post
RadViz Result
RadViz結果

Above is the result of RadViz function, but how you would interpret the plot?

上面是RadViz函數的結果,但是如何解釋該圖呢?

So, each Series in the DataFrame is represented as an evenly distributed slice on a circle. Just look at the example above, there is a circle with the series name.

因此,DataFrame中的每個Series均表示為圓上均勻分布的切片。 只要看一下上面的例子,就會有一個帶有系列名稱的圓圈。

Each data point then is plotted in the circle according to the value on each Series. Highly correlated Series in the DataFrame are placed closer on the unit circle. In the example, we could see the japan and europe car data are closer to the model_year while the usa car is closer to the displacement. It means japan and europe car are most likely correlated to the model_year while usa car is with the displacement.

然后,根據每個系列的值將每個數據點繪制在圓圈中。 DataFrame中高度相關的Series位于單位圓上。 在示例中,我們可以看到日本和歐洲的汽車數據更接近model_year,而美國汽車的數據更接近排量。 這意味著日本和歐洲的汽車最有可能與model_year相關,而美國汽車則與排量相關。

If you want to know more about RadViz, you could check the paper here.

如果您想了解有關RadViz的更多信息,可以在此處查看該論文。

2. bootstrap_plot (2. bootstrap_plot)

According to Pandas, the bootstrap plot is used to estimate the uncertainty of a statistic by relying on random sampling with replacement. In simpler words, it is used to trying to determine the uncertainty in fundamental statistic such as mean and median by resampling the data with replacement (you could sample the same data multiple times). You could read more about bootstrap here.

根據Pandas的說法, 引導程序圖依賴于隨機抽樣和替換來估計統計的不確定性。 用簡單的話來說, 它用于嘗試通過替換對數據進行重采樣來確定基本統計數據的不確定性,例如均值和中位數 (您可以多次采樣同一數據)。 您可以在此處閱讀有關引導的更多信息。

The boostrap_plot function will generate bootstrapping plots for mean, median and mid-range statistics for the given number of samples of the given size. Let’s try using the function with an example dataset.

boostrap_plot函數將為給定大小的給定數量的樣本生成均值,中值和中間范圍統計量的自舉圖。 讓我們嘗試將函數與示例數據集一起使用。

For example, I have the mpg dataset and already have the information regarding the mpg feature data.

例如,我有mpg數據集,并且已經有了有關mpg特征數據的信息。

mpg['mpg'].describe()
Image for post

We could see that the mpg mean is 23.51 and the median is 23. Although this is just a snapshot of the real-world data. How are the values actually is in the population is unknown, that is why we could measure the uncertainty with the bootstrap methods.

我們可以看到mpg平均值為23.51,中位數為23。盡管這只是真實數據的快照。 實際值如何在總體中是未知的,這就是為什么我們可以使用自舉法來測量不確定性的原因。

#bootstrap_plot examplepd.plotting.bootstrap_plot(mpg['mpg'],size = 50 , samples = 500)
Image for post

Above is the result example of bootstap_plot function. Mind that the result could be different than the example because it relies on random resampling.

上面是bootstap_plot函數的結果示例。 請注意,結果可能與示例不同,因為它依賴于隨機重采樣。

We could see in the first set of the plots (first row) is the sampling result, where the x-axis is the repetition, and the y-axis is the statistic. In the second set is the statistic distribution plot (Mean, Median and Midrange).

我們可以在第一組圖(第一行)中看到采樣結果,其中x軸是重復項,y軸是統計量。 第二組是統計分布圖(均值,中位數和中位數)。

Take an example of the mean, most of the result is around 23, but it could be between 22.5 and 25 (more or less). This set the uncertainty in the real world that the mean in the population could be between 22.5 and 25. Note that there is a way to estimate the uncertainty by taking the values in the position 2.5% and 97.5% quantile (95% confident) although it is still up to your judgement.

以平均值為例,大多數結果在23左右,但可能在22.5到25之間(或多或少)。 這設置了現實世界中的不確定性,即總體平均值可能在22.5和25之間。請注意,盡管有2.5%和97.5%的分位數(95%的置信度),但是有一種方法可以估計不確定性這仍然取決于您的判斷。

3. lag_plot (3. lag_plot)

A lag plot is a scatter plot for a time series and the same data lagged. Lag itself is a fixed amount of passing time; for example, lag 1 is a day 1 (Y1) with a 1-day time lag (Y1+1 or Y2).

滯后圖是時間序列的散點圖,并且相同數據滯后。 滯后本身是固定的通過時間; 例如,滯后1是第1天(Y1),時滯為1天(Y1 + 1或Y2)。

A lag plot is used to checks whether the time series data is random or not, and if the data is correlated with themselves. Random data should not have any identifiable patterns, such as linear. Although, why we bother with randomness or correlation? This is because many Time Series models are based on the linear regression, and one assumption is no correlation (Specifically is no Autocorrelation).

滯后圖用于檢查時間序列數據是否隨機,以及數據是否與自身相關。 隨機數據不應具有任何可識別的模式,例如線性。 雖然,為什么我們要擾亂隨機性或相關性? 這是因為許多時間序列模型都基于線性回歸,并且一個假設是不相關的(特別是沒有自相關)。

Let’s try with an example data. In this case, I would use a specific package to scrap stock data from Yahoo Finance called yahoo_historical.

讓我們嘗試一個示例數據。 在這種情況下,我將使用一個名為yahoo_historical的特定程序包從Yahoo Finance抓取股票數據。

pip install yahoo_historical

With this package, we could scrap a specific stock data history. Let’s try it.

有了這個軟件包,我們可以抓取特定的庫存數據歷史記錄。 讓我們嘗試一下。

from yahoo_historical import Fetcher#We would scrap the Apple stock data. I would take the data between 1 January 2007 to 1 January 2017 
data = Fetcher("AAPL", [2007,1,1], [2017,1,1])
apple_df = data.getHistorical()#Set the date as the index
apple_df['Date'] = pd.to_datetime(apple_df['Date'])
apple_df = apple_df.set_index('Date')
Image for post

Above is our Apple stock dataset with the date as the index. We could try to plot the data to see the pattern over time with a simple method.

上面是我們的Apple股票數據集,其中以日期為索引。 我們可以嘗試使用一種簡單的方法來繪制數據以查看隨時間變化的模式。

apple_df['Adj Close'].plot()
Image for post

We can see the Adj Close is increasing over time but is the data itself shown any pattern in with their lag? In this case, we would use the lag_plot.

我們可以看到,隨著時間的推移,“關閉收盤價”(Adj Close)不斷增加,但是數據本身是否顯示出任何與滯后有關的模式? 在這種情況下,我們將使用lag_plot。

#Try lag 1 day
pd.plotting.lag_plot(apple_df['Adj Close'], lag = 1)
Image for post

As we can see in the plot above, it is almost near linear. It means there is a correlation between daily Adj Close. It is expected as the daily price of the stock would not be varied much in each day.

如上圖所示,它幾乎接近線性。 這意味著每日調整關閉之間存在相關性。 可以預期,因為股票的每日價格每天不會有太大變化。

How about a weekly basis? Let’s try to plot it

每周一次如何? 讓我們嘗試繪制它

#The data only consist of work days, so one week is 5 dayspd.plotting.lag_plot(apple_df['Adj Close'], lag = 5)
Image for post

We can see the pattern is similar to the lag 1 plot. How about 365 days? would it have any differences?

我們可以看到該模式類似于滯后1圖。 365天怎么樣? 有什么區別嗎?

pd.plotting.lag_plot(apple_df['Adj Close'], lag = 365)
Image for post

We can see right now the pattern becomes more random, although the non-linear pattern still exists.

現在我們可以看到模式變得更加隨機,盡管非線性模式仍然存在。

4. scatter_matrix (4. scatter_matrix)

The scatter_matrix is just like the name implies; it creates a matrix of scatter plot. Let’s try it with an example at once.

顧名思義, scatter_matrix就是一樣。 它創建了散點圖矩陣。 讓我們立即嘗試一個示例。

import matplotlib.pyplot as plttips = sns.load_dataset('tips')
pd.plotting.scatter_matrix(tips, figsize = (8,8))
plt.show()
Image for post

We can see the scatter_matrix function automatically detects the numerical features within the Data Frame we passed to the function and create a matrix of the scatter plot.

我們可以看到scatter_matrix函數自動檢測我們傳遞給該函數的數據框內的數字特征,并創建散點圖的矩陣。

In the example above, between two numerical features are plotted together to create a scatter plot (total_bill and size, total_bill and tip, and tip and size). Whereas, the diagonal part is the histogram of the numerical features.

在上面的示例中,兩個數字特征之間被繪制在一起以創建散點圖(total_bill和size,total_bill和tip,以及tip和size)。 而對角線部分是數值特征的直方圖。

This is a simple function but powerful enough as we could get much information with a single line of code.

這是一個簡單的功能,但功能足夠強大,因為我們可以用一行代碼來獲取很多信息。

結論 (Conclusion)

Here I have shown you 4 different pandas plotting functions that you should know, that includes:

在這里,我向您展示了您應該了解的4種不同的熊貓繪圖功能,其中包括:

  1. radviz

    拉德維茲
  2. bootstrap_plot

    bootstrap_plot
  3. lag_plot

    lag_plot
  4. scatter_matrix

    scatter_matrix

I hope it helps!

希望對您有所幫助!

翻譯自: https://towardsdatascience.com/4-pandas-plotting-function-you-should-know-5a788d848963

熊貓tv新功能介紹

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/388564.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/388564.shtml
英文地址,請注明出處:http://en.pswp.cn/news/388564.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

CPP_封裝_繼承_多態

類的三方法:封裝,繼承,多態。封裝:使用一整套方法去創建一個新的類型,這叫類的封裝。繼承:從一個現有的類型基礎上,稍作改動,得到一個新的類型的方法,叫類的繼承。多態&a…

win與linux淵源,微軟與Linux從對立走向合作,WSL是如何誕生的

原標題:微軟與Linux從對立走向合作,WSL是如何誕生的正文Windows Subsystem for Linux(WSL)的開發,讓微軟從Linux的對立面走向合作,并且不斷加大對開源社區的支持力度。而作為微軟歷史上的重要轉折點,外界對WSL技術在Pr…

文件編輯器 vi

1、關于文本編輯器; 文本編輯器有很多,比如圖形模式的gedit、kwrite、OpenOffice ... ... ,文本模式下的編輯器有vi、vim(vi的增強版本)和nano ... ... vi和vim是我們在Linux中最常用的編輯器。我們有必要介紹一下vi&a…

MFC80.DLL復制到程序目錄中,也有的說復制到安裝目錄中

在用VS2005學習C調試程序的時候,按F5鍵,總提示這個問題, 不曉得什么原因,網上有的說找到MFC80.DLL復制到程序目錄中,也有的說復制到安裝目錄中,可結果很失望,也有的VS2005安裝有問題&#xff0…

vs顯示堆棧數據分析_什么是“數據分析堆棧”?

vs顯示堆棧數據分析A poor craftsman blames his tools. But if all you have is a hammer, everything looks like a nail.一個可憐的工匠責怪他的工具。 但是,如果您只有一把錘子,那么一切看起來都像釘子。 It’s common for web developers or databa…

服務器

服務器主流品牌:華為、浪潮、戴爾、惠普華為服務器:華為FusionServer RH2288 V3 華為FusionServer RH5885 V3 浪潮服務器: 浪潮英信NP3020M4 浪潮英信NF5280M4 戴爾服務器: 戴爾PowerEdge R730 機架式服務器 戴爾PowerEdge R740 機…

樹莓派 zero linux,樹莓派 zero基本調試

回家之前就從網上購買了一堆設備,回去也不能閑著,可以利用家里相對齊全的準備安裝調試。結果人還沒回來,東西先到了。購買的核心裝備是樹莓派zero w,雖然已經知道它比家族大哥樹莓派小不少,但拿到手里還是驚奇它的小巧…

error C2440 “static_cast” 無法從“void (__thiscall CPppView )(void)”轉換為“LRESULT (__thiscall

error C2440 “static_cast” 無法從“void (__thiscall CPppView )(void)”轉換為“LRESULT (__thiscall CWnd )(WPARAM,LPARAM)” 不能轉換void (_thiscall CMainFrame::*)(void)to LRESULT (__thiscall CWnd::* )(WPARAM,LPARAM)開發平臺由VC6.0升級至VS2005,需要…

簡單的編譯流程

簡易編譯器流程圖: 一個典型的編譯器,可以包含為一個前端,一個后端。前端接收源程序產生一個中間表示,后端接收中間表示繼續生成一個目標程序。所以,前端處理的是跟源語言有關的屬性,后端處理跟目標機器有關的屬性。 復…

廣告投手_測量投手隱藏自己的音高的程度

廣告投手As the baseball community has recently seen with the Astros 2017 cheating scandal, knowing what pitch is being thrown gives batters a game-breaking advantage. However, unless you have an intricate system of cameras and trash cans set up, knowing wh…

linux事務隔離級別,事務的隔離級別(Transaction isolation levels)2

READ COMMITTEDREAD COMMITTED這是數據庫默認的隔離級別。它能保證你不能讀取那張表格數據,只要有其它事務還在改變這張表格數據。可是,因為sql server在select操作的時,鎖表格時間就那么一小會兒,如果一個事務在READ COMMITTED級…

Asp導出到Excel之二

response.contentType "application/vnd.ms-excel" response.addheader "Content-Disposition", "attachment; filename引出文件.xls" 一、適用于動態和表態表。 二、頁面最好只存放數據表,不要有其它內容。 三、對于分頁的情…

warning C4996: “strcpy”被聲明為否決的解決辦法

VC2005中,使用了很多標準的C函數,比如fopen,strcpy之類的。編譯時會出現警告,比如這個: d:\xxxx.c(1893) : warning C4996: “strcpy”被聲明為否決的 緊接著IDE有提示說:“This function or variable…

驗證部分表單是否重復

1. 效果 圖片中的名稱、機構編碼需要進行重復驗證2. 思路及實現 表單驗證在獲取數據將需要驗證的表單數據進行保存this.nameChangeTemp response.data.orgName;this.codeChangeTemp response.data.orgCode; 通過rule對表單進行驗證 以名字的驗證為例rules: {orgName: [// 設置…

python bokeh_提升視覺效果:使用Python和Bokeh制作交互式地圖

python bokehLet’s face it, fellow data scientists: our clients LOVE dashboards. Why wouldn’t they? Visualizing our data helps us tell a story. Visualization turns thousands of rows of data into a compelling and beautiful narrative. In fact, dashboard vi…

用C#寫 四舍五入函數(原理版)

doubled 0.06576523;inti (int)(d/0.01);//0.01決定了精度 doubledd (double)i/100;//還原 if(d-dd>0.005)dd0.01;//四舍五入 MessageBox.Show((dd*100).ToString()"%");//7%,dd*100就變成百分的前面那一部分了

C++設計UDP協議通訊示例

UDP是一種面向非連接,不可靠的通訊協議,相對于TCP來說,雖然可靠性不及,但傳輸效率較高   一、緒言   UDP是一種面向非連接,不可靠的通訊協議,相對于TCP來說,雖然可靠性不及,但…

浪里個浪 FZU - 2261

TonyY是一個喜歡到處浪的男人,他的夢想是帶著蘭蘭姐姐浪遍天朝的各個角落,不過在此之前,他需要做好規劃。 現在他的手上有一份天朝地圖,上面有n個城市,m條交通路徑,每條交通路徑都是單行道。他已經預先規劃…

C#設計模式(9)——裝飾者模式(Decorator Pattern)

一、引言 在軟件開發中,我們經常想要對一類對象添加不同的功能,例如要給手機添加貼膜,手機掛件,手機外殼等,如果此時利用繼承來實現的話,就需要定義無數的類,如StickerPhone(貼膜是手…

北大青鳥c語言課后答案,北大青鳥C語言教程--第一章 C語言基礎.ppt

《北大青鳥C語言教程--第一章 C語言基礎.ppt》由會員分享,可在線閱讀,更多相關《北大青鳥C語言教程--第一章 C語言基礎.ppt(20頁珍藏版)》請在人人文庫網上搜索。1、第一章,C 語言基礎,2,課程地位,.Net ,以 # 開始的語句稱為預處理器指令,#include語句不…