Data Visualization: How to Choose the Right Chart (Part 1)

According to the World Economic Forum, the world produces 2.5 quintillion bytes of data every day. With so much data, it’s become increasingly difficult to manage and make sense of it all. It would be impossible for any person to wade through data line-by-line and see distinct patterns and make observations.

Data visualization is one step of the data science process, a framework for approaching data science tasks. After data is collected, processed, and modeled, the relationships in it need to be visualized so that conclusions can be drawn.

We use data visualization as a technique to communicate insights from data through visual representation. Our main goal is to distill large datasets into visual graphics to allow for a straightforward understanding of complex relationships within the data.

So now we know that data visualization can provide insights that traditional descriptive statistics cannot. The big question is: how do we choose the right chart for our data?

This note gives an overview of different chart types. For each type of chart, we give a short description and then discuss when to use it and when to avoid it. Next, we look at some Python code for implementing it. Only the core code is shown here; the full version is provided at the end of this article.

I hope you find this note useful. Let's get started.

What story do you want to tell?

Before making a chart, it's essential to understand why we need one. Graphs, plots, maps, and diagrams help people understand complex data, find patterns, identify trends, and tell stories. Think about the message we want to share with our audience. Here, I group the charts by their data visualization function, that is, what we want each chart to communicate to our audience. Assigning each chart to a specific function isn't a perfect system, but it still works as a useful guide for selecting a chart based on our analysis or communication needs.

The first part of this note introduces charts that display the connection between variables, the trend over time, and the relative order of variables within categories.

Relationship

1. Scatter plot using Matplotlib
2. Marginal Histogram
3. Scatter plot using Seaborn
4. Pair Plot in Seaborn
5. Heat Map

Data over Time

6. Line Chart
7. Area Chart
8. Stacked Area Chart
9. Unstacked Area Chart

Ranking

10. Vertical Bar Chart
11. Horizontal Bar Chart
12. Multi-set Bar Chart
13. Stacked Bar Chart
14. Lollipop Chart

The second part of this note introduces chart types used to compare variables and their distributions.

Distribution

15. Histogram
16. Density Curve with Histogram
17. Density Plot
18. Box Plot
19. Strip Plot
20. Violin Plot
21. Population Pyramid

Comparisons

22. Bubble Chart
23. Bullet Chart
24. Pie Chart
25. Net Pie Chart
26. Donut Chart
27. TreeMap
28. Diverging Bar
29. Choropleth Map
30. Bubble Map

Relationship

We use relationship charts to display a connection or correlation between two or more variables.

When assessing a relationship between data sets, we are trying to understand how two or more data sets combine and interact with each other.

This relationship is often described by correlation, which can be positive or negative: with a positive correlation the variables tend to move in the same direction, and with a negative correlation one tends to decrease as the other increases.
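As a rough illustration of positive versus negative correlation, the sketch below computes Pearson's correlation coefficient with NumPy for two tiny series. The numbers are made up and only meant to show the sign of the coefficient.

```python
import numpy as np

# Hypothetical toy data: hours studied vs. exam score, and hours studied vs. errors made
hours = np.array([1, 2, 3, 4, 5])
score = np.array([52, 60, 61, 70, 78])   # tends to rise as hours rise
errors = np.array([9, 7, 6, 4, 2])       # tends to fall as hours rise

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is Pearson's r
r_pos = np.corrcoef(hours, score)[0, 1]
r_neg = np.corrcoef(hours, errors)[0, 1]

print(f"hours vs score:  r = {r_pos:.2f}")   # positive correlation
print(f"hours vs errors: r = {r_neg:.2f}")   # negative correlation
```

A value of r close to +1 or -1 means the points in a scatter plot lie close to a rising or falling straight line.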

1. Scatter plot using Matplotlib

Image by the author

A scatter plot is a type of chart often used in statistics and data science. It consists of multiple data points plotted across two axes, where each point is one observation of the two variables. It is a useful chart type whenever we want to see whether there is a relationship between two data sets.

We use a scatter plot to identify the relationship between variables (i.e., correlation or trend patterns). It also helps us detect outliers.

In machine learning, scatter plots are often used in regression, where x and y are continuous variables. They are also used to visualize clusters and to detect outliers.

Scatter plots are not suitable if we are interested in observing time patterns.

Scatter plots are meant for numerical data. So, if our data consists of categories, such as three divisions or five products, a scatter plot would not reveal much.

Python Implementation

We use the Iris dataset for visualization.
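The Iris snippets in this section assume a pandas DataFrame called iris_df. Its feature columns are named like 'sepal length (cm)', and it also has 'target' and 'species' columns, which the pair plot uses. Those column names match scikit-learn's bundled copy of Iris. Below is a minimal sketch of one way to build iris_df, assuming scikit-learn and pandas are installed. It is not necessarily how the original full code does it.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris data bundled with scikit-learn (no download needed)
iris = load_iris()

# Feature columns get names like 'sepal length (cm)'
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Keep the numeric label as 'target' and add a readable 'species' column
iris_df['target'] = iris.target
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(iris_df.head())
```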

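import matplotlib.pyplot as plt

# iris_df is assumed to be a pandas DataFrame holding the Iris data
plt.scatter(iris_df['sepal length (cm)'], iris_df['sepal width (cm)'])
plt.title('Scatterplot of Distribution of Sepal Length and Sepal Width', fontsize=15)
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.show()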
Image by the author

2. Marginal Histogram

Marginal histograms are histograms added to the margin of each axis of a scatter plot for analyzing the distribution of each measure.

We use a marginal histogram to assess the relationship between two variables while also examining each variable's distribution. Adding marginal histograms to scatter plots, or marginal bars to highlighted tables, makes the visualization more informative.
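seaborn's jointplot handles this layout for us. The sketch below shows the same idea with plain matplotlib; it is our own construction, not seaborn's implementation. A scatter plot fills a large panel, and each variable's histogram sits on the top and right margins. The data is random and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def scatter_with_marginals(x, y, bins=20):
    """Scatter plot of x vs. y with a histogram of each variable on the margins."""
    fig = plt.figure(figsize=(8, 8))
    # 2x2 grid: large main panel bottom-left, thin panels on top and right
    gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                          wspace=0.05, hspace=0.05)
    ax = fig.add_subplot(gs[1, 0])
    ax_top = fig.add_subplot(gs[0, 0], sharex=ax)
    ax_right = fig.add_subplot(gs[1, 1], sharey=ax)

    ax.scatter(x, y, alpha=0.6)
    ax_top.hist(x, bins=bins)                              # distribution of x
    ax_right.hist(y, bins=bins, orientation='horizontal')  # distribution of y

    # Hide margin tick labels that would duplicate the main axes
    ax_top.tick_params(axis='x', labelbottom=False)
    ax_right.tick_params(axis='y', labelleft=False)
    return fig, ax, ax_top, ax_right

# Usage with random, made-up data
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)
fig, ax, ax_top, ax_right = scatter_with_marginals(x, y)
```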

Python Implementation

import seaborn as sns

# A seaborn jointplot shows bivariate scatterplots and univariate histograms in the same figure
p = sns.jointplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)', height=10)
Image by the author

3. Scatter plot using Seaborn

Our goal here is to produce a legend to understand the differences between groups. We will use seaborn’s FacetGrid to color the scatterplot by species.

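import matplotlib.pyplot as plt
import seaborn as sns

# FacetGrid's old `size` argument is called `height` in newer seaborn versions
sns.FacetGrid(iris_df, hue='species', height=10) \
    .map(plt.scatter, 'sepal length (cm)', 'sepal width (cm)') \
    .add_legend()
plt.title('Scatterplot with Seaborn', fontsize=15)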
Image by the author

4. Pair plot in Seaborn

Another useful seaborn plot is pairplot, which shows the bivariate relationship between each pair of features. From the pair plot, we’ll see that the Iris-setosa species is separated from the other two across all feature combinations.

import seaborn as sns
sns.pairplot(iris_df.drop("target", axis=1), hue="species", height=3)
Image by the author

5. Heat map

A heatmap is a graphical representation of data that uses color coding to represent different values. Heatmaps are useful for cross-examining multivariate data: variables are placed in the rows and columns, and the cells of the table are colored by value.

Image by the author

All the rows belong to one category (labels displayed on the left side), and all the columns belong to another category (labels displayed at the bottom). Each row and column is a subcategory, so together they form a matrix. Each cell contains either color-coded categorical data or numerical data mapped to a color scale. The value in a cell shows the relationship between the row and column variables that meet there.

Heatmaps are useful for showing variance across multiple variables, revealing any patterns, displaying whether any variables are similar, and detecting any correlations between them.

Heatmaps are especially useful when we want to see which intersections of categorical values have a higher concentration of data than others.

Heatmaps are better suited to displaying a generalized view of numerical data, because it is hard to accurately tell the differences between color shades and extract specific data points (unless we include each cell's raw value).

Heatmaps can also show changes in data over time if the rows or the columns are set to time intervals. For example, we could use a heatmap of a city's temperatures across the year to see the hottest and coldest times: each row is a month, each column is an hour of the day, and each cell holds a temperature value.
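The sketch below builds that month-by-hour heatmap. It assumes long-format readings with one row per month and hour. The temperatures are synthetic, generated from a simple seasonal and daily cycle purely for illustration. pivot_table() turns the rows into the month × hour matrix, and imshow() colors the cells.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data: one synthetic temperature per (month, hour)
rng = np.random.default_rng(42)
months = np.repeat(np.arange(1, 13), 24)
hours = np.tile(np.arange(24), 12)
temps = (15 - 10 * np.cos((months - 1) / 12 * 2 * np.pi)   # seasonal cycle
         - 5 * np.cos(hours / 24 * 2 * np.pi)               # daily cycle
         + rng.normal(scale=1.0, size=months.size))         # noise
readings = pd.DataFrame({'month': months, 'hour': hours, 'temp': temps})

# Pivot into a matrix: rows = months, columns = hours, cells = mean temperature
grid = readings.pivot_table(index='month', columns='hour', values='temp', aggfunc='mean')

fig, ax = plt.subplots(figsize=(12, 5))
im = ax.imshow(grid.values, aspect='auto', cmap='coolwarm')
ax.set_yticks(range(len(grid.index)))
ax.set_yticklabels(grid.index)
ax.set_xlabel('hour of day')
ax.set_ylabel('month')
fig.colorbar(im, ax=ax, label='temperature')
```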

Image source: Data Viz with Python and R

Python Implementation

We use the World Happiness Report dataset from Kaggle. I cleaned the data and combined all the files into a single happiness_rank.csv file. You can download and clean the data yourself or download the final result here. I recommend checking out my data-cleaning code on GitHub.

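import numpy as np
import seaborn as sns

# happy: the cleaned happiness_rank.csv DataFrame; usecols: the numeric columns to compare
corr = happy[usecols].corr()
mask = np.triu(np.ones_like(corr, dtype=bool))  # assumed mask: hide the redundant upper triangle
sns.heatmap(corr, linewidths=0.25, vmax=0.7, square=True, cmap="Blues",
            linecolor='w', annot=True, annot_kws={"size": 8}, mask=mask, cbar_kws={"shrink": .9})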
Image by the author

Data over time

Sometimes it isn't enough to know that a relationship exists between variables; in some cases, a better analysis is possible if we can also see when the relationship took place. Time then becomes an attribute of the relationship itself. The charts in this section show data over a period so that we can find trends or changes over time.

6. Line Chart

Line charts are used to display quantitative values over a continuous interval or period.

Line charts are drawn by first plotting data points on a Cartesian coordinate grid and then connecting them. Typically, the y-axis holds a quantitative value, while the x-axis is a timescale or a sequence of intervals. The direction of the line is an intuitive metaphor for the data: an upward slope indicates increasing values, and a downward slope indicates decreasing values. The path the line traces across the graph can reveal trends in a dataset.

Line charts are most frequently used to show trends and analyze how the data has changed over time.

Line charts are best suited to continuous data, since the line connects successive values that all belong to the same series.

Image source: The Python Chart Gallery

When several lines or data series are plotted together, individual lines can be compared. However, we should avoid putting more than four lines on one graph, as this makes the chart cluttered and harder to read. One solution is to split the chart into multiple subplots.
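Below is a minimal sketch of splitting a crowded line chart into small multiples. It assumes a dictionary of equally long series, and the data is random and purely illustrative. Sharing both axes keeps the panels directly comparable.

```python
import numpy as np
import matplotlib.pyplot as plt

def small_multiples(x, series, ncols=2):
    """Draw each named series in its own subplot, with shared x and y axes."""
    n = len(series)
    nrows = int(np.ceil(n / ncols))
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             figsize=(5 * ncols, 3 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, (name, values) in zip(flat, series.items()):
        ax.plot(x, values, marker='o')
        ax.set_title(name)
    for ax in flat[n:]:
        ax.set_visible(False)  # hide panels left over when n is not a multiple of ncols
    return fig, axes

# Usage: five hypothetical monthly series would clutter one chart, so split them
x = np.arange(1, 13)
rng = np.random.default_rng(1)
series = {f'series {i}': rng.integers(100, 500, size=12) for i in range(1, 6)}
fig, axes = small_multiples(x, series)
```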

Python Implementation

Suppose we have a dataset of monthly article statistics from Medium. We want to see the trend in article views during 2019.

import matplotlib.pyplot as plt
plt.plot(data['Month'], data['All Views'], color='#4870a0', marker='o')
Image by the author

7. Area Chart

An area chart is based on the line chart: the region below the line is filled with color, showing how a variable develops over time.

Image source: Matplotlib Cheat Sheet

Area charts are ideal for clearly illustrating the magnitude of change between two or more data points. For example, the happiness score is made up of six contributing factors, and we would like to see how much each of them contributes.

Image source: Matplotlib Cheat Sheet

Moreover, if we care more about the share contributed by each division than about the overall total, we can use a 100% stacked area chart. This shows each division's percentage contribution over time.
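A 100% stacked area chart is an ordinary stacked area chart drawn on shares instead of raw values. To get the shares, divide each time point's values by that point's total. Below is a sketch for three hypothetical divisions; the numbers are made up.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical monthly values for three divisions (made-up numbers)
months = np.arange(1, 13)
divisions = {
    'Division A': np.array([30, 32, 35, 33, 36, 40, 42, 41, 39, 38, 37, 36]),
    'Division B': np.array([20, 21, 19, 22, 25, 24, 23, 26, 27, 28, 30, 31]),
    'Division C': np.array([10, 12, 15, 14, 13, 15, 18, 20, 19, 21, 22, 24]),
}

values = np.vstack(list(divisions.values())).astype(float)  # shape: (divisions, months)
# Divide each month's values by that month's total, so every column sums to 100
shares = values / values.sum(axis=0) * 100

fig, ax = plt.subplots(figsize=(10, 5))
ax.stackplot(months, shares, labels=list(divisions.keys()), alpha=0.8)
ax.set_ylim(0, 100)
ax.set_ylabel('share of total (%)')
ax.legend(loc='upper left')
```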

Area charts are not the best choice if we want to present fluctuating values, like the stock market or price changes.

Python Implementation

Here, we want to present the number of external views over time.

import matplotlib.pyplot as plt
plt.stackplot(data['Month'], data['External Views'], colors=['#7289da'], alpha=0.8)
Image by the author

8. Stacked Area Chart

A stacked area chart builds on the simple area chart. It displays the values of several groups on the same graphic, with each group's values stacked on top of the previous ones. The top edge of the whole graph therefore represents the total of all the data plotted over time.
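Here is what the stacking looks like concretely. Each layer's upper edge is the running total of the layers beneath it, so the top edge equals the overall total. The sketch below draws the layers by hand with np.cumsum() and fill_between(), which mirrors what stackplot() draws. The numbers are made up and non-negative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(6)
# Hypothetical layers (non-negative, as stacked areas require)
layers = np.array([
    [3, 4, 4, 5, 6, 6],
    [2, 2, 3, 3, 3, 4],
    [1, 2, 2, 2, 3, 3],
])

# Running totals: row k is the upper edge of layer k
tops = np.cumsum(layers, axis=0)
# Each layer starts where the previous one ended (the first starts at 0)
bottoms = np.vstack([np.zeros_like(x), tops[:-1]])

fig, ax = plt.subplots()
for k, (lo, hi) in enumerate(zip(bottoms, tops)):
    ax.fill_between(x, lo, hi, alpha=0.8, label=f'group {k + 1}')
ax.legend(loc='upper left')

print(tops[-1])  # top edge of the stack = column totals
```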

The stacked area chart is a powerful chart type because it lets us group data and see trends over a selected date range.

Stacked area charts use area to convey parts of a whole, so they do not work with negative values.

Stacked area charts are colorful and fun, but we should use them with caution because they can quickly become a mess. We shouldn’t stack together more than five categories.

Python Implementation

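import matplotlib.pyplot as plt

plt.stackplot(data['Month'], data['Internal Views'], data['External Views'],
              alpha=0.75,
              colors=['#7289da', '#f29fa9'],
              labels=['Internal Views', 'External Views'])
plt.legend(loc='upper left')  # the labels only appear once a legend is drawn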
Image by the author

9. Unstacked Area Chart

Unlike a stacked area chart, an unstacked area chart draws each group from the same baseline, so the groups overlap on the same graphic.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 8))
x = data['Internal Views']
y = data['External Views']

# plot the data
ax.plot(x, color='#49a7c3', alpha=0.3, label='Internal Views')
ax.plot(y, color='#f04747', alpha=0.3, label='External Views')

# fill each line down to the x-axis; both fills start at 0, so the areas overlap
ax.fill_between(x.index, 0, x, color='#49a7c3', alpha=0.2)
ax.fill_between(y.index, 0, y, color='#f04747', alpha=0.2)
ax.legend()
Image by the author

Ranking

Ranking charts display the relative order of data values.

Bar Chart

Bar charts are among the most frequently used chart types. As the name suggests, a bar chart is composed of a series of bars illustrating a variable’s development.

There are four types of bar charts: the horizontal bar chart, the vertical bar chart, the grouped bar chart, and the stacked bar chart.

Bar charts are great when we want to track the development of one or two variables over time. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value.

A simple bar chart isn't suitable when we have a breakdown of a variable for a single period. For example, if I wanted to show the main business lines contributing to a company's revenue, I wouldn't use a bar chart; instead, I would create a pie chart or one of its variations.

10. Vertical Bar Chart

Vertical bar charts (also called column charts) differ from histograms in that they do not display continuous developments over an interval. Their data is discrete and categorical, so they answer the question "how many?" for each category.

Vertical bar charts are typically used to compare several items within a specific range of values. This makes them ideal for comparing a single category of data across individual sub-items, for example, revenue by region.

Image by the author

Python Implementation

We use the mpg_ggplot2 data frame, a rectangular collection of variables (in the columns) and observations (in the rows). mpg contains observations collected by the US Environmental Protection Agency on 38 popular car models.

Here, we want to compare car models.

import matplotlib.pyplot as plt
plt.bar(value_count.index, value_count.values, color='#49a7c3')  # value_count: assumed pandas Series of counts per model
Image by the author
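The bar-chart snippet above assumes a pandas Series named value_count that holds counts per category, but the note does not show how it was built. Below is a minimal sketch that assumes it comes from value_counts() on a 'model' column. A tiny frame of made-up rows stands in for mpg.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the mpg data: one row per car, with a 'model' column
cars = pd.DataFrame({'model': ['civic', 'a4', 'civic', 'camry', 'a4', 'civic', 'jetta']})

# value_counts() returns a Series: index = category, values = counts (sorted descending)
value_count = cars['model'].value_counts()

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(value_count.index, value_count.values, color='#49a7c3')
ax.set_ylabel('number of cars')
```

Because value_counts() sorts in descending order, the bars come out already ranked from most to least frequent.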

11. Horizontal bar chart

Horizontal bar charts represent the data horizontally. The data categories are shown on the y-axis, and the data values are shown on the x-axis. The length of each bar is equal to the value corresponding to the data category, and all bars go across from left to right.
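Horizontal bars are easier to scan when they are sorted. Here is a small sketch with made-up counts and placeholder category names. It sorts the values in ascending order before plotting. barh() draws the first category at the bottom, so the longest bar ends up at the top.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts per category (made-up numbers, placeholder names)
value_count = pd.Series({'cat A': 5, 'cat B': 12, 'cat C': 8, 'cat D': 3})

# Ascending sort + barh: the first category sits at the bottom, the largest at the top
ordered = value_count.sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(ordered.index, ordered.values, color='#b28eb2')
ax.set_xlabel('count')
```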

Python Implementation

import matplotlib.pyplot as plt
plt.barh(value_count.index, value_count.values, color='#b28eb2')
Image by the author

12. Multi-set Bar Chart

Also known as a Grouped Bar Chart or Clustered Bar Chart.

This variation of a bar chart is used when two or more data series are plotted side-by-side and grouped under categories, all on the same axis.

We use multi-set bar charts to compare grouped variables or categories to other groups with those same variables or category types.
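pandas' DataFrame.plot.bar() handles the side-by-side placement for us. The matplotlib sketch below shows what "grouped under categories" means mechanically, using made-up numbers. Each series is shifted by a fraction of the category slot, so every group is centred on its tick.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 3 categories, 2 series measured on each (made-up numbers)
categories = ['Q1', 'Q2', 'Q3']
series = {'Internal': [120, 150, 130], 'External': [80, 95, 140]}

x = np.arange(len(categories))   # one slot per category
n = len(series)
width = 0.8 / n                  # the bars of a group share 80% of the slot

fig, ax = plt.subplots()
for i, (name, values) in enumerate(series.items()):
    # Offset each series so the whole group is centred on its category tick
    offset = (i - (n - 1) / 2) * width
    ax.bar(x + offset, values, width=width, label=name)

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
```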

The downside of grouped bar charts is that the more bars each group contains, the harder the chart becomes to read.

Python Implementation

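import matplotlib.pyplot as plt

# views and df are assumed DataFrames sharing one index; views is drawn first as wide light-grey bars
ax = views.plot.bar(rot=0, color='#E6E9ED', width=1, figsize=(14, 8))
ax = df.plot.bar(rot=0, ax=ax, color=['#7289da', '#dd546e', '#99aab5', '#f3c366'],
                 width=0.8, figsize=(14, 8))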
Image by the author

13. Stacked Bar Chart

Unlike a multi-set bar chart, which displays its bars side by side, a stacked bar chart divides each bar into segments. Stacked bar charts show how a broader category is divided into smaller categories and how much each part contributes to the total.

Stacked bar charts place each value for the segment after the previous one. The total value of the bar is all the segment values added together. It is ideal for comparing the total amounts across each group/segmented bar.
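Stacking more than two segments works the same way: each segment's bottom is the running total of the segments below it. Below is a sketch with hypothetical numbers that keeps the running total in a NumPy array.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical segments for four bars (made-up numbers)
labels = ['Jan', 'Feb', 'Mar', 'Apr']
segments = {
    'Internal': np.array([50, 60, 55, 70]),
    'External': np.array([30, 25, 40, 35]),
    'Other':    np.array([10, 15, 5, 20]),
}

fig, ax = plt.subplots()
bottom = np.zeros(len(labels))
for name, values in segments.items():
    # Each segment starts where the previous one ended
    ax.bar(labels, values, bottom=bottom, width=0.5, label=name)
    bottom += values
ax.legend()

print(bottom)  # total height of each bar
```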

One major flaw of stacked bar charts is that the more segments each bar has, the harder the chart is to read. Also, comparing the segments with each other is difficult, because they are not aligned on a common baseline.

Python Implementation

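import matplotlib.pyplot as plt

rect1 = plt.bar(data['Month'], data['Internal Views'],
                width=0.5, color='lightblue')
# bottom= starts each External bar at the top of the Internal bar; without it the bars overlap
rect2 = plt.bar(data['Month'], data['External Views'],
                bottom=data['Internal Views'], width=0.5, color='#1f77b4')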
Image by the author

14. Lollipop Chart

A lollipop chart serves the same purpose as an ordered bar chart, in a visually lighter way. We use lollipop charts to show the relationship between a numerical variable and another numerical or categorical variable.

The lollipop chart is often considered more useful than a standard bar chart when we are dealing with a large number of values that are all high, such as in the 80–90% range (out of 100%). In that case, a broad set of tall columns can be visually overwhelming.

However, if our data has unsorted values of very similar size, lollipops are a poor choice: it is harder to compare the lengths of two nearly identical lollipops than of two standard bars.
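plt.stem() is one way to draw lollipops. Another common approach, sketched below with made-up scores, uses hlines() for the sticks and plot() for the heads. Sorting the values first addresses the unsorted-similar-values problem described above. The stems start at 0 so their lengths stay comparable.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical scores in a narrow, high range (made-up numbers), sorted ascending
scores = pd.Series({'A': 88, 'B': 91, 'C': 84, 'D': 86, 'E': 90}).sort_values()
positions = np.arange(len(scores))

fig, ax = plt.subplots(figsize=(6, 4))
# Sticks: thin horizontal lines from 0 to each value
ax.hlines(y=positions, xmin=0, xmax=scores.values, color='grey')
# Heads: one marker at the end of each stick
ax.plot(scores.values, positions, 'o', markersize=8)
ax.set_yticks(positions)
ax.set_yticklabels(scores.index)
ax.set_xlim(0, 100)
```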

Python Implementation

import matplotlib.pyplot as plt
(markerline, stemlines, baseline) = plt.stem(value_count.index, value_count.values)
Image by the author

That's it for the first part. The code is available on GitHub. We will continue with distributions and comparisons in part two.

So far, we have seen that data visualization is a quick, easy way to convey concepts universally, and that we can experiment with different scenarios by making slight adjustments.

There are dozens of tools for data visualization and data analysis, ranging from simple tools that require no coding (Tableau) to complex ones that require coding (JavaScript). Not every tool is right for everyone looking to learn visualization techniques, and not every tool can scale to industry or enterprise purposes.

My favorite professor told me that "Good data visualization theory and skills will transcend specific tools and products." As we learn this skill, we should focus on best practices and explore our own style when it comes to visualizations and dashboards. Data visualization isn't going away anytime soon, so it's essential to build a foundation in analysis, storytelling, and exploration that you can carry with you regardless of the tools or software you end up using.

If you want to dig deeper into this particular topic, here are some excellent places to start.

  1. Information is Beautiful
  2. Visualizing data
  3. Data Visualization Catalogue
  4. Color Hex
  5. Matplotlib Cheat Sheet
  6. How to make a heatmap with Seaborn in Python?

Original article: https://towardsdatascience.com/data-visualization-how-to-choose-the-right-chart-part-1-d4c550085ea7