Labeling Bias
Data is important: it is the logical justification for world-changing decisions. Unfortunately, arrays of numbers don't tell as interpretable a story as a picture does, which creates an insatiable need for data visualizations.
Visualizations will always be used to tell a story with data, but unfortunately they can be distorted in many ways to give a false impression. Data, once regarded as the golden truth, becomes no more trustworthy or credible than an anecdote when its visualization falls into one of several pitfalls.
In this article, we'll explore eight charts that show how data can be biased towards your narrative. Each chart is covered in the following format:
- Chart context and information.
- Objections to the integrity of the chart.
- A statistically honest version of the chart (if applicable).
- Tips for using the chart type to distort data in any context (generalizing the method).
Bubble Size | Area or Radius? Whichever suits your narrative
This chart from the State of the Union Address gives the impression that the United States' GDP is much higher than that of other countries.

In this chart, each value determines the radius of its circle. Because the area of a circle grows with the square of its radius, scaling the radius exaggerates every difference and misleads the reader. Making the area of the circle proportional to the value it represents gives a more honest comparison.
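To make the arithmetic concrete, here is a minimal sketch of the two encodings. The function names and the example values are invented for illustration; they are not taken from the chart.

```python
import math

def radius_for_value(value, max_value, max_radius, encode="area"):
    """Radius of the bubble for `value`, scaled so that max_value gets max_radius."""
    share = value / max_value
    if encode == "radius":
        # Misleading: radius grows linearly, so the drawn area grows with value squared.
        return max_radius * share
    # Honest: area grows linearly, so radius grows with the square root of the value.
    return max_radius * math.sqrt(share)

def drawn_area_ratio(small, large, encode):
    """How many times bigger the larger bubble's drawn area is than the smaller one's."""
    r_small = radius_for_value(small, large, 1.0, encode)
    r_large = radius_for_value(large, large, 1.0, encode)
    return (math.pi * r_large ** 2) / (math.pi * r_small ** 2)

# Hypothetical values: the larger one is twice the smaller one.
print(drawn_area_ratio(10, 20, "radius"))  # about 4: the gap looks twice as big as it is
print(drawn_area_ratio(10, 20, "area"))    # about 2: the drawn gap matches the data
```

With radius encoding, a value that is twice as large gets four times the ink.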

Now the gap in GDP between the United States and second place no longer looks as dramatic, which is more representative of the real numbers.
If you need to emphasize a drastic difference, you should make the radius proportional to the value. However, if you are a country like China whose political motive is to de-emphasize America's GDP, make the area proportional instead.
Cheating with Colors | Take advantage of readers' impatience
This chart, like more charts than you might realize, cheats with colors, and the choice is clearly deliberate. In this graph, the 41% (the minority) is shaded grey, whereas the other 59% is shaded a vibrant, noticeable green.

It's important to be conscious of corporate and political interests deliberately cheating with colors. Of course, since one graph hidden in a forest of others draws little attention, cheating with colors is a clever and defensible way of pushing hurried readers towards biased conclusions.
Boxplots | How to hide a distribution
Boxplots can often conceal a distribution. A boxplot consists of five statistical measures of the data:
- The minimum (non-outlier) data point
- The 25th percentile (first quartile)
- The median (50th percentile, second quartile)
- The 75th percentile (third quartile)
- The maximum (non-outlier) data point
Statistical outliers are marked as such and excluded from the boxplot itself.

Unfortunately, these five statistical measures are not as representative of a distribution as one would hope, and certainly not worthy of being graphed.

These boxplots give a somewhat misleading picture of their distributions. A boxplot implicitly assumes that the distribution has only one hump and is some variation of a bell curve with adjustable parameters.
Overlaying a violin plot on a boxplot displays the true distribution of the data behind it. Overlaying the individual data points, nudged by small random offsets so they don't sit on top of each other, is known as 'jitter'.
- Boxplot A is nowhere near representative of the wide range of its distribution.
- The distribution behind Boxplot B has two peaks, but the median line sits in the valley between them, as if that valley were the peak of a single-peaked distribution.

Many different distributions share the same boxplot. Boxplots should only be used if they are verifiably representative of the real distribution. However, if you wanted to, you could definitely obscure the real distribution of your data with a boxplot to support your narrative.
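As a rough illustration of how little the five measures pin down, the sketch below computes them for two made-up samples. The data are invented. The helper uses `statistics.quantiles` with the inclusive method, which is one of several quartile conventions, and for simplicity it skips the outlier rule.

```python
import statistics

def five_number_summary(data):
    """Minimum, first quartile, median, third quartile, maximum (no outlier rule)."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

# Hypothetical samples, invented for illustration.
evenly_spread = [0, 1, 2, 3, 5, 7, 8, 9, 10]
two_clusters = [0, 2, 2, 2, 5, 8, 8, 8, 10]  # values pile up around 2 and around 8

print(five_number_summary(evenly_spread))
print(five_number_summary(two_clusters))
# The two samples look nothing alike, yet their boxplots would be identical.
print(five_number_summary(evenly_spread) == five_number_summary(two_clusters))  # True
```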
Chopping off the Y-Axis | Scale never mattered
This Fox News bar plot is irresponsible in not displaying the full scale of deaths. Granted, the numbers are not concealed, but humans instinctively assume a zero baseline even when there are y-axis tick marks. Additionally, the gradient from orange to yellow further reinforces the idea of a zero baseline.

This chart dramatizes an increase in southwest border apprehensions by not showing the full scale. Yes, the increase is large (20,000 more from 2012 to 2013), but set against the complete scale it looks far less enormous.
Assuming a zero baseline, readers draw numerical conclusions from the bar lengths, such as that there were twice as many border apprehensions in 2013. Even though the labels make it obvious that this is not true, the bar lengths suggest otherwise.
Plotting the bars with a baseline of 0 shows the increase in proportion to the full scale.
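The size of the distortion is easy to quantify. The sketch below compares how tall two bars are drawn relative to each other for a given baseline. The function name and the counts are hypothetical and are not read off either chart.

```python
def drawn_height_ratio(smaller, larger, baseline=0.0):
    """Ratio of the two bar heights as drawn when the y-axis starts at `baseline`."""
    if baseline >= smaller:
        raise ValueError("baseline must be below the smaller value")
    return (larger - baseline) / (smaller - baseline)

# Hypothetical counts: a 10% increase.
before, after = 100, 110

print(drawn_height_ratio(before, after))               # 1.1 with an honest zero baseline
print(drawn_height_ratio(before, after, baseline=90))  # 2.0 once the axis starts at 90
```

Starting the axis at 90 turns a 10% increase into a bar that is drawn twice as tall.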

If an honest scale doesn't suit your narrative, however, you can move the y-axis baseline as much as you want. If you choose, you don't even need to be as gracious as Fox News: label only the y-axis, not the individual bars. Removing background gridlines makes comparison even more difficult.
Pie Charts | Put percentages on a 360° scale to make comparison harder
Pie charts have long been criticized for putting an intuitive base 10 scale, percentages, on a completely unintuitive 360° scale.

Comparing categories entails judging the angle of each slice. Humans are generally not used to reading angles, so comparing slices doesn't tell an accurate story. Pie charts are also usually bottom-heavy, meaning that a slice at the bottom, unless it is very skinny, will seem larger than the other slices.
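The conversion the reader is silently asked to undo is simple but awkward: every percentage point becomes 3.6 degrees. A minimal sketch, with made-up shares:

```python
def slice_angle(percent):
    """Central angle, in degrees, of a pie slice covering `percent` of the whole."""
    return percent * 360 / 100

# Hypothetical shares, invented for illustration.
shares = {"A": 34, "B": 30, "C": 36}

for name, percent in shares.items():
    print(f"{name}: {percent}% -> {slice_angle(percent):.1f} degrees")
```

Telling a 30% slice from a 34% slice means spotting a difference of roughly 14 degrees between two wedges that may not even share an edge.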
Combine a fundamentally flawed idea with three-dimensional effects, legends, numerous slices, and exploded slices, and the resulting pie chart is often ineffective.
Compare the pie chart with the bar chart, which provides a much cleaner and easier comparison, even if the idea of everything summing to a whole is lost.

Pie charts can be used when the comparison of groups is discouraged — for example, when a political group wants to suggest that all groups are relatively equal, instead of highlighting differences. Pie charts, naturally difficult to compare, are a good choice for these motives.
Spaghetti Plots | Obscure information with more information
Spaghetti plots are line plots with many lines, generally five or more. Trying to follow a specific line is difficult, especially when lines of similar colors reach the same point and then diverge.

It can be argued that there should be no more than three lines in a line plot, let alone five. The issue with plotting many lines is that there aren't enough easily distinguishable colors to go around. Using a different symbol for each line, like a circle or a cross, not only clutters the space but also doesn't work with many data points.
Instead, highlight the evolution over time of at most three series of interest and leave the rest in the background. Not only can the highlighted lines be compared to the general trend behind them, but the chart also tells a clearer story.
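One way to set this up is sketched below, with accent styles for the chosen lines and a muted grey for the rest. The helper name, the series names, and the specific colors are assumptions. The keys follow matplotlib's line keyword names, so each dict could be passed along as `ax.plot(x, y, **style)`, but nothing here depends on matplotlib being installed.

```python
def line_styles(series_names, highlighted,
                accent_colors=("tab:red", "tab:blue", "tab:green")):
    """Map each series name to a style: accents for highlighted lines, grey for the rest."""
    if len(highlighted) > len(accent_colors):
        raise ValueError(f"highlight at most {len(accent_colors)} lines")
    styles = {}
    for name in series_names:
        if name in highlighted:
            color = accent_colors[highlighted.index(name)]
            # Thicker and drawn on top so the line stays readable.
            styles[name] = {"color": color, "linewidth": 2.5, "zorder": 3}
        else:
            # Background context: thin, grey, drawn underneath.
            styles[name] = {"color": "lightgrey", "linewidth": 1.0, "zorder": 1}
    return styles

# Hypothetical series names.
names = ["A", "B", "C", "D", "E", "F"]
for name, style in line_styles(names, highlighted=["B", "E"]).items():
    print(name, style)
```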

However, you should always use spaghetti charts when you want to bury one line under vast swathes of other data, or when you want to obscure the pattern of a particular line that may discredit your narrative.
Dual Axis Charts | The Shapeshifter
Putting two y-axes — one on the left and one on the right of a chart — is a common way to compare two variables of different units, such as gross domestic product against life expectancy.
For instance, this chart graphs the global GDP against the German GDP.

There are several issues with using dual axis plots. One is that the chart maker can place the baseline of each y-axis wherever they like. Extending the axes down to a common baseline yields a different result:

While the chart makes it look as if the German GDP and the global GDP go up at roughly the same rate (at least until 2014), they don't. By 2014, the global GDP had increased by 80% and the GDP of Germany by 40%.
Tweaking the y-axis baseline and scale is the equivalent of changing the data itself. Especially with no gridlines, y-axis scales mean little and can be used to make the data say anything you want.
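To see why, note that where a value lands on the page depends only on the axis limits chosen for it. The sketch below uses two hypothetical series, one growing 40% and one growing 80%, echoing the percentages above. The raw values are invented, not actual GDP figures.

```python
def vertical_position(value, axis_min, axis_max):
    """Fraction of the plot height (0 = bottom, 1 = top) at which `value` is drawn."""
    return (value - axis_min) / (axis_max - axis_min)

# Hypothetical series: the first grows 40%, the second grows 80%.
slow = [20, 24, 28]
fast = [400, 560, 720]

# Give each series its own axis, fitted tightly to its range.
print([vertical_position(v, 20, 28) for v in slow])    # [0.0, 0.5, 1.0]
print([vertical_position(v, 400, 720) for v in fast])  # [0.0, 0.5, 1.0]

# Start both axes at zero instead and the lines no longer coincide.
print([round(vertical_position(v, 0, 28), 2) for v in slow])
print([round(vertical_position(v, 0, 720), 2) for v in fast])
```

With tightly fitted axes the two lines trace exactly the same path, even though one series grew twice as fast as the other.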

This and many other plots don't suck because they violate the data's integrity, but because they make the relationship and main point, if any, less obvious to the reader than it should be. A reader who is good at math and has a strong sense of numbers may be able to form a clear picture of the real relationship after a few minutes of staring, but the goal of a visualization is to do the math for the reader and make the message clear.
A 2011 study conducted by Petra Isenberg, Anastasia Bezerianos, Pierre Dragicevic and Jean-Daniel Fekete showed 15 people four different charts that all showed values in different magnitudes, and observed how well these people could read the charts. One of them was a chart with a dual axis, which the researchers call “superimposed chart”.
We found across the board that the superimposed chart performed poorly both in terms of accuracy and time. Participants’ feedback from the questionnaire was also clearly against the superimposed chart and it was ranked lowest by all but one participant. Participants called it very confusing and demanding too much concentration or reflection to decipher the non-monotonic and discontinuous nature of the two scales. — A Study on Dual-Scale Data Charts
Alternatives include two separate charts…

…or a chart that puts both series on a single scale using a relative metric, such as % increase.
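Converting to a shared relative scale takes one line of arithmetic. In the sketch below the raw values are hypothetical, chosen only so that the end results match the 40% and 80% mentioned earlier.

```python
def percent_change_from_start(series):
    """Express every value as a percentage change relative to the first value."""
    base = series[0]
    return [100 * (value - base) / base for value in series]

# Hypothetical values in different units and magnitudes.
germany = [20, 24, 28]
world = [400, 560, 720]

print(percent_change_from_start(germany))  # [0.0, 20.0, 40.0]
print(percent_change_from_start(world))    # [0.0, 40.0, 80.0]
```

Both series now share one y-axis, so there is no second scale left to stretch.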

Dual axis plots give you complete control over the shape of each line, so to distort the data to fit your narrative, stretch and move the axes vertically as much as you please.
Radial Bar Plots | A Circumference Sin
Radial bar plots are often used because they are eye-catching, unique, and can be animated in a sleek way. The radial bar plot below shows the quantity of weapons exported by the six largest exporters in 2017.

Radial bar plots, like pie charts, encode values by angle rather than by bar length. Because every bar wraps around the same center, bars on the outside will inevitably look longer than bars closer to the center, even when they represent the same value.
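The effect follows from arc length being radius times angle. A minimal sketch, assuming the largest value sweeps 270 degrees; that sweep, the ring radii, and the values are all made up for illustration.

```python
import math

def arc_length(value, max_value, ring_radius, full_sweep_degrees=270.0):
    """Drawn length of a radial bar: the angle encodes the value, the ring sets the radius."""
    angle = math.radians(full_sweep_degrees * value / max_value)
    return ring_radius * angle

# The same hypothetical value drawn on an inner ring and on an outer ring.
inner = arc_length(50, 100, ring_radius=1.0)
outer = arc_length(50, 100, ring_radius=3.0)

print(outer / inner)  # about 3: identical data, three times the ink
```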
A standard bar plot is a good alternative, and a lollipop plot can satisfy the need for uniqueness and style without distorting the data.
Naturally, if you need to emphasize a particular group, you can use a radial bar plot and place that group on the outside to make it seem much larger. Put a group you want to de-emphasize at the center.
Now, you’re set to distort data to fit whatever narrative you please! Just hope your readers don’t look too closely.
Plots from data to viz.
Translated from: https://medium.com/dataseries/how-to-cleverly-distort-a-visualization-to-support-your-biased-narrative-41da2e826f95