When reading a visualization, is what we see really what we get?
This post summarizes and accompanies our paper “Surfacing Visualization Mirages”, which was presented at CHI 2020 and received a best paper honorable mention. This post was written collaboratively by Andrew McNutt, Gordon Kindlmann, and Michael Correll.
TL;DR
When reading a visualization, is what we see really what we get? There are a lot of ways that visualizations can mislead us, such that they appear to show us something interesting that disappears on closer inspection. Such visualization mirages can lead us to see patterns or draw conclusions that don’t exist in our data. We analyze these quarrelsome entities and provide a testing strategy for dispelling them.
Intro
The trained data visualization eye notices red flags that indicate that something misleading is going on. Dual axes that don’t quite match up. Misleading color ramps. Dubious sources. While learning how visualizations mislead is every bit as important as learning how they are created, even the studious can be deceived!
These dastardly deceptions need not be deviously devised either. While some visualizations are of course created by bad actors, most are not. Even designs crafted with the best of intentions yield all kinds of confusions and mistakes. A careless analyst might hallucinate meaning where there isn’t any or jump to a conclusion that is only hazily supported.
What can we say about the humble bar chart below on the left? It appears that location B has about 50% more sales than location A. Is the store in location A underperforming? Given the magnitude of the difference, I’d bet your knee-jerk answer would be yes.
Many patterns can hide behind aggregated data. For example, a simple average might hide dirty data, irregular population sizes, or a whole host of other problems. Simple aggregations like our humble bar chart are the foundation of many analytics tools, with subsequent analyses often being built on top of these potentially shaky grounds.
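To make this concrete, here is a minimal sketch of what a bar of averages can hide. The sales figures and the `summarize` helper are made up for illustration; they are not the data behind the chart in this post. Both locations would draw as the same clean bars, yet one rests on eight records and the other on two.

```python
from statistics import mean, median

# Hypothetical per-sale amounts for two store locations (made-up numbers).
sales = {
    "A": [100, 102, 98, 101, 99, 100, 103, 97],
    "B": [100, 200],
}

def summarize(values):
    """Return the mean alongside the context a bare bar of means leaves out."""
    return {
        "mean": mean(values),
        "median": median(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
    }

summary = {location: summarize(values) for location, values in sales.items()}
for location, stats in summary.items():
    print(location, stats)
```

In this toy data the mean for B is 50% higher than the mean for A, but it is built from only two sales, one of which is twice the size of the other. Printing the count and range next to the mean is one cheap way to notice that.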
What are we to do about these problems? Should we stop analyzing data visually? Throw out our computers? Perhaps we can form a theory that will help us build a method for automatically surfacing and catching these quarrelsome errors?
Enter Mirages
On the road to making a chart or visualization there are many steps and stages, each of which is liable to let errors in. Consider a simplified model: an analyst decides how to curate data, how to wrangle it into a usable form, how to visually encode that data, and then finally how to read it. When the analyst makes a decision, they exercise agency and create an opportunity for error, which can cascade along this pipeline, creating illusory insights.
Something as innocuous as defining the bins of a histogram can mask underlying data quality issues, which might in turn lead to incorrect inferences about a trend. Arbitrary choices about axis ordering in a radar chart can cause a reader to falsely believe one job candidate is good while another is lacking. Decisions about what type of crime actually counts as a crime can lead to maps that drive radically different impressions about the role of crime in a particular area.
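The histogram case can be sketched in a few lines. The example below assumes a hypothetical dataset of ages in which 0 was recorded whenever the age was unknown; the data and the `histogram` helper are invented for illustration. With wide bins the placeholder zeros blend into the youngest group, while one-unit bins expose them as a spike.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each bin is keyed by its left edge."""
    return Counter((v // bin_width) * bin_width for v in values)

# Hypothetical ages where 0 was stored whenever the real age was unknown.
ages = [0] * 6 + [3, 7, 12, 18, 21, 25, 25, 31, 34, 38, 42, 47, 55, 63]

coarse = histogram(ages, bin_width=20)  # looks like a smooth decline
fine = histogram(ages, bin_width=1)     # the pile-up at exactly 0 stands out

print("coarse:", sorted(coarse.items()))
print("largest fine bins:", fine.most_common(3))
```

Neither binning is wrong on its own. The point is only that the coarse one gives no hint that six of the ten records in its first bar are placeholders.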

The first step in addressing a problem is often to name it, so we introduce a term for these errors: Visualization Mirages. We define them as
“any visualization where the cursory reading of the visualization would appear to support a particular message arising from the data, but where a closer re-examination would remove or cast significant doubt on this support.”
Mirages arise throughout visual analytics. They occur as the result of choices made about data. They come from design choices. They depend on what you are trying to do with the visualization. What may be misleading in the context of one task may not interfere with another. For instance, a poorly selected aspect ratio could produce a mirage for a viewer who wanted to know about the correlation in a scatterplot, but is unlikely to affect someone who just wants to find the biggest value.
The errors that create mirages have both familiar and unfamiliar names: Drill-down Bias, Forgotten Population or Missing Dataset, Cherry Picking, Modifiable Areal Unit Problem, Non-sequitur Visualizations, and so many more. An annotated and expanded version of this list is included in the paper supplement. There is a sprawling universe of subtle and tricky ways that mirages can arise.
To make matters worse, there are few automated tools to help the reader or chart creator know that they haven’t deceived themselves in pursuit of insight.
Do these things really happen?
Imagine you are curious about the trend of global energy usage over time. A natural way to address this question would be to fire up Tableau and drop in the World Indicators dataset, which consists of vital world statistics from 2000 to 2012. The trend over time (a) shows that there was a sharp decrease in 2012! This would be great news for the environment, were it not illusory, as we see in (b) when checking the set of missing records.
If we try to quash these data problems by switching the aggregation in our line chart from SUM to MEAN, we find that the opposite is true!! There was a sharp increase in 2012. Unfortunately this conclusion is another mirage. The only non-null entries for 2012 are OECD countries. These countries have much higher energy usage than other countries across all years (d).
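The SUM and MEAN reversal is easy to reproduce on a toy table. The sketch below uses invented records and a hypothetical `aggregate` helper, not the World Indicators data: when the low-usage group stops reporting, the sum falls and the mean rises, even though no reporting country changed much.

```python
# Hypothetical energy-use records: (country_group, year, value or None).
records = [
    ("OECD", 2011, 90.0), ("OECD", 2012, 92.0),
    ("OECD", 2011, 80.0), ("OECD", 2012, 81.0),
    ("non-OECD", 2011, 20.0), ("non-OECD", 2012, None),
    ("non-OECD", 2011, 10.0), ("non-OECD", 2012, None),
]

def aggregate(records, year):
    """Sum and mean over non-null values, plus a count of the nulls skipped."""
    present = [v for _, y, v in records if y == year and v is not None]
    missing = sum(1 for _, y, v in records if y == year and v is None)
    return {
        "sum": sum(present),
        "mean": sum(present) / len(present) if present else None,
        "missing": missing,
    }

by_year = {year: aggregate(records, year) for year in (2011, 2012)}
for year, stats in by_year.items():
    print(year, stats)
```

Reporting the number of missing records next to each aggregate, as the last field does here, is one way to keep either line chart from being read at face value.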
Given these irregularities we can try removing 2012 from the data, and focus on the gradual upward trend in energy usage in the rest of the data. As we can see on the left, it appears that energy usage is tightly correlated with average life expectancy. Perhaps more power means a happier life for everyone after all. Unfortunately this too is a mirage. The y-axis of this chart has been altered to make the trends appear similar, which obscures the fact that energy use is flat for most countries.
Now of course, you’re probably saying:
But I’m really smart, I wouldn’t make this type of mistake
That’s great! Congrats on being smart. Unfortunately, even those with high data visualization literacy make mistakes. Visualizations are rhetorical devices that are easy to trust too deeply. Charting systems often give an air of credibility that they don’t necessarily warrant. It is often easier to trust your initial inferences and move on. Interactive visualizations with exploratory tools that might help dispel a mirage are often only glanced at by casual readers. Sometimes you are just tired and miss something “obvious”.

Some visualization problems are easy to detect, such as axes pointed in an unintuitive or unconventional direction or a pie chart with more than a handful of wedges. This type of best practice knowledge isn’t always available. For instance, what if you are trying to use a novel type of visualization? (A xenographic perhaps?) There’d be nothing beyond your intuition to help guide you.
Other, more terrifying, problems only arise for particular datasets when paired with particular charts. To address these we introduce a testing strategy (derived from Metamorphic Testing) that can identify some of this thorny class of errors, such as the aggregation masking unreliable inputs that we saw earlier with our humble bar chart.
Testing for errors is easy if you know the correct behavior of a system. Simply inspect the system and report your findings. For errors in the hinterlands of data and encoding we are left without such a compass. Instead, we try to find guidance by identifying symmetries across data changes.
The order in which you draw the dots in a scatterplot shouldn’t matter, right? Yet, depending on the dataset, it often can!!! This can erase data classes or cause false inferences. We test for this property by shuffling the order of the input data and then comparing the pixel-wise difference between the two images. If the difference is above a certain threshold we know that there may be a problem. This is the essence of our technique: for a particular dataset, execute a change that should have a predictable result (here no change), and compare the results.
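The shuffle test described above can be sketched as follows. This is a toy stand-in rather than the code from our repo: `render` is a hypothetical rasterizer that paints each point as a small square on a grid of integers, with later points overwriting earlier ones, and the threshold, trial count, and sample points are arbitrary assumptions.

```python
import random

def render(points, width=20, height=20, radius=2):
    """Rasterize (x, y, color) points; later points overwrite earlier ones."""
    canvas = [[0] * width for _ in range(height)]
    for x, y, color in points:
        for py in range(max(0, y - radius), min(height, y + radius + 1)):
            for px in range(max(0, x - radius), min(width, x + radius + 1)):
                canvas[py][px] = color
    return canvas

def pixel_difference(a, b):
    """Fraction of pixels that differ between two equally sized canvases."""
    total = len(a) * len(a[0])
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total

def shuffle_test(points, threshold=0.01, trials=5, seed=0):
    """Re-render under shuffled draw orders; return (passed, worst difference)."""
    rng = random.Random(seed)
    baseline = render(points)
    worst = 0.0
    for _ in range(trials):
        shuffled = list(points)
        rng.shuffle(shuffled)
        worst = max(worst, pixel_difference(baseline, render(shuffled)))
    return worst <= threshold, worst

separated = [(3, 3, 1), (10, 10, 2), (16, 16, 1)]   # marks never overlap
overlapping = [(10, 10, 1), (11, 10, 2)]            # marks cover each other

print("separated:", shuffle_test(separated))
print("overlapping:", shuffle_test(overlapping))
```

When no marks overlap, every draw order yields the same image, so the difference is exactly zero. When marks of different classes overlap, the image depends on which one is drawn last, and that dependence is what the test is meant to flag.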
While it’s still in early development, we find that this approach can effectively catch a wide variety of visualization errors that fall in this intersection of matching encoding to data. These techniques can help surface errors in over-plotting, aggregation, missing aggregation, and a variety of other contexts. How to compute these errors efficiently (as their computation can be burdensome) and how to best describe them to the user both remain open challenges.
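The same recipe can be pointed at aggregation. The sketch below is a hypothetical check in that spirit and not necessarily one of the tests in the paper: removing a single record should barely change a bar chart of means, so if dropping one record reorders the bars, the aggregate is resting on very little. The helper names and numbers are made up.

```python
from statistics import mean

def ranking(groups):
    """Order group names by their mean, largest first."""
    return sorted(groups, key=lambda name: mean(groups[name]), reverse=True)

def drop_one_flips(groups):
    """List (group, removed_value) pairs whose removal changes the ranking."""
    baseline = ranking(groups)
    flips = []
    for name, values in groups.items():
        if len(values) < 2:
            continue  # removing the only record would leave no bar at all
        for i, removed in enumerate(values):
            trimmed = dict(groups)
            trimmed[name] = values[:i] + values[i + 1:]
            if ranking(trimmed) != baseline:
                flips.append((name, removed))
    return flips

# Hypothetical sales records; B's lead depends on a single large record.
fragile = {"A": [100, 102, 98, 101, 99, 100, 103, 97], "B": [90, 95, 265]}
stable = {"A": [100, 101, 99], "B": [150, 151, 149]}

print("fragile:", drop_one_flips(fragile))
print("stable:", drop_one_flips(stable))
```

This brute-force version re-aggregates once per record, which hints at why the cost of these checks is a real concern on larger datasets.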
Where does that leave us?
Visualizations, and the people who create them, are prone to failure in subtle and difficult ways. We believe that visual analytics systems should do more to protect their users from themselves. One way these systems can do this is to surface visualization mirages to their users as part of the analytics process, which will hopefully guide them towards safer and more effective analyses. Our metamorphic testing for visualization approach is just one tool in the visualization validation toolbox. The right interfaces to accomplish this goal are still unknown, although applying a metaphor of software linting seems promising. For more details check out our paper, take a look at the code repo for the project, or watch our CHI talk.
Translated from: https://medium.com/multiple-views-visualization-research-explained/surfacing-visualization-mirages-8d39e547e38c