Surfacing Visualization Mirages

When reading a visualization, is what we see really what we get?

This post summarizes and accompanies our paper “Surfacing Visualization Mirages,” which was presented at CHI 2020 and received a best paper honorable mention. It was written collaboratively by Andrew McNutt, Gordon Kindlmann, and Michael Correll.

TL;DR

When reading a visualization, is what we see really what we get? There are a lot of ways that visualizations can mislead us, such that they appear to show us something interesting that disappears on closer inspection. Such visualization mirages can lead us to see patterns or draw conclusions that don’t exist in our data. We analyze these quarrelsome entities and provide a testing strategy for dispelling them.

Intro

The trained data visualization eye notices red flags that indicate that something misleading is going on. Dual axes that don’t quite match up. Misleading color ramps. Dubious sources. Learning how visualizations mislead is every bit as important as learning how they are created, yet even the studious can be deceived!

These dastardly deceptions need not be deviously devised either. While some visualizations are of course created by bad actors, most are not. Even designs crafted with the best of intentions yield all kinds of confusions and mistakes. An inattentive or careless analyst might hallucinate meaning where there isn’t any, or jump to a conclusion that is only hazily supported.

What can we say about the humble bar chart below on the left? It appears that location B has about 50% more sales than location A. Is the store in location A underperforming? Given the magnitude of the difference, I’d bet your knee-jerk answer would be yes.

Many patterns can hide behind aggregated data. For example, a simple average might hide dirty data, irregular population sizes, or a whole host of other problems. Simple aggregations like our humble bar chart are the foundation of many analytics tools, with subsequent analyses often being built on top of these potentially shaky grounds.
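
A minimal Python sketch of this problem, using invented per-transaction sales records for two hypothetical locations (the numbers are illustrative only and do not come from the paper). A bar of means would show B roughly 50% ahead of A, while the median and record count, which the bar hides, tell a different story.

```python
from statistics import mean, median

# Invented per-transaction sales records for two hypothetical stores.
sales = {
    "A": [100, 105, 98, 102, 101, 99, 103, 97],
    "B": [100, 104, 96, 300],  # only four records, one of them unusually large
}

def summarize(records):
    """What the bar shows (mean) next to what it hides (median, record count)."""
    return {"mean": mean(records), "median": median(records), "n": len(records)}

for location, records in sales.items():
    s = summarize(records)
    print(f"{location}: mean={s['mean']:.1f} median={s['median']:.1f} n={s['n']}")
```

Here B's mean (150) is about 49% above A's (about 100.6), but B's median (102) is close to A's (100.5), and B's bar rests on half as many records, one of which drives the whole difference.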

What are we to do about these problems? Should we stop analyzing data visually? Throw out our computers? Perhaps we can form a theory that will help us build a method for automatically surfacing and catching these quarrelsome errors?

A flow chart describing the visual analytics process.
The chart-making process is full of moments of agency for the chart creator. What counts as data? What is an appropriate way to manipulate that data? How do I show this data? How do I go about understanding it? The answers to all of these questions can affect the reader’s ultimate takeaways.

Enter Mirages

On the road to making a chart or visualization there are many steps and stages, each of which is liable to let errors in. Consider a simplified model: an analyst decides how to curate data, how to wrangle it into a usable form, how to visually encode that data, and, finally, how to read it. Each time the analyst makes a decision, they exercise agency and create an opportunity for error, which can cascade along this pipeline and produce illusory insights.

Something as innocuous as defining the bins of a histogram can mask underlying data quality issues, which might in turn lead to incorrect inferences about a trend. Arbitrary choices about axis ordering in a radar chart can cause a reader to falsely believe one job candidate is good while another is lacking. Decisions about what type of crime actually counts as a crime can lead to maps that drive radically different impressions about the role of crime in a particular area.
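
To illustrate how a binning choice can mask a data quality issue, the sketch below assumes a hypothetical sensor whose missing readings were silently stored as 0. The sentinel value, the sample sizes, and the two bin counts are all assumptions chosen for illustration. With wide bins the sentinel zeros blend into the first bin; with narrow bins they stand out as a spike.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sensor: 950 plausible readings spread over 0 to 40 ...
valid = rng.uniform(0, 40, size=950)
# ... plus 50 missing readings that were silently stored as 0 (a sentinel).
readings = np.concatenate([valid, np.zeros(50)])

def spike_ratio(counts):
    """Height of the first bin relative to the median height of the other bins."""
    return counts[0] / np.median(counts[1:])

# Same data, two binning choices over the same range.
wide_counts, _ = np.histogram(readings, bins=4, range=(0, 40))
narrow_counts, _ = np.histogram(readings, bins=40, range=(0, 40))

print(f"wide bins:   first bin is {spike_ratio(wide_counts):.2f}x a typical bin")
print(f"narrow bins: first bin is {spike_ratio(narrow_counts):.2f}x a typical bin")
```

With four wide bins, the extra zeros only make the first bar modestly taller, which reads as a plausible skew; with forty narrow bins they form an obvious spike at 0 that invites a closer look.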

While charts tend to feel trustworthy, the harmless-seeming choices that create them can cause all sorts of hallucinations.

The first step in addressing a problem is often to name it, so we introduce a term for these errors: Visualization Mirages. We define them as

any visualization where the cursory reading of the visualization would appear to support a particular message arising from the data, but where a closer re-examination would remove or cast significant doubt on this support.

Mirages arise throughout visual analytics. They occur as the result of choices made about data. They come from design choices. They depend on what you are trying to do with the visualization. What may be misleading in the context of one task may not interfere with another. For instance, a poorly selected aspect ratio could produce a mirage for a viewer who wanted to know about the correlation in a scatterplot, but is unlikely to affect someone who just wants to find the biggest value.

A man crawls across a desert, following a sign labeled “VA process” towards a mirage labeled “insights”.
We all thirst for insight in visual analytics (or anywhere else). This desire can cause us to overlook important details or forget best practices.

The errors that create mirages have both familiar and unfamiliar names: Drill-down Bias, Forgotten Population or Missing Dataset, Cherry Picking, Modifiable Areal Unit Problem, Non-sequitur Visualizations, and so many more. An annotated and expanded version of this list is included in the paper supplement. There is a sprawling universe of subtle and tricky ways that mirages can arise.

To make matters worse, there are few automated tools to help the reader or chart creator know that they haven’t deceived themselves in pursuit of insight.

Do these things really happen?

Imagine you are curious about the trend of global energy usage over time. A natural way to address this question would be to fire up Tableau and drop in the World Indicators dataset, which consists of vital world statistics from 2000 to 2012. The trend over time (a) shows a sharp decrease in 2012! This would be great news for the environment were it not illusory, as we see in (b) when checking the set of missing records.
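
The following Python sketch reproduces the shape of this mirage on a tiny invented table. The country names and values are hypothetical stand-ins, not the World Indicators data. Summing per year silently skips missing records, so the total drops in the year where most countries have no value, and counting nulls per year exposes why.

```python
from collections import defaultdict

# Invented (country, year, energy use) rows; None marks a missing record.
rows = [
    ("OECD-1", 2011, 50), ("OECD-1", 2012, 52),
    ("OECD-2", 2011, 40), ("OECD-2", 2012, 41),
    ("Other-1", 2011, 10), ("Other-1", 2012, None),
    ("Other-2", 2011, 8), ("Other-2", 2012, None),
    ("Other-3", 2011, 12), ("Other-3", 2012, None),
]

totals = defaultdict(int)  # what a SUM aggregation plots
nulls = defaultdict(int)   # what it silently skips
for country, year, energy in rows:
    if energy is None:
        nulls[year] += 1
    else:
        totals[year] += energy

for year in sorted({year for _, year, _ in rows}):
    print(year, "sum:", totals[year], "nulls:", nulls[year])
```

Every country that reported in both years increased its usage, yet the total falls from 120 to 93 because three records are missing in 2012.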

A line chart captioned “Energy down?”, a bar chart captioned “Count of Nulls”, and a line chart captioned “Energy up?”.

If we try to quash these data problems by switching the aggregation in our line chart from SUM to MEAN, we find the opposite: a sharp increase in 2012! Unfortunately, this conclusion is another mirage. The only non-null entries for 2012 are OECD countries, and these countries have much higher energy usage than other countries across all years (d).
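
Continuing with an invented table of the same shape (hypothetical values, not the real dataset), this sketch shows why switching to a mean does not fix the problem: the 2012 mean is computed over a different, higher-consuming set of countries. Restricting to countries that reported in every year is one way to compare like with like; it is offered here as an illustration, not as the paper's remedy.

```python
from collections import defaultdict
from statistics import mean

# Invented (country, year, energy use) rows; None marks a missing record.
rows = [
    ("OECD-1", 2011, 50), ("OECD-1", 2012, 52),
    ("OECD-2", 2011, 40), ("OECD-2", 2012, 41),
    ("Other-1", 2011, 10), ("Other-1", 2012, None),
    ("Other-2", 2011, 8), ("Other-2", 2012, None),
    ("Other-3", 2011, 12), ("Other-3", 2012, None),
]

def mean_by_year(rows, countries=None):
    """Mean of non-null values per year, optionally restricted to some countries."""
    by_year = defaultdict(list)
    for country, year, energy in rows:
        if energy is None or (countries is not None and country not in countries):
            continue
        by_year[year].append(energy)
    return {year: mean(values) for year, values in sorted(by_year.items())}

# Countries with a non-null value in every year of the table.
all_years = {year for _, year, _ in rows}
reported = defaultdict(set)
for country, year, energy in rows:
    if energy is not None:
        reported[country].add(year)
balanced = {country for country, years in reported.items() if years == all_years}

print("naive mean:   ", mean_by_year(rows))            # composition changes in 2012
print("balanced mean:", mean_by_year(rows, balanced))  # same countries every year
```

The naive mean nearly doubles (24 to 46.5), while the balanced mean rises only slightly (45 to 46.5).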

Two line charts. The left one shows energy usage vs. life expectancy over time; the right one shows energy use over time.

Given these irregularities, we can try removing 2012 from the data and focusing on the gradual upward trend in energy usage in the rest of the data. As we can see on the left, energy usage appears to be tightly correlated with average life expectancy; perhaps more power means a happier life for everyone after all. Unfortunately, this too is a mirage. The y-axis of this chart has been altered to make the trends appear similar, which obscures the fact that energy use is flat for most countries.
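
One way such an alteration can happen is when each series is stretched to fill its own y-axis. The sketch below models that as min-max scaling, which is an assumption about how the chart was built, and uses invented values: a nearly flat series and a clearly rising one end up with identical on-screen shapes.

```python
def min_max(series):
    """Rescale a series to span [0, 1], as if it were given its own full-height y-axis."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

def pct_change(series):
    """Relative change from the first to the last value, in percent."""
    return 100 * (series[-1] - series[0]) / series[0]

# Invented yearly values: one nearly flat series, one clearly rising series.
energy = [100.0, 100.5, 101.0, 101.5, 102.0]
life_expectancy = [60.0, 62.0, 64.0, 66.0, 68.0]

print(min_max(energy))           # the shapes drawn on screen...
print(min_max(life_expectancy))  # ...are identical
print(f"energy: {pct_change(energy):.1f}%  life expectancy: {pct_change(life_expectancy):.1f}%")
```

Once each line fills its own axis, a 2% drift and a 13% rise look the same, which is why checking the actual axis ranges matters before reading two lines as correlated.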

Now of course, you’re probably saying:

But I’m really smart, I wouldn’t make this type of mistake

That’s great! Congrats on being smart. Unfortunately, even those with high data visualization literacy make mistakes. Visualizations are rhetorical devices that are easy to trust too deeply. Charting systems often lend an air of credibility that isn’t necessarily warranted. It is often easier to trust your initial inferences and move on. Casual readers often only glance at interactive visualizations, even when those visualizations offer exploratory tools that might help dispel a mirage. Sometimes you are just tired and miss something “obvious”.

A chart showing gun deaths in Florida over time
This infamous chart appears on first glance to be saying that ‘Stand Your Ground’ decreased gun deaths, but on closer inspection it shows the opposite! Terrifying! (The author of this chart wasn’t actually trying to confuse anyone, they were just trying to explore a new design language)

Some visualization problems are easy to detect, such as axes pointing in an unintuitive or unconventional direction, or a pie chart with more than a handful of wedges. This type of best-practice knowledge isn’t always available, however. What if you are trying to use a novel type of visualization (a xenographic, perhaps)? There’d be nothing beyond your intuition to guide you.
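
Rules like these are easy to automate because the best practice is known in advance. Below is a minimal linting sketch over a hypothetical chart description; the `ChartSpec` fields and the five-wedge threshold are assumptions, not any particular tool's API. It can only catch what its rules anticipate, which is exactly the limitation described above for novel chart types.

```python
from dataclasses import dataclass, field

@dataclass
class ChartSpec:
    """Hypothetical, minimal chart description used only for this sketch."""
    mark: str                                   # e.g. "bar", "line", "pie"
    values: list = field(default_factory=list)  # one entry per bar/point/wedge
    y_reversed: bool = False                    # True if the y-axis grows downward

MAX_PIE_WEDGES = 5  # assumed cutoff for "more than a handful"

def lint(spec: ChartSpec) -> list[str]:
    """Return human-readable warnings for known best-practice violations."""
    warnings = []
    if spec.mark == "pie" and len(spec.values) > MAX_PIE_WEDGES:
        warnings.append(f"pie chart has {len(spec.values)} wedges; consider a bar chart")
    if spec.y_reversed:
        warnings.append("y-axis is reversed; increases may be read as decreases")
    return warnings

print(lint(ChartSpec("pie", values=list(range(8)))))
print(lint(ChartSpec("line", values=[3, 5, 4], y_reversed=True)))
print(lint(ChartSpec("bar", values=[3, 5, 4])))
```

A reversed-axis rule like this would flag the Florida chart above, but no fixed rule list can anticipate every novel design.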

Other, more terrifying problems only arise when particular datasets are paired with particular charts. To address these, we introduce a testing strategy (derived from Metamorphic Testing) that can identify some of this thorny class of errors, such as the aggregation masking unreliable inputs that we saw earlier with our humble bar chart.

Testing for errors is easy if you know the correct behavior of a system: simply inspect the system and report your findings. For errors in the hinterlands of data and encoding, however, we are left without such a compass. Instead, we look for guidance by identifying symmetries across data changes, that is, changes to the data that should leave the chart unchanged or change it in a predictable way.
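
A minimal sketch of this idea in Python, in the style of metamorphic testing: apply a transformation to the input whose effect on the output we can predict (here, none) and check that prediction. The "chart" is simplified to the data behind it (per-category values) rather than rendered pixels, and all functions are hypothetical illustrations, not the project's code.

```python
import random
from statistics import mean

def bar_chart_view(rows):
    """Data behind a bar chart: mean value per category."""
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)
    return {c: mean(v) for c, v in sorted(groups.items())}

def last_drawn_view(rows):
    """Data behind an overplotted mark: whichever value is drawn last wins."""
    return {category: value for category, value in rows}

def shuffle_rows(rows, rng):
    out = list(rows)
    rng.shuffle(out)
    return out

def reverse_rows(rows, rng):
    return list(reversed(rows))

def passes(view, rows, transform, seed=0):
    """Metamorphic relation: reordering rows must leave the view unchanged."""
    rng = random.Random(seed)
    return view(rows) == view(transform(rows, rng))

rows = [("A", 3.0), ("B", 5.5), ("A", 4.0), ("B", 6.5)]
for view in (bar_chart_view, last_drawn_view):
    for transform in (shuffle_rows, reverse_rows):
        print(view.__name__, transform.__name__, passes(view, rows, transform))
```

The mean-based view satisfies the relation under both reorderings, while the last-drawn view fails it under reversal, flagging that what the viewer sees depends on draw order rather than on the data alone.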

The order in which you draw the dots in a scatterplot shouldn’t matter, right? Yet, depending on the dataset, it often does! This can erase data classes or cause false inferences. We test for this property by shuffling the order of the input data and then comparing the pixel-wise difference between the two images. If the difference is above a certain threshold, we know that there may be a problem. This is the essence of our technique: for a particular dataset, execute a change that should have a predictable result (here, no change), and compare the results.
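
The shuffle-and-compare test above can be sketched in a few lines of NumPy. This is a simplified stand-in, not the paper's implementation: points are rasterized into a small label image instead of being drawn by a real charting library, and the 64x64 canvas, square markers, and 1% threshold are all assumptions.

```python
import numpy as np

def render(xs, ys, labels, size=64, radius=2):
    """Rasterize square markers into a label image; later points overwrite earlier ones."""
    canvas = np.zeros((size, size), dtype=np.int32)  # 0 = background
    for x, y, label in zip(xs, ys, labels):
        col, row = int(x * (size - 1)), int(y * (size - 1))
        canvas[max(row - radius, 0):row + radius + 1,
               max(col - radius, 0):col + radius + 1] = label
    return canvas

def draw_order_diff(xs, ys, labels, seed=0):
    """Fraction of pixels that change when the same points are drawn in a shuffled order."""
    order = np.random.default_rng(seed).permutation(len(xs))
    before = render(xs, ys, labels)
    after = render(xs[order], ys[order], labels[order])
    return float(np.mean(before != after))

THRESHOLD = 0.01  # assumed tolerance for "the picture changed"

# Hypothetical data: two overlapping classes, stored (and so drawn) class by class.
rng = np.random.default_rng(1)
xs = rng.uniform(0.3, 0.7, 400)
ys = rng.uniform(0.3, 0.7, 400)
labels = np.repeat([1, 2], 200)  # every class-2 point is drawn after every class-1 point

diff = draw_order_diff(xs, ys, labels)
print(f"{diff:.3f} of pixels changed; flagged: {diff > THRESHOLD}")
```

In the original order, class 2 paints over most of class 1, so class 1 is largely hidden; shuffling exposes the overlap and pushes the difference past the threshold. Well-separated points pass the same check with no change at all.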

A series of three scatterplots. The first two show the same data but appear different; the third highlights the differences.
A simple scatterplot can hide the distributions it displays through draw order. This problem won’t affect every dataset, but here it hides the prevalence of the Americas in the middle of the distribution.

While it’s still in early development, we find that this approach can effectively catch a wide variety of visualization errors that arise where an encoding is matched to data. These techniques can help surface errors in over-plotting, aggregation, missing aggregation, and a variety of other contexts. Open challenges remain in how to compute these errors efficiently (their computation can be burdensome) and how best to describe them to the user.
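
As one illustration of surfacing an aggregation problem, here is a resampling check in the same spirit, applied to invented data modeled on the humble bar chart from the introduction: bootstrap each location's records and count how often the taller bar changes. This particular check, the 1,000 resamples, and the data are assumptions for illustration; they are not claimed to be one of the paper's tests.

```python
import random
from statistics import mean

def ranking_flip_rate(groups, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which a different group has the largest mean."""
    rng = random.Random(seed)
    observed = max(groups, key=lambda g: mean(groups[g]))
    flips = 0
    for _ in range(n_boot):
        # Resample each group's records with replacement, keeping its size.
        resampled = {g: mean(rng.choices(v, k=len(v))) for g, v in groups.items()}
        if max(resampled, key=resampled.get) != observed:
            flips += 1
    return flips / n_boot

# Invented records: B's lead rests on one large value; in `steady` it does not.
shaky = {"A": [100, 105, 98, 102, 101, 99, 103, 97], "B": [100, 104, 96, 300]}
steady = {"A": [100, 105, 98, 102], "B": [150, 155, 148, 152]}
print(ranking_flip_rate(shaky), ranking_flip_rate(steady))
```

In the shaky data, a noticeable share of resamples flips the ranking, while the steady data never does; a tool could report that share next to the bar chart as a warning.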

Where does that leave us?

Visualizations, and the people who create them, are prone to failure in subtle and difficult ways. We believe that visual analytics systems should do more to protect their users from themselves. One way these systems can do this is to surface visualization mirages to their users as part of the analytics process, which will hopefully guide them towards safer and more effective analyses. Our metamorphic testing approach for visualization is just one tool in the visualization validation toolbox. The right interfaces for accomplishing this goal are still unknown, although applying the metaphor of software linting seems promising. For more details, check out our paper, take a look at the code repo for the project, or watch our CHI talk.

Originally published at: https://medium.com/multiple-views-visualization-research-explained/surfacing-visualization-mirages-8d39e547e38c
