web數據交互

Most good data projects start with the analyst doing something to get a feel for the data that they are dealing with.

大多數好的數據項目都是從分析師開始做一些事情，以便對他們正在處理的數據有所了解。

They might hack together a Jupyter notebook to look at data summaries, first few rows of data and matplotlib charts. Some might look through the data as an Excel sheet and fidget with pivot tables. The ones truly one with the data might even prefer to stare directly at the raw table of data.

他們可能會聚在一起使用Jupyter筆記本來查看數據摘要，數據的前幾行和matplotlib圖表。有些人可能會將數據視為Excel工作表和帶有數據透視表的小工具。真正擁有數據的人甚至可能更喜歡直接盯著原始數據表。

None of these are ideal solutions. Some of these solutions might be only suitable for the masochistic among us. So what is a person to do?

這些都不是理想的解決方案。其中一些解決方案可能僅適用于我們中間的受虐狂。那么一個人該怎么辦？

For me, I prefer to build a web app for data exploration.

對我來說，我更喜歡構建一個用于數據探索的Web應用程序。

There’s something about the ability to slice, group, filter, and most importantly — see the data, that helps me to understand it and help me to formulate questions and hypotheses that I want answered in the .

關于切片，分組，過濾的功能，最重要的是- 查看數據，這有助于我理解數據，并幫助我提出要在回答的問題和假設。

It allows me to interact with the data visually.

它使我可以直觀地與數據進行交互。

My preferred toolkit of choice for this task these days is Plotly and Streamlit. I’ve written enough about Plotly over the last while — I think it’s the best data visualisation package out there for Python. But Streamlit has really changed the way I work. Because it is so terse, it takes almost no extra effort to turn my plots and comments in a python script into a web app with interactivity as I tinker. (FYI — I wrote a comparison between Dash and Streamlit here)

這些天，我首選的首選工具包是Plotly和Streamlit 。上一陣子我已經寫了足夠多的有關Plotly的文章-我認為這是Python最好的數據可視化軟件包。但是Streamlit確實改變了我的工作方式。因為它是如此的簡潔，所以我幾乎不需要花費額外的精力就可以將python腳本中的繪圖和注釋轉換為具有交互性的Web應用程序。 (僅供參考-我在這里寫了Dash和Streamlit之間的比較 )

I prefer to build a web app for data exploration
我更喜歡構建用于數據探索的Web應用程序

So in this article, I’d like to share with a simple example building a data exploration app with these tools.

因此，在本文中，我想與一個使用這些工具構建數據探索應用程序的簡單示例分享。

Now, for a data project – we need data, and here I will be using stats from the NBA. Learning programming can be dry, so using something relatable like sports data helps me to stay engaged; and hopefully it will for you too.

現在，對于數據項目，我們需要數據，在這里，我將使用NBA的統計數據。學習編程可能很枯燥，因此使用諸如體育數據之類的相關內容有助于我保持專注。希望它也對您有用。

(Don’t worry if you don’t follow the NBA as the focus is on the data science and programming!)

(如果您不關注NBA，請不要擔心，因為重點是數據科學和編程！)

在開始之前 (Before we get started)

To follow along, install a few packages — plotly, streamlit and pandas. Install each (in your virtual environment) with a simple pip install [PACKAGE_NAME].

plotly ，請安裝一些軟件包plotly ， streamlit和pandas 。通過簡單的pip install [PACKAGE_NAME]安裝每個組件(在您的虛擬環境中)。

The code for this article is on my GitHub repo here, so you can download/copy/fork away to your heart’s content.

本文的代碼位于我的GitHub存儲庫中，因此您可以下載/復制/分叉到您的內心。

The script is called data_explorer_app.py — so you can run it from the shell with:

該腳本名為data_explorer_app.py ，因此您可以使用以下命令從Shell運行該腳本：

streamlit run data_explorer_app.py

Oh, this is the first in a set of data science / data analysis articles that I plan to write about using NBA data. It’ll all go to that repo, so keep your eyes peeled!

哦，這是我計劃就使用NBA數據撰寫的一組數據科學/數據分析文章中的第一篇。一切都會去那個倉庫，所以要睜大眼睛！

If you are following along, import the key libraries with:

如果您遵循以下步驟，請使用以下命令導入密鑰庫：

import pandas as pd
import plotly.express as px
import streamlit as st

And we are ready to go.

我們已經準備好出發了。

數據深度潛水 (Data Deep Diving)

流式照明 (Streamlit-ing)

We use Streamlit here, as it is designed to help us build data apps quickly. So what we are going to build is a Streamlit app that will then run locally. (For more information — you can check out my Dash v Streamlit article here.)

我們在這里使用Streamlit，因為它旨在幫助我們快速構建數據應用程序。因此，我們要構建的是Streamlit應用程序，該應用程序然后將在本地運行。 (有關更多信息，您可以在此處查看我的Dash v Streamlit文章。)

If you’ve never used Streamlit, this is all you need to build bare-bones app:

如果您從未使用過Streamlit，這就是構建準系統應用程序所需要的：

import streamlit as st
st.write("Hello, world!")

Save this as app.py, and then execute it with a shell command streamlit run app.py:

將其另存為app.py ，然后使用shell命令streamlit run app.py執行它：

Image for post — Look, ma, it’s a web app!

And you have a functioning web app! Building a streamlit app is that easy. Even more amazingly, though, building a useful app isn’t much harder.

而且您有一個運行良好的Web應用程序！構建流式應用很容易。但是，更令人驚訝的是，構建有用的應用程序并不難。

Oh, by the way, you don’t need to stop and restart the server every time the script is changed. Whenever the underlying script file is updated, you will see a button pop-up on the top right corner like so:

哦，順便說一句，您不必在每次更改腳本時都停止并重新啟動服務器。每當基礎腳本文件更新時，您都會在右上角看到一個按鈕彈出，如下所示：

Just keep the script running, and hit Rerun here every time you want to see the latest version at work.

只需保持腳本運行，然后在每次要查看最新版本時都單擊“重新運行”即可。

Ready? Okay, let’s go!

準備？好吧，走吧！

原始數據探索 (Raw data exploration)

What I like to do initially it to look at the entire raw dataset. As a first step, we load the data from a CSV file:

我最初想要做的是查看整個原始數據集。第一步，我們從CSV文件加載數據：

df = pd.read_csv("data/player_per_game.csv", index_col=0).reset_index(drop=True)

Once the data has been loaded, simply typing st.write(df) creates a dynamic, interactive table of the entire dataframe.

加載數據后，只需鍵入st.write(df)創建整個數據幀的動態交互式表。

And the various statistics for columns can be similarly plotted with st.write(df.describe()).

并且可以使用st.write(df.describe())類似地繪制列的各種統計信息。

I know you can plot a table in Jupyter notebooks, but the difference is in the interactivity. For one, tables rendered with Streamlit are sortable by columns. And as you will see later, you can incorporate filters and other dynamic elements that aren’t as easy to incorporate in notebooks — which is where the real power comes in.

我知道您可以在Jupyter筆記本中繪制表格，但區別在于交互性。首先，使用Streamlit渲染的表可以按列排序。就像您稍后將看到的那樣，您可以合并過濾器和其他動態元素，而這些元素和合并到筆記本中并不那么容易-這才是真正的動力所在。

Now we are ready to start adding a few charts to our app.

現在，我們準備開始向我們的應用程序添加一些圖表。

分布可視化 (Distribution visualisations)

Statistical visualisation of individual variables are extremely useful, to an extent that I think it’s an indispensable tool above and beyond looking at the raw data.

單個變量的統計可視化非常有用，在某種程度上，我認為它是查看原始數據之外不可或缺的工具。

We will begin the analysis by visualising the data by one variable, with an interactive histogram. A histogram can be constructed with Plotly like so:

我們將通過交互式變量直方圖通過一個變量將數據可視化來開始分析。可以使用Plotly構造直方圖，如下所示：

hist_fig = px.histogram(df, x=hist_x, nbins=hist_bins)

Traditionally, we would have to manually adjust the x and nbins variables to see what happens, or create a huge wall of histograms from various permutations of these variables. Instead, let’s see how they can be taken in as inputs to interactively investigate the data.

傳統上，我們將不得不手動調整x和nbins變量以查看會發生什么，或者從這些變量的各種排列中創建巨大的直方圖墻。相反，讓我們看看如何將它們作為交互研究數據的輸入。

The histogram will analyse data from one column of the pandas dataframe. Let’s render it as a drop-down box by calling the st.selectbox() module. We can just grab a list of the columns as df.columns, and additionally we provide a default choice, which we get the column number of using df.columns.get_loc() method. Putting it together, we get:

直方圖將分析來自熊貓數據框一列的數據。通過調用st.selectbox()模塊，將其呈現為下拉框。我們可以僅以df.columns獲取列的列表，此外，我們還提供了一個默認選擇，即使用df.columns.get_loc()方法獲得列號。放在一起，我們得到：

hist_x = st.selectbox("Histogram variable", options=df.columns, index=df.columns.get_loc("mp_per_g"))

Then, a slider can be called with the st.slider() module for the user to select the number of bins in the histogram. The module can be customised a minimum/maximum/default and increment parameters as you see below.

然后，可以使用st.slider()模塊調用滑塊，以供用戶選擇直方圖中的bin數量。您可以自定義模塊的最小/最大/默認和增量參數，如下所示。

hist_bins = st.slider(label="Histogram bins", min_value=5, max_value=50, value=25, step=1)

These parameters can then be combined to produce the figure:

然后可以將這些參數組合以產生圖形：

hist_fig = px.histogram(df, x=hist_x, nbins=hist_bins, title="Histogram of " + hist_x,
                        template="plotly_white")
st.write(hist_fig)

Putting it together with a little heading st.header(“Histogram”), we get:

將其與一個小標題st.header(“Histogram”)放在一起，我們得到：

I recommend taking a second here to explore the data. For example, take a look at different stats like rebounds per game:

我建議在這里花點時間瀏覽數據。例如，看一下不同的統計數據，例如每場比賽的籃板數：

Or positions:

或職位：

The interactivity makes for easier, dynamic, active exploration of the data.

交互性使對數據的更輕松，動態， 主動的 探索成為可能。

You might have noticed in this last graph that the histogram categories are not in any sort of sensible order. This is due to the fact that this is a categorical variable. So without a provided order, Plotly is (I think) plotting these categories based on the order that it starts to encounter each category for the first time. So, let’s make one last change to would fix that.

您可能已經在最后一張圖中注意到，直方圖類別沒有任何合理的順序。這是由于這是一個類別變量。因此，如果沒有提供的順序，Plotly(我認為)將根據第一次遇到每個類別的順序來繪制這些類別。因此，讓我們做最后一個更改來解決該問題。

Since Plotly allows for a category_orders parameter, we could pass a sorted order of positions. But then it wouldn’t be relevant for any of the other parameters. Instead, what we can do is to isolate the column based on the chosen input value, and pass them on by sorting them alphabetically like so:

由于Plotly允許有category_orders參數，因此我們可以傳遞排名的排序順序。但這與其他任何參數都不相關。相反，我們可以做的是根據所選的輸入值隔離列，然后按字母順序對它們進行傳遞，如下所示：

df[hist_x].sort_values().unique()

All together, we get:

總之，我們得到：

hist_cats = df[hist_x].sort_values().values
hist_fig = px.histogram(df, x=hist_x, nbins=hist_bins, title="Histogram of " + hist_x,
                        template="plotly_white", category_orders={hist_x: hist_cats})

This way, any categorical (or ordinal) variables would be presented in order

這樣，任何分類(或有序)變量將按順序顯示

Now we can go another step and categorise our data with boxplots. Boxplots do a similar job as histograms in that they show distributions, but they are really best at showing how those distributions changed according to another variable.

現在，我們可以再進行一步，用箱線圖對數據進行分類。箱線圖的功能與直方圖類似，因為它們可以顯示分布，但實際上最能顯示出這些分布如何根據另一個變量而變化。

So, the boxplot portion of our app is going to include two pulldown menus like below.

因此，我們應用程序的箱線圖部分將包括兩個下拉菜單，如下所示。

box_x = st.selectbox("Boxplot variable", options=df.columns, index=df.columns.get_loc("pts_per_g"))
box_cat = st.selectbox("Categorical variable", ["pos_simple", "age", "season"], 0)

And it’s just a matter of passing those two inputs to Plotly to build a figure:

只需將這兩個輸入傳遞給Plotly即可構建圖形：

box_fig = px.box(df, x=box_cat, y=box_x, title="Box plot of " + box_x, template="plotly_white", category_orders={"pos_simple": ["PG", "SG", "SF", "PF", "C"]})
st.write(box_fig)

Then… voila! You have an interactive box plot!

然后……瞧！您有一個交互式箱形圖！

You will notice here that I manually passed an order for my simplified positions column. The reason is that this order is a relatively arbitrary, basketball-specific order (from PG to C), not an alphabetical order. As much as I would like everything to be parametric, sometimes you do have to resort to manual specifications!

您會在這里注意到，我為簡化倉位欄手動傳遞了一個訂單。原因是該順序是相對任意的，籃球特定的順序(從PG到C)，而不是字母順序。盡管我希望所有參數都是參數化的，但有時您還是不得不求助于手動規格！

熱圖的廣義相關 (Generalised correlations with heatmaps)

Scatter plots are useful for seeing individual data points, but sometimes it’s good to just visualise datasets such that we can immediately see which columns might be well correlated, not correlated, or inversely correlated.

散點圖對于查看單個數據點很有用，但是有時最好只是可視化數據集，這樣我們就可以立即查看哪些列可能具有良好的相關性，不相關性或反相關性。

Heatmaps are perfect for this job, by setting it up to visualise what are called correlation matrices.

通過將熱圖設置為可視化所謂的相關矩陣，熱圖非常適合此工作。

Since a heatmap is best at visualising correlations between sets of input categories, let’s use an input that will take multiple categories. As a result, st.multiselect() is the module of choice here, and df.corr() is all we need to create the correlation matrix.

由于熱圖最適合可視化輸入類別集之間的相關性，因此讓我們使用將采用多個類別的輸入。結果， st.multiselect()是這里選擇的模塊，而df.corr()是創建相關矩陣所需的全部。

The combined code is:

組合的代碼為：

hmap_params = st.multiselect("Select parameters to include on heatmap", options=list(df.columns), default=[p for p in df.columns if "fg" in p])
hmap_fig = px.imshow(df[hmap_params].corr())
st.write(hmap_fig)

And we get:

我們得到：

It’s so clear which of these columns are positively correlated or not correlated. and I also suggest playing with different colour scales / swatches for extra fun!

很明顯，這些列中的哪一列是正相關的或不相關的。并且我還建議您使用不同的色標/色板來獲得更多的樂趣！

That’s it for today — I hope that was interesting. For my money, it’s hard to beat interactive apps like this for exploration, and the power of Plotly and Streamlit make it so easy to build these customised apps for my purpose.

今天就這樣-我希望這很有趣。為了我的錢，很難擊敗像這樣的交互式應用程序進行探索，而Plotly和Streamlit的強大功能使為我的目的構建這些定制的應用程序變得如此容易。

And keep in mind that what I have suggested here are just basic suggestions, and what I am sure that you could build something far more useful for your own purpose and to your preference. I look forward to seeing them all!

并且請記住，我在這里提出的只是基本建議，并且我確信您可以針對自己的目的和自己的喜好構建一些更有用的東西。我期待看到他們全部！

But just before you go — if you liked this, say hi / follow on twitter, or follow here for updates. ICYMI: I also wrote these articles, which you might find useful:

但是在您出發之前-如果您愿意，請打個招呼/在twitte r上關注，或在此處關注更新。 ICYMI：我也寫了這些文章，您可能會發現它們有用：

See you later! Stay safe :)

回頭見！注意安全：)

翻譯自: https://towardsdatascience.com/explore-any-data-with-a-custom-interactive-web-app-data-science-with-sports-410644ac742

web數據交互

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/news/389309.shtml
繁體地址，請注明出處：http://hk.pswp.cn/news/389309.shtml
英文地址，請注明出處：http://en.pswp.cn/news/389309.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！

C# .net 對圖片操作

using System.Drawing;using System.Drawing.Drawing2D;using System.Drawing.Imaging;public class ImageHelper{/// <summary>/// 獲取圖片中的各幀/// </summary>/// <param name"pPath">圖片路徑</param>/// <param name"pSaveP…

數據類型之Integer與int

數據類型之Integer與int Java入門基本數據類型眾所周知，Java是面向對象的語言，一切皆對象。但是為了兼容人類根深蒂固的數據處理習慣，加快常規數據的處理速度，提供了9種基本數據類型，他們都不具備對象的特性&#xf…

PCA(主成分分析)思想及實現

PCA的概念： PCA是用來實現特征提取的。特征提取的主要目的是為了排除信息量小的特征，減少計算量等。簡單來說： 當數據含有多個特征的時候，選取主要的特征，排除次要特征或者不重要的特征。比如說：我們要…

【安富萊二代示波器教程】第8章示波器設計—測量功能

第8章示波器設計—測量功能二代示波器測量功能實現比較簡單，使用2D函數繪制即可。不過也專門開辟一個章節，為大家做一個簡單的說明，方便理解。 8.1 水平測量功能 8.2 垂直測量功能 8.3 總結 8.1 水平測量功能水平測量方…

深度學習數據更換背景_開始學習數據科學的最佳方法是了解其背景

深度學習數據更換背景數據科學教育 (DATA SCIENCE EDUCATION) 目錄 (Table of Contents) The Importance of Context Knowledge 情境知識的重要性 (Optional) Research Supporting Context-Based Learning (可選)研究支持基于上下文的學習 The Context of Data Science 數據科學…

熊貓數據集_用熊貓掌握數據聚合

熊貓數據集Data aggregation is the process of gathering data and expressing it in a summary form. This typically corresponds to summary statistics for numerical and categorical variables in a data set. In this post we will discuss how to aggregate data usin…

IOS CALayer的屬性和使用

一、CALayer的常用屬性 1、propertyCGPoint position; 圖層中心點的位置，類似與UIView的center；用來設置CALayer在父層中的位置；以父層的左上角為原點（0，0）； 2、 property CGPoint anchorPoint…

GridView詳解

快速預覽：GridView無代碼分頁排序GridView選中，編輯，取消，刪除GridView正反雙向排序GridView和下拉菜單DropDownList結合GridView和CheckBox結合鼠標移到GridView某一行時改變該行的背景色方法一鼠標移到GridView某一行時改變該行…

訪問模型參數，初始化模型參數，共享模型參數方法

一. 訪問模型參數對于使用Sequential類構造的神經網絡，我們可以通過方括號[]來訪問網絡的任一層。回憶一下上一節中提到的Sequential類與Block類的繼承關系。對于Sequential實例中含模型參數的層，我們可以通過Block類的params屬性來訪問該層包含的所有…

QZEZ第一屆“飯吉圓”杯程序設計競賽

終于到了飯吉圓杯的開賽，這是EZ我參與的歷史上第一場ACM賽制的題目然而沒有罰時不過題目很好，舉辦地也很成功，為法老點贊！！！ 這次和翰爺，吳駿達 dalao，陳樂揚dalao組的隊&#xff0…

談談數據分析 caoz_讓我們談談開放數據…

談談數據分析 caozAccording to the International Open Data Charter(1), it defines open data as those digital data that are made available with the technical and legal characteristics necessary so that they can be freely used, reused and redistributed by any…

數據創造價值_展示數據并創造價值

數據創造價值To create the maximum value, urgency, and leverage in a data partnership, you must present the data available for sale or partnership in a clear and comprehensive way. Partnerships are based upon the concept that you are offering value for valu…