How to Scrape a Dashboard with Python

Dashboard scraping is a useful skill to have when the only way to interact with the data you need is through a dashboard. We’re going to learn how to scrape data from a dashboard using the Selenium and Beautiful Soup packages in Python. The Selenium package allows you to write Python code to automate web browser interaction, and the Beautiful Soup package allows you to easily pull data from the HTML code that produces the webpage you want to scrape.

Our goal is to scrape the Fort Bend County Community Impact Dashboard, which visualizes the COVID-19 situation in Fort Bend County, Texas. We will extract the history of total tests performed and the daily case counts reported so that we can estimate the COVID-19 positivity rate in Fort Bend County.

Note that all of the code in this tutorial is written in Python version 3.6.2.

Step 1: Import Python Packages, Modules, and Methods

The first step is to import the Python packages, modules, and methods needed for dashboard scraping. The versions of the packages used in this tutorial are listed below.
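
A quick way to compare your environment with the one used here is to print the installed version of each package. The sketch below is a hypothetical helper, not part of the original tutorial. It assumes the tutorial relies on Selenium, Beautiful Soup (import name `bs4`, installed from PyPI as `beautifulsoup4`) and pandas, and it uses only the standard library so it runs even when those packages are missing.

```python
import importlib

# Import names of the packages this tutorial is assumed to rely on. Beautiful
# Soup is imported as "bs4" but installed from PyPI as "beautifulsoup4".
PACKAGES = ("selenium", "bs4", "pandas")


def installed_versions(names=PACKAGES):
    """Return {import_name: version}, using None for packages that fail to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most packages expose __version__; fall back to "unknown" if not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("{}: {}".format(name, version or "NOT INSTALLED"))
```

Run as a script, it prints one line per package, with `NOT INSTALLED` for any that are missing.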

Step 2: Scrape HTML Source Code

The next step is to write Python code to automate our interaction with the dashboard. Before writing any code, we must look at the dashboard and inspect its source code to identify the HTML elements that contain the data we need. The dashboard source code is the HTML that tells your browser how to render the dashboard web page. To view it in Chrome, navigate to the dashboard and press Ctrl+Shift+I (Cmd+Option+I on macOS). The developer tools panel will open, showing the dashboard's HTML.

Notice that the history of total tests performed and the daily case counts reported are only visible after clicking the “History” tab in the “Total Numbers of Tests Performed at County Sites” panel and the “Daily Case Count” tab in the “Confirmed Cases” panel, respectively. This means that we need to write Python code that automatically clicks on the “History” and “Daily Case Count” tabs so that the history of total tests performed and the daily case counts reported will be visible to Beautiful Soup.

Figure: Fort Bend County Community Impact Dashboard on July 10th, 2020

To find the HTML element that contains the “History” tab, use the shortcut Ctrl+Shift+C and then click on the "History" tab. You will see in the source code panel that the "History" tab is in a div element with ID "ember208".

Figure: History Tab Source Code

Following the same steps for the “Daily Case Count” tab, you will see that the “Daily Case Count” tab is in a div element with ID “ember238”.

Figure: Source Code of Daily Case Count Tab

Now that we have identified the elements we need, we can write code that:

  1. Launches the dashboard in Chrome
  2. Clicks on the “History” tab once the “History” tab finishes loading
  3. Clicks on the “Daily Case Count” tab once the “Daily Case Count” tab finishes loading
  4. Extracts the dashboard HTML source code
  5. Exits Chrome
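
The five steps above could be sketched with Selenium's explicit waits as follows. This is an illustrative reconstruction, not the tutorial's original code: the function name and the `timeout` and `render_delay` parameters are hypothetical, and you must supply the dashboard URL yourself. The default tab IDs are the ones identified above (`ember208` for “History”, `ember238` for “Daily Case Count”). IDs of the form `ember###` are generated by the Ember.js framework and may change between visits, so re-check them in the developer tools before running. The sketch also assumes Chrome and a compatible ChromeDriver are installed.

```python
import time


def scrape_dashboard_html(url, tab_ids=("ember208", "ember238"), timeout=30, render_delay=2.0):
    """Open the dashboard in Chrome, click each tab in order, and return the rendered HTML.

    tab_ids are the div IDs reported in this tutorial (hypothetical defaults;
    Ember-generated IDs may differ on your visit).
    """
    # Imported inside the function so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # 1. launch Chrome
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        for tab_id in tab_ids:
            # 2-3. wait until the tab has loaded and is clickable, then click it
            tab = wait.until(EC.element_to_be_clickable((By.ID, tab_id)))
            tab.click()
            time.sleep(render_delay)  # crude pause so the chart can redraw
        return driver.page_source  # 4. rendered HTML, including the clicked tabs
    finally:
        driver.quit()  # 5. always exit Chrome, even if a step above fails
```

Calling `scrape_dashboard_html(url)` returns the page source as a string, ready for parsing in Step 3. The `try`/`finally` block makes sure Chrome exits even if a tab never becomes clickable and `WebDriverWait` raises a `TimeoutException`.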

Step 3: Parse Data from HTML

Now, we need to parse the HTML source code to extract the history of total tests performed and the daily case counts reported. We will begin by looking at the dashboard source code to identify the HTML elements that contain the data.

To find the div element that contains the history of total tests performed, use the Ctrl+Shift+C shortcut and then click in the general area of the "Testing Sites" plot. You will see in the source code that the entire plot is in the div element with ID "ember96".

Figure: Source Code of Testing Sites Plot

If you hover over a specific data point, a label containing the date and the number of tests performed will appear. Use the Ctrl+Shift+C shortcut and click on a specific data point. You will see that the label text is stored in the aria-label attribute of an SVG g (group) element.

Figure: Source Code of Testing Sites Data Labels

Following the same steps for the daily case counts reported, you will see that the plot of daily case counts is in the div element with ID “ember143”.

Figure: Source Code of Daily Cases based on Report Date Plot

If you hover over a specific data point, a label containing the date and the number of positive cases reported will appear. Using the Ctrl+Shift+C shortcut, you will notice that the data are also stored in the aria-label attribute of g elements.

Figure: Source Code of Daily Cases based on Report Date Data Labels
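
Both plots follow the same pattern: a container div (`ember96` for tests, `ember143` for cases) whose SVG `g` elements carry the data in `aria-label`. The tutorial uses Beautiful Soup for this step, where the lookup would be along the lines of `soup.find("div", id="ember96").find_all("g", attrs={"aria-label": True})`. So that the sketch below runs without third-party packages, it swaps in the standard library's `html.parser` instead. The class and function names are hypothetical, and the sketch assumes reasonably well-formed markup (an unclosed non-void tag would throw off the nesting count).

```python
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class AriaLabelCollector(HTMLParser):
    """Collect aria-label values of <g> elements nested inside one container element."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # > 0 while the parser is inside the container element
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
        elif attrs.get("id") == self.container_id:
            self.depth = 1  # entering the container, e.g. <div id="ember96">
        if self.depth and tag == "g" and attrs.get("aria-label"):
            self.labels.append(attrs["aria-label"])

    def handle_endtag(self, tag):
        # Self-closing tags such as <path/> trigger start and end, so they net to zero.
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1  # reaches 0 when the container itself closes


def extract_aria_labels(html, container_id):
    """Return the aria-label text of every <g> inside the element with id=container_id."""
    parser = AriaLabelCollector(container_id)
    parser.feed(html)
    parser.close()
    return parser.labels
```

For example, `extract_aria_labels(html, "ember96")` would return the tests-plot labels from the HTML string captured in Step 2, and `"ember143"` would return the daily case labels.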

Once we have the elements that contain the data, we can write code that:

  1. Finds the div element that contains the plot of the total tests performed and pulls the total tests performed data
  2. Finds the div element that contains the plot of the daily case counts and pulls the daily case count data
  3. Combines the data in a pandas dataframe and exports it to a CSV file
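
The exact label text is not spelled out here, so the sketch below assumes each label ends with the count as its last space-separated token (for example, a hypothetical `"Jul 9, 2020 1,234"`) and that both plots format dates identically; adjust `split_label` after inspecting a real label. It also uses the standard library's `csv` module instead of pandas so it runs without extra installs. With pandas, the equivalent would be building one DataFrame per plot and calling `pd.merge(tests_df, cases_df, on="date", how="outer").to_csv(path, index=False)`. All function names are hypothetical.

```python
import csv


def split_label(label):
    """Split a label into (date_text, count).

    ASSUMPTION: the count is the last space-separated token, e.g. "Jul 9, 2020 1,234".
    Raises ValueError if that token is not an integer.
    """
    date_text, _, value = label.strip().rpartition(" ")
    return date_text.rstrip(" ,:"), int(value.replace(",", ""))


def combine_rows(test_labels, case_labels):
    """Join tests and cases on the date text; return [date, tests, cases] rows."""
    tests = dict(split_label(label) for label in test_labels)
    cases = dict(split_label(label) for label in case_labels)
    # Keep the tests-plot order, then append dates that appear only in the cases plot.
    dates = list(tests) + [d for d in cases if d not in tests]
    # A date missing from one plot gets an empty cell rather than a guessed value.
    return [[d, tests.get(d, ""), cases.get(d, "")] for d in dates]


def write_csv(path, rows):
    """Write the combined rows, with a header, to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "tests", "cases"])
        writer.writerows(rows)
```

Usage would look like `write_csv("fort_bend.csv", combine_rows(test_labels, case_labels))`, where the file name is illustrative.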

Step 4: Calculate Positivity Rate

Now, we can finally estimate the COVID-19 positivity rate in Fort Bend County. We will divide the cases reported by the tests performed and calculate the 7-day moving averages. It is unclear from the dashboard whether the reported positive cases include cases identified through tests not conducted by the county (e.g., tests conducted at a hospital or clinic). It is also unclear when the tests for the positive cases were conducted, since the dashboard only displays the date each case was reported. For these reasons, the positivity rates derived from these data should be considered only a rough estimate of the true positivity rate.
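
The text leaves open whether the 7-day averaging happens before or after dividing. The sketch below takes one reasonable reading: the ratio of 7-day case totals to 7-day test totals, which equals the ratio of the two 7-day moving averages and keeps days with very few tests from dominating. It assumes both inputs are daily counts aligned on the same dates; if the tests history turns out to be cumulative, take day-over-day differences first. The function name is hypothetical.

```python
def rolling_positivity(cases, tests, window=7):
    """Trailing-window positivity: sum of cases / sum of tests over the last `window` days.

    Both inputs are daily counts aligned on the same dates. Entries are None until a
    full window is available, or when the window contains zero tests.
    """
    if len(cases) != len(tests):
        raise ValueError("cases and tests must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    rates = []
    for end in range(1, len(cases) + 1):
        if end < window:
            rates.append(None)  # not enough history for a full window yet
            continue
        case_total = sum(cases[end - window:end])
        test_total = sum(tests[end - window:end])
        rates.append(case_total / test_total if test_total else None)
    return rates
```

Multiplying a rate by 100 gives a percentage. In pandas, essentially the same quantity is `df["cases"].rolling(7).sum() / df["tests"].rolling(7).sum()`.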

Original article: https://towardsdatascience.com/how-to-scrape-a-dashboard-with-python-8b088f6cecf3
