python 仪表盘_如何使用Python刮除仪表板
python 儀表盤
Dashboard scraping is a useful skill to have when the only way to interact with the data you need is through a dashboard. We’re going to learn how to scrape data from a dashboard using the Selenium and Beautiful Soup packages in Python. The Selenium package allows you to write Python code to automate web browser interaction, and the Beautiful Soup package allows you to easily pull data from the HTML code that produces the webpage you want to scrape.
當與所需數據進行交互的唯一方法是通過儀表板時,儀表板抓取是一項有用的技能。 我們將學習如何使用Python中的Selenium和Beautiful Soup軟件包從儀表板上抓取數據。 Selenium程序包允許您編寫Python代碼來自動執行Web瀏覽器交互,而Beautiful Soup程序包則使您可以輕松地從生成您要抓取的網頁HTML代碼中提取數據。
Our goal is to scrape the Fort Bend County Community Impact Dashboard that visualizes the COVID-19 situation in Fort Bend County in Texas. We will extract the history of total tests performed and the daily case counts reported so that we can estimate the percent of positive cases in Fort Bend County.
我們的目標是刮擦本德堡縣社區影響儀表板 ,以可視化方式顯示德克薩斯州本德堡縣的COVID-19情況。 我們將提取進行的總檢測的歷史記錄和每日報告的病例計數,以便我們可以估算本德堡縣陽性病例的百分比。
Note that all of the code in this tutorial is written in Python version 3.6.2.
請注意,本教程中的所有代碼都是使用Python 3.6.2版編寫的。
步驟1:導入Python軟件包,模塊和方法 (Step 1: Import Python Packages, Modules, and Methods)
The first step is to import the Python packages, modules, and methods needed for dashboard scraping. The versions of the packages used in this tutorial are listed below.
第一步是導入儀表板抓取所需的Python包,模塊和方法。 下面列出了本教程中使用的軟件包的版本。
步驟2:抓取HTML源代碼 (Step 2: Scrape HTML Source Code)
The next step is to write Python code to automate our interaction with the dashboard. Before writing any code, we must look at the dashboard and inspect its source code to identify the HTML elements that contain the data we need. The dashboard source code refers to the HTML code that tells your browser how to render the dashboard web page. To view the dashboard source code, navigate to the dashboard and use the keyboard shortcut Ctrl+Shift+I. An interactive panel containing the dashboard source code will appear.
下一步是編寫Python代碼來自動化我們與儀表板的交互。 在編寫任何代碼之前,我們必須查看儀表板并檢查其源代碼以識別包含我們所需數據HTML元素。 儀表板源代碼是指HTML代碼,它告訴您的瀏覽器如何呈現儀表板網頁。 要查看儀表板源代碼,請導航至儀表板并使用鍵盤快捷鍵Ctrl+Shift+I 將出現一個包含儀表板源代碼的交互式面板。
Notice that the history of total tests performed and the daily case counts reported are only visible after clicking the “History” tab in the “Total Numbers of Tests Performed at County Sites” panel and the “Daily Case Count” tab in the “Confirmed Cases” panel, respectively. This means that we need to write Python code that automatically clicks on the “History” and “Daily Case Count” tabs so that the history of total tests performed and the daily case counts reported will be visible to Beautiful Soup.
請注意,僅在單擊“縣站點執行的測試總數”面板中的“歷史記錄”選項卡和“已確認案例”中的“每日案例計數”選項卡之后,才可以執行總測試的歷史記錄和報告的每日案例計數”面板。 這意味著我們需要編寫Python代碼,該代碼自動單擊“歷史記錄”和“每日案例計數”選項卡,以便Beautiful Soup可以看到執行的總測試的歷史記錄和每日報告的案例計數。
Fort Bend County Community Impact Dashboard on July 10th, 2020本德堡縣社區影響儀表板To find the HTML element that contains the “History” tab, use the shortcut Ctrl+Shift+C and then click on the "History" tab. You will see in the source code panel that the "History" tab is in a div element with ID "ember208".
要查找包含“歷史記錄”選項卡HTML元素,請使用快捷鍵Ctrl+Shift+C ,然后單擊“歷史記錄”選項卡。 您將在源代碼面板中看到“歷史記錄”選項卡位于ID為“ ember208”的div元素中。
History Tab Source Code歷史記錄選項卡源代碼Following the same steps for the “Daily Case Count” tab, you will see that the “Daily Case Count” tab is in a div element with ID “ember238”.
按照“每日案件計數”標簽的相同步驟,您將看到“每日案件計數”標簽位于ID為“ ember238”的div元素中。
Source Code of Daily Case Count Tab每日病例計數選項卡的源代碼Now that we have identified the elements we need, we can write code that:
現在我們已經確定了所需的元素,我們可以編寫代碼:
步驟3:從HTML解析數據 (Step 3: Parse Data from HTML)
Now, we need to parse the HTML source code to extract the history of total tests performed and the daily case counts reported. We will begin by looking at the dashboard source code to identify the HTML elements that contain the data.
現在,我們需要解析HTML源代碼,以提取執行的總測試的歷史記錄和每日報告的病例數。 我們將從查看儀表板源代碼開始,以識別包含數據HTML元素。
To find the div element that contains the history of total tests performed, use the Ctrl+Shift+C shortcut and then click in the general area of the "Testing Sites" plot. You will see in the source code that the entire plot is in the div element with ID "ember96".
要查找包含已執行的全部測試的歷史記錄的div元素,請使用Ctrl+Shift+C快捷鍵,然后單擊“測試站點”圖的常規區域。 您會在源代碼中看到整個圖位于ID為“ ember96”的div元素中。
Source Code of Testing Sites Plot測試站點圖的源代碼If you hover over a specific data point, a label containing the date and number of tests performed will appear. Use the Ctrl+Shift+C shortcut and click on a specific data point. You will see that the label text is stored as the aria-label attribute of a g element.
如果將鼠標懸停在特定數據點上,則會顯示一個標簽,其中包含執行的測試的日期和數量。 使用Ctrl+Shift+C快捷鍵,然后單擊特定的數據點。 您將看到標簽文本存儲為g元素的aria-label屬性。
Source Code of Testing Sites Data Labels測試站點數據標簽的源代碼Following the same steps for the daily case counts reported, you will see that the plot of daily case counts is in the div element with ID “ember143”.
按照報告的每日案件計數的相同步驟,您將看到每日案件計數的圖位于ID為“ ember143”的div元素中。
Source Code of Daily Cases based on Report Date Plot基于報告日期圖的日常案例源代碼If you hover over a specific data point, a label containing the date and the number of positive cases reported will appear. Using the Ctrl+Shift+C shortcut, you will notice that the data are also stored in the aria-label attribute of g elements.
如果將鼠標懸停在特定數據點上,將顯示一個標簽,其中包含日期和報告的陽性病例數。 使用Ctrl+Shift+C快捷鍵,您會注意到數據也存儲在g元素的aria-label屬性中。
Source Code of Daily Cases based on Report Date Data Labels基于報告日期數據標簽的日常案例的源代碼Once we have the elements that contain the data, we can write code that:
一旦有了包含數據的元素,就可以編寫代碼:
步驟4:計算正率 (Step 4: Calculate Positivity Rate)
Now, we can finally estimate the COVID-19 positivity rate in Fort Bend County. We will divide the cases reported by the tests performed and calculate the 7-day moving averages. It is unclear from the dashboard whether the reported positive cases include cases that were determined through tests not conducted by the county (e.g. tests conducted at a hospital or clinic). It is also unclear when the tests for the positive cases were conducted since the dashboard only displays the reported case date. That is why the positivity rates derived from these data are only considered a rough estimate for the true positivity rate.
現在,我們終于可以估算出本德堡縣的COVID-19陽性率。 我們將通過執行的測試報告的案例相除,并計算7天移動平均值。 從儀表板尚不清楚,報告的陽性病例是否包括那些不是由縣進行的檢測(例如,在醫院或診所進行的檢測)確定的病例。 還不清楚何時進行陽性病例的測試,因為儀表板僅顯示報告的病例日期。 這就是為什么僅將這些數據得出的陽性率視為真實陽性率的粗略估計。
翻譯自: https://towardsdatascience.com/how-to-scrape-a-dashboard-with-python-8b088f6cecf3
python 儀表盤
總結
以上是生活随笔為你收集整理的python 仪表盘_如何使用Python刮除仪表板的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 梦到好多虫子是什么意思
- 下一篇: python内置函数多少个_每个数据科学