BigQuery Tutorial

This Medium article is a detailed walkthrough of the steps I took to solve the challenge lab of the Insights from Data with BigQuery skill badge on Google Cloud Platform (Qwiklabs). I got access to this lab through the Google Cloud Ready Facilitator Program. Thanks to Google!

So far, I have completed over 100 labs and 23 quests on Qwiklabs. My profile is linked below for reference.

This lab is recommended only for students who have completed the labs in the Insights from Data with BigQuery quest. You also need a working knowledge of SQL and BigQuery to solve it. Are you up for the challenge? Let’s go!

Dataset Used

The dataset used in this challenge lab is bigquery-public-data.covid19_open_data.covid19_open_data. It contains COVID-19 data for countries around the world, with some rows broken down further by region (for example, US states in the subregion1_name column). All of the queries in this tutorial run against it.

A BigQuery tutorial can be found in the reference below:

Challenge Scenario

The challenge lab consists of 10 small tasks, and all of them must be completed to score 100/100. Nine tasks require a SQL query and one requires a Data Studio report. This tutorial lists the steps I took to solve each of the ten tasks, which are as follows:

1. Building a SQL query that outputs the total number of confirmed cases.
2. Building a SQL query that outputs the worst affected areas.
3. Building a SQL query that identifies the hotspots in the USA.
4. Building a SQL query that outputs the case-fatality ratio.
5. Building a SQL query that identifies a specific day according to the given constraints.
6. Building a SQL query that outputs the number of days with zero net new cases.
7. Building a SQL query that outputs the doubling rate.
8. Building a SQL query that outputs the recovery rate.
9. Building a SQL query that outputs the CDGR (Cumulative Daily Growth Rate).
10. Creating a Data Studio report.

Important Note

Before starting this lab, make sure you do only what the lab requires. Allocating extra resources or doing anything the lab does not ask for can get your account blocked by the Qwiklabs admins. Don’t worry if that happens. I ran into this problem myself, and a blocked account can easily be unblocked by contacting Qwiklabs support.

Loading the Dataset

In the Cloud Console, once you are fully logged in, go to Menu > BigQuery.
Click + Add Data, then click Explore Public Datasets in the left pane.
Search for covid19_open_data and select “Covid-19 Open Data”. Click View Dataset to explore it.
Use the filter to locate the table covid19_open_data under the covid19_open_data dataset.

(Image by Wynn Pointaux on Pixabay)

Detailed Tutorial of Task 1

Task 1 requires a query that outputs the total number of confirmed cases on Apr 15, 2020. The output should be a single row containing the sum of confirmed cases across all countries in the dataset, in a column named total_cases_worldwide.

Copy the query below into the query editor and click RUN.

SELECT SUM(cumulative_confirmed) AS total_cases_worldwide
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE date = "2020-04-15"

Detailed Tutorial of Task 2

Task 2 requires a query that answers: “How many states in the US had more than 100 deaths on Apr 10, 2020?” The output column should be named count_of_states.

Hint (important): exclude NULL values, i.e. rows where subregion1_name (the state) is NULL.

Copy the query below into the query editor and click RUN.

SELECT COUNT(*) AS count_of_states
FROM (
  SELECT
    subregion1_name AS state,
    SUM(cumulative_deceased) AS death_count
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "United States of America"
    AND date = '2020-04-10'
    AND subregion1_name IS NOT NULL
  GROUP BY subregion1_name
)
WHERE death_count > 100
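The Task 2 hint about NULL values is easy to check locally. Below is a minimal sketch that runs the same query shape against a tiny in-memory SQLite table (via Python's standard sqlite3 module) instead of BigQuery. The table name covid and all rows are made up. Treating the NULL-state row as a country-level total is an assumption for illustration.

```python
import sqlite3

# Hypothetical toy rows: (country_name, subregion1_name, date, cumulative_deceased).
# The row with a NULL state stands in for a country-level total (an assumption).
ROWS = [
    ("United States of America", None, "2020-04-10", 500),
    ("United States of America", "State A", "2020-04-10", 300),
    ("United States of America", "State B", "2020-04-10", 150),
    ("United States of America", "State C", "2020-04-10", 40),
    ("United States of America", "State A", "2020-04-09", 250),
]

# Same shape as the Task 2 query; {null_filter} toggles the IS NOT NULL condition.
QUERY = """
SELECT COUNT(*) AS count_of_states
FROM (
  SELECT subregion1_name AS state, SUM(cumulative_deceased) AS death_count
  FROM covid
  WHERE country_name = 'United States of America'
    AND date = '2020-04-10'
    {null_filter}
  GROUP BY subregion1_name
)
WHERE death_count > 100
"""


def count_states(exclude_null: bool) -> int:
    """Run the query on a fresh in-memory database and return count_of_states."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE covid (country_name TEXT, subregion1_name TEXT, "
        "date TEXT, cumulative_deceased INTEGER)"
    )
    conn.executemany("INSERT INTO covid VALUES (?, ?, ?, ?)", ROWS)
    null_filter = "AND subregion1_name IS NOT NULL" if exclude_null else ""
    (count,) = conn.execute(QUERY.format(null_filter=null_filter)).fetchone()
    conn.close()
    return count


print(count_states(exclude_null=True))   # 2 (State A and State B)
print(count_states(exclude_null=False))  # 3 (the NULL group is counted as a "state")
```

Without the filter, GROUP BY collects all NULL-state rows into one extra group, and that group is counted as if it were a state.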

Detailed Tutorial of Task 3

Task 3 requires a query that answers: “List all the states in the United States of America that had more than 1000 confirmed cases on Apr 10, 2020.” The output should have two columns, state and total_confirmed_cases, holding the state name and its confirmed cases, sorted by confirmed cases in descending order.

Copy the query below into the query editor and click RUN.

SELECT
  subregion1_name AS state,
  SUM(cumulative_confirmed) AS total_confirmed_cases
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = "United States of America"
  AND date = "2020-04-10"
  AND subregion1_name IS NOT NULL
GROUP BY subregion1_name
HAVING total_confirmed_cases > 1000
ORDER BY total_confirmed_cases DESC
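The Task 3 query runs its steps in a fixed order: GROUP BY, then HAVING, then ORDER BY. The same pipeline can be written in plain Python, which makes that order explicit. This is an illustrative sketch only. The function name states_over_threshold and the sample rows are hypothetical, and rows with no state name are skipped in line with the Task 2 hint.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def states_over_threshold(
    rows: Iterable[Tuple[Optional[str], int]], min_cases: int = 1000
) -> List[Tuple[str, int]]:
    """rows holds (subregion1_name, cumulative_confirmed) pairs for a single date."""
    totals: Dict[str, int] = defaultdict(int)
    for state, confirmed in rows:
        if state is None:  # skip rows with no state name (NULL subregion1_name)
            continue
        totals[state] += confirmed  # SUM(cumulative_confirmed) grouped by state
    # HAVING total_confirmed_cases > min_cases
    kept = [(state, total) for state, total in totals.items() if total > min_cases]
    # ORDER BY total_confirmed_cases DESC
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical rows for one date; the None row stands in for a country-level total.
rows = [(None, 9000), ("State A", 1500), ("State B", 4000), ("State C", 800), ("State A", 700)]
print(states_over_threshold(rows))  # [('State B', 4000), ('State A', 2200)]
```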

Detailed Tutorial of Task 4

Task 4 requires a query that answers the following question: “What was the case-fatality ratio in Italy for the month of April 2020?”

Case-fatality ratio is defined as (total deaths / total confirmed cases) * 100. The output should have three columns named total_confirmed_cases, total_deaths and case_fatality_ratio.

Copy the query below into the query editor and click RUN.

SELECT
  SUM(cumulative_confirmed) AS total_confirmed_cases,
  SUM(cumulative_deceased) AS total_deaths,
  (SUM(cumulative_deceased) / SUM(cumulative_confirmed)) * 100 AS case_fatality_ratio
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = "Italy"
  AND date BETWEEN "2020-04-01" AND "2020-04-30"
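The case-fatality ratio formula can be checked by hand with a small Python helper. The function name case_fatality_ratio and the sample figures are hypothetical. Only the formula, (total deaths / total confirmed cases) * 100, comes from the task.

```python
def case_fatality_ratio(total_deaths: float, total_confirmed: float) -> float:
    """(total deaths / total confirmed cases) * 100, as defined for Task 4."""
    if total_confirmed == 0:
        raise ValueError("total_confirmed must be non-zero")
    return total_deaths / total_confirmed * 100


# Made-up figures: 250 deaths out of 2,000 confirmed cases.
print(case_fatality_ratio(250, 2000))  # 12.5
```

In BigQuery, the / operator returns FLOAT64 even for INT64 inputs, so the SQL version does not truncate the ratio. SAFE_DIVIDE(x, y) can be used instead if you prefer a NULL result over an error when the denominator is 0.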

Detailed Tutorial of Task 5

Task 5 requires a query that answers the following question: “On what day did the total number of deaths in Italy cross 10000?”

The query should output the date in a column named “date”, in the format “yyyy-mm-dd”.

Copy the query below into the query editor and click RUN.

SELECT date
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = 'Italy'
  AND cumulative_deceased > 10000
ORDER BY date
LIMIT 1
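The Task 5 query keeps only the days above the threshold and returns the earliest of them. Below is a minimal Python equivalent, using a hypothetical helper first_date_above and made-up counts. date.isoformat() conveniently produces the required yyyy-mm-dd format.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def first_date_above(rows: Iterable[Tuple[date, int]], threshold: int) -> Optional[date]:
    """Earliest date whose cumulative count is strictly above threshold, mirroring
    WHERE cumulative_deceased > threshold ORDER BY date LIMIT 1."""
    matching = [day for day, total in rows if total > threshold]
    return min(matching) if matching else None


# Hypothetical cumulative death counts, deliberately listed out of order.
deaths = [(date(2020, 3, 29), 11_000), (date(2020, 3, 27), 9_000), (date(2020, 3, 28), 10_500)]
print(first_date_above(deaths, 10_000).isoformat())  # 2020-03-28
```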

Detailed Tutorial of Task 6

Task 6 provides a query that must be updated to output the number of days in India between 21 Feb 2020 and 15 March 2020 on which there was no increase in the number of confirmed cases.

Copy the query below into the query editor and click RUN.

WITH india_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "India"
    AND date BETWEEN '2020-02-21' AND '2020-03-15'
  GROUP BY date
  ORDER BY date ASC
),
india_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases
  FROM india_cases_by_date
)
SELECT COUNT(date)
FROM india_previous_day_comparison
WHERE net_new_cases = 0
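LAG(cases) OVER (ORDER BY date) pairs each day with the previous day's value. Below is a rough Python analogue of the Task 6 logic, using hypothetical helper names and made-up cumulative counts. As with LAG, which returns NULL for the first row, the first day has no previous value.

```python
from typing import List, Optional


def net_new_cases(cases: List[int]) -> List[Optional[int]]:
    """Day-over-day difference, like cases - LAG(cases) OVER (ORDER BY date).
    The first day has no previous day, so its value is None (NULL in SQL)."""
    return [None] + [today - yesterday for yesterday, today in zip(cases, cases[1:])]


def count_zero_increase_days(cases: List[int]) -> int:
    """Count days with net_new_cases == 0; None never equals 0, as NULL = 0 is not true."""
    return sum(1 for diff in net_new_cases(cases) if diff == 0)


# Hypothetical cumulative confirmed cases, already sorted by date.
daily_cumulative = [10, 10, 12, 12, 12, 15, 20, 20]
print(net_new_cases(daily_cumulative))             # [None, 0, 2, 0, 0, 3, 5, 0]
print(count_zero_increase_days(daily_cumulative))  # 4
```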

Detailed Tutorial of Task 7

Task 7 uses the Task 6 query as a template. Build a query that finds the dates between March 22, 2020 and April 20, 2020 on which confirmed cases in the US increased by more than 10% compared to the previous day.

The output should have four columns named Date, Confirmed_Cases_On_Day, Confirmed_Cases_Previous_Day and Percentage_Increase_In_Cases.

Copy the query below into the query editor and click RUN.

WITH us_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "United States of America"
    AND date BETWEEN '2020-03-22' AND '2020-04-20'
  GROUP BY date
  ORDER BY date ASC
),
us_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases,
    (cases - LAG(cases) OVER (ORDER BY date)) * 100 / LAG(cases) OVER (ORDER BY date) AS percentage_increase
  FROM us_cases_by_date
)
SELECT
  Date,
  cases AS Confirmed_Cases_On_Day,
  previous_day AS Confirmed_Cases_Previous_Day,
  percentage_increase AS Percentage_Increase_In_Cases
FROM us_previous_day_comparison
WHERE percentage_increase > 10
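The percentage increase in Task 7 is (cases - previous_day) * 100 / previous_day. The sketch below applies that expression day by day in Python. The helper days_above_growth and the sample series are hypothetical. The sketch assumes the series is already sorted by date and contains no zero counts.

```python
from typing import List, Tuple


def days_above_growth(
    series: List[Tuple[str, int]], min_pct: float = 10.0
) -> List[Tuple[str, int, int, float]]:
    """Return (date, cases, previous_day, pct_increase) for each day whose increase
    over the previous day is above min_pct percent. The first day has no previous
    day, so it is never returned (its LAG value would be NULL in SQL)."""
    result = []
    for (_, previous_day), (day, cases) in zip(series, series[1:]):
        # Assumes previous_day is non-zero; BigQuery's / raises an error on division by 0.
        pct_increase = (cases - previous_day) * 100 / previous_day
        if pct_increase > min_pct:
            result.append((day, cases, previous_day, pct_increase))
    return result


# Hypothetical cumulative counts, already sorted by date.
series = [("2020-03-22", 1000), ("2020-03-23", 1200), ("2020-03-24", 1260), ("2020-03-25", 1400)]
for row in days_above_growth(series):
    print(row)  # rows for 2020-03-23 (20.0%) and 2020-03-25 (about 11.1%)
```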

Detailed Tutorial of Task 8

Task 8 requires a query that lists the recovery rates of countries on May 10, 2020. Include only countries with more than 50K confirmed cases, sort by recovery rate in descending order, and limit the output to 10 rows. To score full marks, the output columns should be named country, recovered_cases, confirmed_cases and recovery_rate.

Copy the query below into the query editor and click RUN.

WITH cases_by_country AS (
  SELECT
    country_name AS country,
    SUM(cumulative_confirmed) AS cases,
    SUM(cumulative_recovered) AS recovered_cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE date = "2020-05-10"
  GROUP BY country_name
),
recovered_rate AS (
  SELECT
    country,
    cases,
    recovered_cases,
    (recovered_cases * 100) / cases AS recovery_rate
  FROM cases_by_country
)
SELECT
  country,
  cases AS confirmed_cases,
  recovered_cases,
  recovery_rate
FROM recovered_rate
WHERE cases > 50000
ORDER BY recovery_rate DESC
LIMIT 10
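The Task 8 query boils down to four steps: compute the rate, filter, sort and limit. A Python sketch of the same steps follows. The helper top_recovery_rates and the three countries are hypothetical.

```python
from typing import Dict, List, Tuple


def top_recovery_rates(
    by_country: Dict[str, Tuple[int, int]], min_cases: int = 50_000, limit: int = 10
) -> List[Tuple[str, int, int, float]]:
    """by_country maps country -> (confirmed_cases, recovered_cases).
    Returns (country, confirmed_cases, recovered_cases, recovery_rate) rows for
    countries with more than min_cases, highest recovery rate first, at most `limit` rows."""
    rows = [
        (country, confirmed, recovered, recovered * 100 / confirmed)  # recovery_rate
        for country, (confirmed, recovered) in by_country.items()
        if confirmed > min_cases  # WHERE cases > 50000
    ]
    rows.sort(key=lambda row: row[3], reverse=True)  # ORDER BY recovery_rate DESC
    return rows[:limit]  # LIMIT 10


# Hypothetical totals for three made-up countries.
data = {"Country A": (80_000, 60_000), "Country B": (120_000, 30_000), "Country C": (40_000, 39_000)}
print(top_recovery_rates(data))
# [('Country A', 80000, 60000, 75.0), ('Country B', 120000, 30000, 25.0)]
```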

Detailed Tutorial of Task 9

Task 9 requires a query that outputs the CDGR (Cumulative Daily Growth Rate) in the correct format. The CDGR is calculated as:

CDGR = (last_day_cases / first_day_cases)^(1 / days_diff) - 1

where last_day_cases, first_day_cases and days_diff are defined as:

last_day_cases corresponds to the number of confirmed cases on May 10, 2020
first_day_cases corresponds to the number of confirmed cases on Feb 02, 2020
days_diff corresponds to the number of days between Feb 02, 2020 and May 10, 2020

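Copy the query below into the query editor and click RUN. Note that the query computes the CDGR for France and uses Jan 24, 2020 as the first day. If your task specifies a different country or start date (such as Feb 02, 2020), change the country_name and date values accordingly.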

WITH france_cases AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS total_cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "France"
    AND date IN ('2020-01-24', '2020-05-10')
  GROUP BY date
  ORDER BY date
),
summary AS (
  SELECT
    total_cases AS first_day_cases,
    LEAD(total_cases) OVER (ORDER BY date) AS last_day_cases,
    DATE_DIFF(LEAD(date) OVER (ORDER BY date), date, DAY) AS days_diff
  FROM france_cases
  -- Keep the earlier date's row; without ORDER BY, LIMIT 1 could pick either row.
  ORDER BY date
  LIMIT 1
)
SELECT
  first_day_cases,
  last_day_cases,
  days_diff,
  POWER(last_day_cases / first_day_cases, 1 / days_diff) - 1 AS cdgr
FROM summary
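To sanity-check the CDGR arithmetic outside BigQuery, here is a small Python version of the formula. The helper cdgr and the sample numbers are hypothetical. Subtracting two dates and reading .days plays the role of DATE_DIFF(..., DAY).

```python
from datetime import date


def cdgr(first_day_cases: int, last_day_cases: int, first_day: date, last_day: date) -> float:
    """Cumulative Daily Growth Rate: (last / first) ** (1 / days_diff) - 1,
    the same expression as POWER(last_day_cases / first_day_cases, 1 / days_diff) - 1."""
    days_diff = (last_day - first_day).days  # like DATE_DIFF(last_day, first_day, DAY)
    if first_day_cases <= 0 or days_diff <= 0:
        raise ValueError("need positive first_day_cases and a later last_day")
    return (last_day_cases / first_day_cases) ** (1 / days_diff) - 1


# Hypothetical counts: cases grow from 100 to 400 over 2 days, i.e. doubling daily.
print(cdgr(100, 400, date(2020, 5, 8), date(2020, 5, 10)))  # 1.0 (100% growth per day)
```

For the date range used in the query above, (date(2020, 5, 10) - date(2020, 1, 24)).days is 107 (2020 is a leap year). For the Feb 02, 2020 start given in the task text, it is 98.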

Detailed Tutorial of Task 10

Creating the Data Studio report takes the following steps.

1. First, copy the query below into the query editor and click RUN.

SELECT
  date,
  SUM(cumulative_confirmed) AS country_cases,
  SUM(cumulative_deceased) AS country_deaths
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE date BETWEEN '2020-03-15' AND '2020-04-30'
  AND country_name = 'United States of America'
GROUP BY date

2. Click on EXPLORE DATA > Explore with Data Studio.

3. Grant Data Studio access and authorize it to access BigQuery.

If report creation fails the first time you log in to Data Studio, click the + Blank Report option and accept the Terms of Service. Then go back to the BigQuery page and click Explore with Data Studio again.

4. Create a new Time series chart in the new Data Studio report by selecting Add a chart > Time series Chart.

5. Add country_cases and country_deaths to the Metric field.

6. Click Save to commit the change.

Congratulations!!

This is the skill badge I got after completing this challenge lab :P

That brings us to the end of this challenge lab. Thanks for reading and following along; I hope you enjoyed it!

My Portfolio and LinkedIn :)

Translated from: https://medium.com/swlh/insights-from-data-with-bigquery-challenge-lab-tutorial-f868992ef9dc
