Snowflake Database
Table of Contents
Introduction
Creating a Snowflake Datasource
Querying Your Datasource
Analyzing Your Data and Adding Visualizations
Using Drilldowns on Your Visualizations
Using Search-Based Analytics to Query Your Data
Summary
Introduction
Snowflake is a purpose-built SQL cloud data platform that has grown at a nearly unprecedented rate since launching in 2014; Okta Inc.’s 2020 Businesses @ Work report found that Snowflake was the world’s fastest-growing app. That growth is easy to understand given Snowflake’s flexibility, strong security, automatic scaling of storage, and integration with a wide range of BI tools.
Among the BI tools that offer Snowflake integration, Knowi is the only one that is fully native to Snowflake, with support for nested objects and arrays. This lets users analyze their data while keeping the benefits of Snowflake’s scaling and security. If you’d like to learn more about using Knowi to analyze your Snowflake data, this tutorial is for you.
Creating a Snowflake Datasource
Once you’ve set up your free Knowi trial account and logged in, follow these steps:
1. Locate and click on “Data sources” on the panel on the left side of your screen.
2. Scroll down to “Data Warehouses” and click on Snowflake.
3. Right now, the default Schema Name is set to TPC-DS, a decision-support benchmark dataset modeled on a retail business. We’re going to switch to TPC-H, a simpler decision-support benchmark dataset with tables for customers, orders, line items, parts, and suppliers. To do this, change Schema Name from TPCDS_SF100TCL to TPCH_SF1 and click “Test Connection.”
4. In a few moments, Knowi should tell you that your connection was successful; click “Save” once it does.
Congratulations on setting up your first Snowflake datasource!
Querying Your Datasource
Now that you’ve created a datasource, you can run queries on your data by following these steps:
1. As soon as you saved your datasource, you should’ve received a “Datasource Added. Configure Queries.” alert at the top of your page. Click on the word Queries. (You can also just go back to the panel on the left side of your screen, go right below “Data Sources,” click on “Queries,” and select “New Query +” from the top right.)
2. Name your report in the “Report Name*” bar at the very top left of your screen. The query we’re using here is closely modeled on the default functional query provided with the TPC-H schema, so let’s name this one “Functional Query.”
3. This query reports the total quantity, extended price, discounted extended price, and discounted extended price plus tax (the total charge), along with the average quantity, extended price, and discount, and a count of line items. It covers line items shipped at least 90 days before December 1, 1998, and groups the results by return flag and line status. To enter this query, head over to “Snowflake Query” in your Query Builder and enter the following syntax:
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= dateadd(day, -90, to_date('1998-12-01'))
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
4. Before you run this query, we want to add one more column that concatenates return flag and line status. This will make it much easier to visualize our data. To do this post-processing with Cloud9QL (Knowi’s SQL-style language), enter the following syntax into “Cloud9QL Query”:
select concat(l_returnflag, " - ", l_linestatus) as Flag - Status, *
5. Head to the bottom of your screen and click the blue “Preview” button. This should return four rows and eleven columns worth of data. Once you’ve confirmed that it does, click the green “Save & Run Now” button on the bottom right corner of your screen.
When you ran your query, Knowi automatically stored the results as a virtual dataset in its elastic data warehouse. Knowi does this every time you successfully run a query.
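To see what the query and the concatenation step compute without connecting to Snowflake, here is a minimal Python sketch that runs a trimmed version of the same aggregation against an in-memory SQLite table. The sample rows are made up for illustration, the helper name run_summary is hypothetical, and SQLite's date('1998-12-01', '-90 days') stands in for Snowflake's dateadd call; the label step only mimics the effect of the Cloud9QL concat, not Knowi's actual implementation.

```python
import sqlite3

# Hypothetical sample rows standing in for TPC-H LINEITEM (values are made up).
# Columns: returnflag, linestatus, quantity, extendedprice, discount, tax, shipdate
ROWS = [
    ("A", "F", 10, 1000.0, 0.10, 0.05, "1994-03-01"),
    ("A", "F", 20, 2000.0, 0.00, 0.10, "1994-06-15"),
    ("N", "O", 5, 500.0, 0.20, 0.00, "1997-01-10"),
    ("N", "O", 7, 700.0, 0.00, 0.00, "1998-11-30"),  # after the cutoff, so excluded
]

# Trimmed version of the tutorial query, using SQLite date arithmetic
# in place of Snowflake's dateadd(day, -90, to_date('1998-12-01')).
QUERY = """
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    count(*) as count_order
from lineitem
where l_shipdate <= date('1998-12-01', '-90 days')
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus
"""


def run_summary(rows):
    """Load rows into an in-memory table, aggregate, and add a combined label."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table lineitem (l_returnflag text, l_linestatus text, "
        "l_quantity real, l_extendedprice real, l_discount real, "
        "l_tax real, l_shipdate text)"
    )
    conn.executemany("insert into lineitem values (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    # Mimic the post-processing step: prepend a "Flag - Status" label to each row.
    return [(f"{r[0]} - {r[1]}",) + r for r in result]


summary = run_summary(ROWS)
for row in summary:
    print(row)
```

Each output row starts with the combined label, followed by the return flag, line status, and the aggregates for that group. The row shipped on 1998-11-30 falls after the 90-day cutoff and does not contribute to the N - O totals.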
Analyzing Your Data and Adding Visualizations
As you saw in the data preview, there are 4 different combinations of return flag and line status: A-F, N-F, N-O, and R-F. Let’s say we want to visualize various metrics, such as the total order quantity, grouped by these flag-status combinations. Knowi allows us to visualize this in just a few steps:
1. Return to the top of the panel on the left side of your screen and click on “Dashboards.” Click the orange plus icon and name your dashboard “TPC-H Visualizations.”
2. Go just below “Dashboards” on your panel and click “Widgets.” Drag your “Functional Query” widget onto your new dashboard.
3. Right now, your widget is just a data grid containing the results from our query. We want to keep this data grid, but also add a more compelling visualization that conveys its message more quickly. Head over to the top right corner of your widget, click on the 3 dot icon and click on “Analyze.”
4. Drag the “Flag - Status” and “Sum_QTY” bars over from the left side of your screen into the “Fields/Metrics:” box. This shows us the total quantity for each combination. Now, head to the top of your screen and click on “Visualization.” Go to “Visualization Type” in the top left corner of your screen and change this from “Data Grid” to “Donut.”
5. This donut chart already conveys rather clearly that roughly one quarter of the total quantity falls into each of R-F and A-F, roughly one percent falls into N-F, and roughly half falls into N-O, but it may still help to add value labels and percentages to the chart. To do this, scroll down to “Options” and click on it, then check the boxes underneath “Display as Percent” and “Label-Value.”
6. Now, head back to the top right corner of your screen and click the “Clone” icon that looks like two small pieces of paper. When you do this, you’ll be asked to name your cloned widget; name it “Total Quantity by Flag — Status.”
7. Click “Clone” and then click “Add to Dashboard” to add your new widget to your dashboard.
There is a clear conclusion that comes from this visualization: roughly one quarter of the total quantity falls into each of R-F and A-F, roughly one half falls into N-O, and roughly one percent falls into N-F.
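The percentages on the donut chart are each group's share of the grand total. A minimal Python sketch of that arithmetic follows; the function name percent_shares is hypothetical, and the totals are illustrative placeholders chosen to echo the rough proportions described above, not actual query output.

```python
def percent_shares(totals):
    """Convert per-group totals into each group's percentage of the grand total."""
    grand_total = sum(totals.values())
    return {label: 100.0 * value / grand_total for label, value in totals.items()}


# Illustrative placeholder totals (NOT real TPC-H results).
example_totals = {"A - F": 25, "N - F": 1, "N - O": 49, "R - F": 25}

shares = percent_shares(example_totals)
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

Because every share is divided by the same grand total, the percentages always sum to 100, which is what makes a donut chart a natural fit for this comparison.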
Using Drilldowns on Your Visualizations
Drilldowns add an interactive component to your visualizations that allows you to dive into a filtered section of your data with just one click. Follow this process to add a drilldown:
1. Click on the 3 dot icon in the top right corner of the “Total Quantity by Flag — Status” widget that you just created, then scroll down and click on “Drilldowns.”
2. Set “Widget” as your drilldown type and set it to drill into “Functional Query” when “SUM_QTY” is clicked. Then add an optional drilldown filter that sets “SUM_QTY” equal to “SUM_QTY,” and click “SAVE” and then “Close.”
3. Test your drilldown by clicking on N-O, the Flag — Status combination with the largest share of the total quantity. This shows you all of the raw data for that combination. To return to your original visualization, go to the top right corner of your widget and click on the left arrow icon in the middle.
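Conceptually, a drilldown filter takes the value of the clicked chart segment and uses it to filter the rows of the target widget. The Python sketch below illustrates that idea only; drill_down is a hypothetical helper, the rows are made-up placeholders, and none of this reflects how Knowi implements drilldowns internally.

```python
def drill_down(rows, field, clicked_value):
    """Return only the rows whose `field` equals the value of the clicked segment."""
    return [row for row in rows if row[field] == clicked_value]


# Placeholder rows standing in for the "Functional Query" data grid (made-up values).
grid_rows = [
    {"Flag - Status": "A - F", "sum_qty": 26},
    {"Flag - Status": "N - F", "sum_qty": 1},
    {"Flag - Status": "N - O", "sum_qty": 49},
    {"Flag - Status": "R - F", "sum_qty": 24},
]

# Clicking the N - O segment passes its sum_qty value to the target widget.
detail = drill_down(grid_rows, "sum_qty", 49)
print(detail)
```

Filtering on a metric such as the quantity total only isolates one group when the values are distinct; filtering on the label column would be the more robust choice if two groups could share the same total.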
Using Search-Based Analytics to Query Your Data
Now that you’ve set up a dashboard and become familiar with creating visualizations in Knowi, the next step is to start querying your data using search-based analytics. This feature makes your dashboard accessible to all English speakers — even those who aren’t data savvy or familiar with Knowi. Here’s how to query your data with search-based analytics:
1. Head to the top right corner of your original “Functional Query” widget and click on the 3 dot icon. Scroll down and click on “Analyze.”
2. Let’s say that you don’t care about the return flag for any of this information; you just want to group your data by line status, and look at the total amount that has been charged for those with a line status of F and O. Just head to the search bar at the top of your screen that currently says “Ask a question of your data” and type “total sum charge by line status.”
3. Now, to visualize this data, we’re going to follow the exact same process that we did before. Return to “Visualization” and set the visualization type to “Donut.” As you did before, scroll down and click on “Options,” then check the boxes underneath “Display as Percent” and “Label — Value.”
4. Return to the top right and click on the “Clone” icon again. Name this widget “Charge by Status” and add it to your dashboard.
This visualization clearly conveys that orders with a status of F and orders with a status of O each account for roughly 50% of the total charge.
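The search phrase “total sum charge by line status” amounts to a group-and-sum over the query results: collapse the return flag and add up the charge for each line status. Here is a minimal Python sketch of that aggregation; total_by is a hypothetical helper, the rows are made-up placeholders, and the sketch says nothing about how Knowi interprets the search text.

```python
from collections import defaultdict


def total_by(rows, group_field, value_field):
    """Sum `value_field` for each distinct value of `group_field`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


# Placeholder rows standing in for the "Functional Query" results (made-up values).
rows = [
    {"l_returnflag": "A", "l_linestatus": "F", "sum_charge": 30.0},
    {"l_returnflag": "N", "l_linestatus": "F", "sum_charge": 1.0},
    {"l_returnflag": "N", "l_linestatus": "O", "sum_charge": 50.0},
    {"l_returnflag": "R", "l_linestatus": "F", "sum_charge": 19.0},
]

charge_by_status = total_by(rows, "l_linestatus", "sum_charge")
print(charge_by_status)
```

With these placeholder values the three F rows and the single O row add up to the same total, mirroring the even split seen in the chart.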
Summary
To review, we started off by connecting to a sample Snowflake Datasource and ran a functional query on it. The results of this query were stored as a dataset in Knowi’s elastic data warehouse. Afterwards, we asked a question about our data, created a visualization that clearly answered it, and added drilldowns to our visualization that let the user dive deeper into the data behind the visualization. Finally, we asked our data another question in plain English using search-based analytics, and created another visualization which conveyed our answer.
Translated from: https://towardsdatascience.com/snowflake-data-analytics-tutorial-c9d4dd9b06d