Snowflake Data Analytics Tutorial

Table of Contents

  • Introduction

  • Creating a Snowflake Datasource

  • Querying Your Datasource

  • Analyzing Your Data and Adding Visualizations

  • Using Drilldowns on Your Visualizations

  • Using Search-Based Analytics to Query Your Data

  • Summary

Introduction

Snowflake is a purpose-built SQL cloud data platform that has grown at a nearly unprecedented rate since launching in 2014; Okta Inc.’s 2020 Businesses @ Work report found that Snowflake was the world’s fastest-growing app. That growth is easy to explain given Snowflake’s flexibility, strong security, automatic scaling of storage, and integration with a wide range of BI tools.

Among the BI tools that offer Snowflake integration, only one is fully native to Snowflake with support for nested objects and arrays: Knowi. This lets users analyze their data directly while keeping the benefits of Snowflake’s scaling and security. If you’d like to learn more about using Knowi to analyze your Snowflake data, this tutorial is for you.

Creating a Snowflake Datasource

Once you’ve set up your free Knowi trial account and logged in, follow these steps:

1. Locate and click on “Data sources” on the panel on the left side of your screen.

2. Scroll down to “Data Warehouses” and click on Snowflake.

3. Right now, the default Schema Name is set to TPC-DS, which contains data on products, orders, and customers. We’re going to change the schema to TPC-H, a decision-support benchmark dataset with tables for customers, orders, and line items. To do this, change Schema Name from TPCDS_SF100TCL to TPCH_SF1 and click “Test Connection.”

4. In a few moments, Knowi should tell you that your connection was successful; click “Save” once it does.

Congratulations on setting up your first Snowflake datasource!

Querying Your Datasource

Now that you’ve created a datasource, you can run queries on your data by following these steps:

1. As soon as you saved your datasource, you should’ve received a “Datasource Added. Configure Queries.” alert at the top of your page. Click on the word Queries. (You can also just go back to the panel on the left side of your screen, go right below “Data Sources,” click on “Queries,” and select “New Query +” from the top right.)

2. Name your report in the “Report Name*” bar at the very top left of your screen. The query we’re using here is closely modeled on the default functional query provided with the TPC-H schema, so let’s name this one “Functional Query.”

3. This query lists the totals for quantity, extended price, discounted extended price, and discounted extended price plus tax (the total charge); the averages for quantity, extended price, and discount; and a count of line items. It groups this data by return flag and line status, and it only includes line items shipped no later than 90 days before December 1, 1998. To enter this query, head over to “Snowflake Query” in your Query Builder and enter the following:

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1-l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1-l_discount) * (1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= dateadd(day, -90, to_date('1998-12-01'))
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

4. Before you run this query, we want to add one more column that concatenates return flag and line status. This will make it much easier to visualize our data. To do this post-processing with Cloud9QL, Knowi’s SQL-style language, enter the following into “Cloud9QL Query”:

select concat(l_returnflag, " - ",  l_linestatus) as Flag - Status, *

5. Head to the bottom of your screen and click the blue “Preview” button. This should return four rows and eleven columns worth of data. Once you’ve confirmed that it does, click the green “Save & Run Now” button on the bottom right corner of your screen.
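To see what the two queries above compute without connecting to Snowflake, here is a minimal Python sketch. It makes several assumptions: it uses SQLite from the standard library rather than Snowflake, a handful of made-up line items rather than the TPC-H data, and a trimmed-down version of the query. The names SAMPLE_ROWS, run_summary, and add_flag_status are hypothetical, and the last function only imitates the effect of the Cloud9QL concat step; it is not how Knowi implements it.

```python
import sqlite3
from datetime import date, timedelta

# Made-up rows standing in for the LINEITEM table (not real TPC-H data):
# (l_returnflag, l_linestatus, l_quantity, l_extendedprice,
#  l_discount, l_tax, l_shipdate)
SAMPLE_ROWS = [
    ("A", "F", 10, 100.0, 0.10, 0.05, "1994-03-01"),
    ("A", "F", 30, 300.0, 0.00, 0.10, "1995-01-15"),
    ("N", "O", 20, 200.0, 0.05, 0.00, "1997-07-04"),
    ("R", "F", 5, 50.0, 0.00, 0.00, "1993-11-20"),
    ("N", "O", 40, 400.0, 0.00, 0.00, "1998-11-30"),  # shipped after the cutoff
]

# Same cutoff as dateadd(day, -90, to_date('1998-12-01')) in the Snowflake query.
CUTOFF = (date(1998, 12, 1) - timedelta(days=90)).isoformat()

# A shortened version of the tutorial query, written for SQLite.
QUERY = """
select l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       count(*) as count_order
from lineitem
where l_shipdate <= ?
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus
"""


def run_summary(rows=SAMPLE_ROWS):
    """Load the sample rows into an in-memory table and run the summary query."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        """create table lineitem (
               l_returnflag text, l_linestatus text, l_quantity real,
               l_extendedprice real, l_discount real, l_tax real,
               l_shipdate text)"""
    )
    conn.executemany("insert into lineitem values (?, ?, ?, ?, ?, ?, ?)", rows)
    result = [dict(row) for row in conn.execute(QUERY, (CUTOFF,))]
    conn.close()
    return result


def add_flag_status(rows):
    """Imitate the concat step: prepend a 'Flag - Status' column to every row."""
    return [
        {"Flag - Status": f"{r['l_returnflag']} - {r['l_linestatus']}", **r}
        for r in rows
    ]


if __name__ == "__main__":
    for row in add_flag_status(run_summary()):
        print(row)
```

The sketch keeps the post-processing step separate from the SQL, which mirrors the tutorial's split between the Snowflake query and the Cloud9QL query.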

When you ran your query, Knowi automatically stored the results as a virtual dataset in its elastic data warehouse. Knowi does this every time you successfully run a query.

Analyzing Your Data and Adding Visualizations

As you saw in the data preview, there are 4 different combinations of return flag and line status: A-F, N-F, N-O, and R-F. Let’s say we want to visualize various metrics, such as the total order quantity, grouped by these Flag - Status combinations. Knowi lets us visualize this in just a few steps:

1. Return to the top of the panel on the left side of your screen and click on “Dashboards.” Click the orange plus icon and name your dashboard “TPC-H Visualizations.”

2. Go just below “Dashboards” on your panel and click “Widgets.” Drag your “Functional Query” widget onto your new dashboard.

3. Right now, your widget is just a data grid containing the results of our query. We want to keep this data grid but also add a more compelling visualization that conveys the message more quickly. Head over to the top right corner of your widget, click on the 3 dot icon, and click on “Analyze.”

4. Drag the “Flag - Status” and “Sum_QTY” bars over from the left side of your screen into the “Fields/Metrics:” box. This shows us the total quantity for each combination. Now, head to the top of your screen and click on “Visualization.” Go to “Visualization Type” in the top left corner of your screen and change it from “Data Grid” to “Donut.”

5. This donut chart already conveys rather clearly that roughly one quarter of the total quantity falls into each of R-F and A-F, roughly one percent falls into N-F, and roughly half falls into N-O, but it may still help to add value labels and percentages to the chart. To do this, scroll down to “Options” and click on it, then check the boxes underneath “Display as Percent” and “Label-Value.”

6. Now, head back to the top right corner of your screen and click the “Clone” icon, which looks like two small pieces of paper. When you do this, you’ll be asked to name your cloned widget; name it “Total Quantity by Flag - Status.”

7. Click “Clone” and then click “Add to Dashboard” to add your new widget to your dashboard.

There is a clear conclusion that comes from this visualization: roughly one quarter of the total quantity falls into each of R-F and A-F, roughly one half falls into N-O, and roughly one percent falls into N-F.
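The percentages that “Display as Percent” shows on the donut are each slice's share of the grand total. As a rough sketch of that arithmetic in Python, assuming made-up SUM_QTY totals (the real figures come from your own query results) and a hypothetical helper named percent_shares:

```python
# Made-up SUM_QTY totals per "Flag - Status" value, chosen only to
# resemble the proportions described in the text.
SUM_QTY_BY_FLAG_STATUS = {
    "A - F": 251,
    "N - F": 10,
    "N - O": 490,
    "R - F": 249,
}


def percent_shares(totals):
    """Return each label's share of the grand total, as a percentage."""
    grand_total = sum(totals.values())
    return {
        label: round(100 * value / grand_total, 1)
        for label, value in totals.items()
    }


if __name__ == "__main__":
    for label, share in percent_shares(SUM_QTY_BY_FLAG_STATUS).items():
        print(f"{label}: {share}%")
```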

Using Drilldowns on Your Visualizations

Drilldowns add an interactive component to your visualizations that allows you to dive into a filtered section of your data with just one click. Follow this process to add a drilldown:

1. Click on the 3 dot icon in the top right corner of the “Total Quantity by Flag - Status” widget that you just created, then scroll down and click on “Drilldowns.”

2. Set “Widget” as your drilldown type and set it to drill into “Functional Query” when “SUM_QTY” is clicked. Then add an optional drilldown filter that sets “SUM_QTY” equal to “SUM_QTY,” and click “SAVE” and then “Close.”

3. Test your drilldown by clicking on N-O, the Flag - Status combination with the largest share of the total quantity. This shows you all of the raw data for that combination. To return to your original visualization, go to the top right corner of your widget and click on the left arrow icon in the middle.
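Conceptually, the drilldown configured above passes the clicked slice's SUM_QTY value to the target widget and keeps only the rows that match it. The Python sketch below shows that filtering idea only; it is not Knowi's implementation. The rows are made up, and FUNCTIONAL_QUERY_ROWS and drill_down are hypothetical names.

```python
# Made-up rows standing in for the "Functional Query" dataset
# (only two of its eleven columns are shown).
FUNCTIONAL_QUERY_ROWS = [
    {"Flag - Status": "A - F", "SUM_QTY": 251},
    {"Flag - Status": "N - F", "SUM_QTY": 10},
    {"Flag - Status": "N - O", "SUM_QTY": 490},
    {"Flag - Status": "R - F", "SUM_QTY": 249},
]


def drill_down(target_rows, filter_field, clicked_value):
    """Keep only the target rows whose filter_field equals the clicked value."""
    return [row for row in target_rows if row.get(filter_field) == clicked_value]


if __name__ == "__main__":
    # Simulate clicking the N - O slice of the donut chart.
    clicked_slice = {"Flag - Status": "N - O", "SUM_QTY": 490}
    print(drill_down(FUNCTIONAL_QUERY_ROWS, "SUM_QTY", clicked_slice["SUM_QTY"]))
```

Because this filter matches on the metric value rather than on the label, it singles out one combination only when the SUM_QTY values are distinct, as they are in this sample.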

Using Search-Based Analytics to Query Your Data

Now that you’ve set up a dashboard and become familiar with creating visualizations in Knowi, the next step is to start querying your data using search-based analytics. This feature lets anyone who can phrase a question in English use your dashboard, including people who aren’t data savvy or familiar with Knowi. Here’s how to query your data with search-based analytics:

1. Head to the top right corner of your original “Functional Query” widget and click on the 3 dot icon. Scroll down and click on “Analyze.”

2. Let’s say that you don’t care about the return flag for any of this information; you just want to group your data by line status and look at the total amount charged for line statuses F and O. Head to the search bar at the top of your screen, which currently says “Ask a question of your data,” and type “total sum charge by line status.”

3. Now, to visualize this data, we’re going to follow the same process as before. Return to “Visualization” and set the visualization type to “Donut.” Then scroll down and click on “Options,” and check the boxes underneath “Display as Percent” and “Label-Value.”

4. Return to the top right and click on the “Clone” icon again. Name this widget “Charge by Status” and add it to your dashboard.

This visualization clearly conveys that orders with a status of F and orders with a status of O each account for roughly 50% of the total charge.
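The question “total sum charge by line status” corresponds to a simple aggregation: sum the charge column, grouped by line status and ignoring the return flag. Here is a small Python sketch of that aggregation. It says nothing about how Knowi interprets the question; the SUM_CHARGE values are made up, and the field names and the helper total_by are assumptions for illustration.

```python
from collections import defaultdict

# Made-up rows standing in for the "Functional Query" results
# (only the columns needed for this question are shown).
FUNCTIONAL_QUERY_ROWS = [
    {"L_RETURNFLAG": "A", "L_LINESTATUS": "F", "SUM_CHARGE": 300.0},
    {"L_RETURNFLAG": "N", "L_LINESTATUS": "F", "SUM_CHARGE": 10.0},
    {"L_RETURNFLAG": "N", "L_LINESTATUS": "O", "SUM_CHARGE": 600.0},
    {"L_RETURNFLAG": "R", "L_LINESTATUS": "F", "SUM_CHARGE": 290.0},
]


def total_by(rows, group_field, value_field):
    """Sum value_field over rows, grouped by group_field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


if __name__ == "__main__":
    print(total_by(FUNCTIONAL_QUERY_ROWS, "L_LINESTATUS", "SUM_CHARGE"))
```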

Summary

To review, we started off by connecting to a sample Snowflake Datasource and ran a functional query on it. The results of this query were stored as a dataset in Knowi’s elastic data warehouse. Afterwards, we asked a question about our data, created a visualization that clearly answered it, and added drilldowns to our visualization that let the user dive deeper into the data behind the visualization. Finally, we asked our data another question in plain English using search-based analytics, and created another visualization which conveyed our answer.

Translated from: https://towardsdatascience.com/snowflake-data-analytics-tutorial-c9d4dd9b06d
