Chapter 1: Data Warehouse Layering
1.1 Why Layer the Warehouse?
DIM: dimension
1.2 Data Marts and Data Warehouses
1.3 Naming Conventions
1.3.1 Table Names
- ODS-layer tables are named ods_<table_name>
- DIM-layer tables are named dim_<table_name>
- DWD-layer tables are named dwd_<table_name>
- DWS-layer tables are named dws_<table_name>
- DWT-layer tables are named dwt_<table_name>
- ADS-layer tables are named ads_<table_name>
- Temporary tables are named tmp_<table_name>
1.3.2 Script Names
- <source>_to_<target>_db/log.sh
- User-behavior scripts take the log suffix; business-data scripts take the db suffix.
1.3.3 Field Types
- Counts: bigint
- Amounts: decimal(16, 2), i.e. 16 significant digits, 2 of which are after the decimal point
- Strings (names, descriptions, etc.): string
- Primary and foreign keys: string
- Timestamps: bigint
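A small sketch of why amount fields use decimal(16, 2) rather than a float type: binary floating point cannot represent most decimal fractions exactly, while fixed-point decimals can. The `to_amount` helper below is hypothetical, illustrating the rounding a decimal(16,2) column implies.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats accumulate representation error on money values:
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False: float_total is 0.30000000000000004

def to_amount(value: str) -> Decimal:
    """Round a raw value to 2 decimal places, as a decimal(16,2) column stores it."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

decimal_total = to_amount("0.1") + to_amount("0.2")
print(decimal_total == Decimal("0.30"))  # True: exact decimal arithmetic
```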
Chapter 2: Data Warehouse Theory
2.1 Normalization Theory
2.1.1 Normal Forms
1) Definition
Data modeling must follow certain rules; in relational modeling, those rules are the normal forms.
2) Purpose
Normalization reduces data redundancy.
Why reduce redundancy?
(1) A decade or more ago disks were expensive, so storage had to be minimized.
(2) There were no distributed systems; everything ran on a single machine, and the number of disks that could be added was limited.
(3) With redundant data, one change must be applied to multiple tables, making consistency hard to guarantee.
3) Drawback
The drawback of normalization is that retrieving data requires joins to assemble the final result.
4) Classification
The normal forms in common use are: first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF).
2.1.2 Functional Dependencies
2.1.3 Distinguishing the Three Normal Forms
2.2 Relational Modeling and Dimensional Modeling
Relational modeling and dimensional modeling are the two main data-warehouse modeling techniques. Relational modeling was championed by Bill Inmon, dimensional modeling by Ralph Kimball.
2.2.1 Relational Modeling
Relational modeling abstracts complex data into two concepts, entities and relationships, and represents them in normalized form. As the figure shows, the resulting model is relatively loose and fragmented, with many physical tables.
A relational model strictly follows third normal form (3NF): redundancy is low and consistency is easy to guarantee. But because the data is spread across many tables, queries are relatively complex, and at big-data scale query performance is comparatively poor.
2.2.2 Dimensional Modeling
As the figure shows, a dimensional model is comparatively clear and simple.
Dimensional modeling takes data analysis as its starting point and does not follow third normal form, so some redundancy exists. It is business-oriented, presenting the business as fact tables and dimension tables. Because the table structures are simple, queries are simple and efficient.
2.3 Dimension Tables and Fact Tables (Key Concepts)
2.3.1 Dimension Tables
A dimension table generally holds descriptive information about facts. Each dimension table corresponds to an object or concept in the real world, e.g. users, products, dates, regions.
Characteristics of dimension tables:
- Wide: many attributes, many columns
- Relatively few rows compared with fact tables, typically under 100,000
- Relatively stable content, e.g. code/lookup tables

A time dimension table:

| Date ID | Day of Week | Day of Year | Quarter | Holiday |
| --- | --- | --- | --- | --- |
| 2020-01-01 | 2 | 1 | 1 | New Year's Day |
| 2020-01-02 | 3 | 2 | 1 | none |
| 2020-01-03 | 4 | 3 | 1 | none |
| 2020-01-04 | 5 | 4 | 1 | none |
| 2020-01-05 | 6 | 5 | 1 | none |
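The attributes in a time dimension table are all derivable from the date itself, so such tables are usually generated once up front. A minimal sketch (note that day-of-week numbering conventions vary; this sketch uses ISO numbering, Monday = 1, which differs from the table above):

```python
from datetime import date, timedelta

def date_dim_row(d: date) -> dict:
    """One row of a time dimension table: weekday, day of year, quarter."""
    return {
        "date_id": d.isoformat(),
        "day_of_week": d.isoweekday(),            # ISO: Monday=1 .. Sunday=7
        "day_of_year": d.timetuple().tm_yday,     # 1 .. 365/366
        "quarter": (d.month - 1) // 3 + 1,        # 1 .. 4
    }

# Generate the first five rows of 2020.
start = date(2020, 1, 1)
for i in range(5):
    print(date_dim_row(start + timedelta(days=i)))
```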
2.3.2 Fact Tables
Each row of a fact table represents a business event (placing an order, paying, refunding, reviewing, etc.). The term "fact" refers to the measures of a business event (things that can be counted or summed: counts, quantities, amounts). For example: on May 21, 2020, teacher Songsong spent 250 yuan on JD.com for one bottle of sea-dog ginseng pills. Dimensions: time, user, product, merchant. Facts: 250 yuan, one bottle.
Each fact table row contains additive numeric measures plus foreign keys to the dimension tables, usually two or more foreign keys.
Characteristics of fact tables:
- Very large
- Relatively narrow: few columns (mainly foreign-key ids and measures)
- Change frequently, with many rows added every day
1) Transaction fact tables
Each transaction or event, e.g. one sales order record or one payment record, becomes one row of the fact table. Once the transaction is committed and the row is inserted, the data is never changed again; the table is updated incrementally.
2) Periodic snapshot fact tables
A periodic snapshot fact table does not keep every change; it keeps only data at fixed intervals, e.g. daily or monthly sales, or the monthly account balance.
For example, a shopping cart changes constantly as items are added and removed, but what we care about is how many items it holds at the end of each day, which is what later analysis uses.
3) Accumulating snapshot fact tables
An accumulating snapshot fact table tracks the progress of a business process. For example, the warehouse may need to record the timestamps of each stage of an order's lifecycle: placed, packed, shipped, received. As the business process advances, the fact table row is updated.

| Order ID | User ID | Order Time | Pack Time | Ship Time | Receive Time | Order Amount |
| --- | --- | --- | --- | --- | --- | --- |
| ... | ... | 3-8 | 3-8 | 3-9 | 3-10 | ... |
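Unlike a transaction fact table, an accumulating snapshot row is updated in place as each milestone occurs. A minimal sketch with a hypothetical order row (the column names mirror the table above):

```python
# One accumulating-snapshot row: milestone columns start as None (NULL)
# and are filled in as the order moves through its lifecycle.
row = {
    "order_id": "1001", "user_id": "u42",
    "order_time": "2020-03-08", "pack_time": None,
    "ship_time": None, "receive_time": None,
    "order_amount": 250.00,
}

def record_milestone(row: dict, column: str, ts: str) -> None:
    """Update the existing row rather than inserting a new one."""
    row[column] = ts

record_milestone(row, "pack_time", "2020-03-08")
record_milestone(row, "ship_time", "2020-03-09")
record_milestone(row, "receive_time", "2020-03-10")
print(row)
```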
2.4 Classifying Dimensional Models
Dimensional models come in three flavors: the star schema, the snowflake schema, and the constellation (galaxy) schema.
2.5 Data Warehouse Modeling (Absolutely Key)
2.5.1 ODS Layer
1) User behavior data on HDFS
2) Business data on HDFS
3) How do we plan to handle the user behavior data and business data on HDFS?
(1) Keep the data exactly as-is, with no modification, so the layer doubles as a backup.
(2) Compress the data to save disk space (e.g. 100 GB of raw data can compress to roughly 10 GB).
(3) Create partitioned tables to avoid later full table scans.
2.5.2 DIM and DWD Layers
The DIM and DWD layers require a dimensional model, usually a star schema; taken together the layer typically ends up as a constellation schema.
Dimensional modeling generally follows four steps:
select the business process → declare the grain → identify the dimensions → identify the facts
(1) Select the business process
From the business systems, pick the business lines of interest, e.g. ordering, payment, refunds, logistics. Each business line maps to one fact table.
(2) Declare the grain
The grain is the level of detail or aggregation at which data is kept in the warehouse.
Declaring the grain means precisely defining what one row of the fact table represents. Choose the finest grain possible, so the table can serve as many different needs as possible.
Typical grain declarations:
One row of the order fact table represents one line item of one order.
One row of the payment fact table represents one payment record.
(3) Identify the dimensions
Dimensions describe the business facts, chiefly the "who, where, when" of an event.
The rule for choosing dimensions: will later requirements analyze metrics along this dimension? For example, to answer "when are the most orders placed, which region orders most, which user orders most", we need the time, region, and user dimensions.
(4) Identify the facts
Here "fact" means the business measures (counts, quantities, amounts; anything additive), e.g. order amount, number of orders.
In the DWD layer, modeling is driven by business process: for each concrete business process, build a detail fact table at the finest grain. Fact tables may be moderately widened where appropriate.
How fact tables join to dimension tables is flexible, but to serve more complex requirements, join every dimension table that can be joined.
|  | Time | User | Region | Product | Coupon | Activity | Measures |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Order | √ | √ | √ |  |  |  | freight / discount amount / original amount / final amount |
| Order detail | √ | √ | √ | √ | √ | √ | quantity / discount amount / original amount / final amount |
| Payment | √ | √ | √ |  |  |  | payment amount |
| Add to cart | √ | √ |  | √ |  |  | quantity / amount |
| Favorite | √ | √ |  | √ |  |  | count |
| Review | √ | √ |  | √ |  |  | count |
| Order refund | √ | √ | √ | √ |  |  | quantity / amount |
| Refund payment | √ | √ | √ | √ |  |  | quantity / amount |
| Coupon use | √ | √ |  |  | √ |  | count |
At this point, dimensional modeling of the warehouse is complete. The DWD layer is driven by business processes; the DWS, DWT, and ADS layers are driven by requirements and no longer relate to dimensional modeling. DWS and DWT both build wide tables, organized by subject. A subject is an angle from which to look at problems, and corresponds to a dimension table.
2.5.3 DWS and DWT Layers
The DWS and DWT layers are collectively called the wide-table layers. Their design ideas are largely the same, illustrated by the following case.
1) The problem: two requirements, count the orders per province and total the order amount per province.
2) Naive approach: each requirement joins the province table with the order table, groups by province, and computes. The same data is computed twice, and in practice there are many more scenarios like this.
How can the design avoid recomputation?
For this scenario, design a region wide table keyed by region ID, with fields such as order count, order amount, payment count, payment amount. Compute all of these metrics in one pass and store the results in the wide table, which effectively avoids recomputing the same data.
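The idea can be sketched in a few lines: aggregate once and keep every metric in the same wide row, so neither metric redoes the join. The order records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical order facts, already joined to the province dimension.
orders = [
    {"province": "Beijing", "amount": 100.0},
    {"province": "Beijing", "amount": 50.0},
    {"province": "Shanghai", "amount": 80.0},
]

# One pass builds one wide row per province; order count and order amount
# both come from this single aggregation.
wide = defaultdict(lambda: {"order_count": 0, "order_amount": 0.0})
for o in orders:
    row = wide[o["province"]]
    row["order_count"] += 1
    row["order_amount"] += o["amount"]

print(dict(wide))
```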
3) Summary:
(1) Which wide tables to build: one per dimension.
(2) Wide-table fields: the fact tables as seen from each dimension's point of view, focused on the aggregated measures.
(3) The difference between DWS and DWT: DWS stores each subject's behavior summarized for the current day, e.g. each region's order count and order amount for the day; DWT stores each subject's accumulated behavior, e.g. each region's order count and order amount over the last 7 (or 15, 30, 60) days.
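The DWS/DWT relationship can be sketched directly: a DWT row for a window is just the subject's DWS daily rows summed over that window. The daily figures below are hypothetical.

```python
# DWS: one summary row per (province, date).
dws = {
    ("Beijing", "2020-06-12"): {"order_count": 3, "order_amount": 300.0},
    ("Beijing", "2020-06-13"): {"order_count": 1, "order_amount": 80.0},
    ("Beijing", "2020-06-14"): {"order_count": 2, "order_amount": 120.0},
}

def dwt_rollup(dws: dict, province: str, dates: list) -> dict:
    """A DWT row: the subject's DWS rows summed over the trailing window."""
    present = [(province, d) for d in dates if (province, d) in dws]
    return {
        "order_count": sum(dws[k]["order_count"] for k in present),
        "order_amount": sum(dws[k]["order_amount"] for k in present),
    }

print(dwt_rollup(dws, "Beijing", ["2020-06-12", "2020-06-13", "2020-06-14"]))
```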
2.5.4 ADS Layer
Analyzes each of the e-commerce system's major subject metrics individually.
Chapter 3: Building the Warehouse Environment
3.1 Setting Up Hive
3.1.1 Hive Execution Engines
Hive supports several execution engines: MR (the default), Tez, and Spark.
Hive on Spark: Hive both stores the metadata and parses/optimizes the SQL; the syntax is HQL; the execution engine is Spark, which runs the work as RDDs.
Spark on Hive: Hive only stores the metadata; Spark parses and optimizes the SQL; the syntax is Spark SQL; Spark runs the work as RDDs.
3.1.2 Configuring Hive on Spark
1) Compatibility note
Note: Hive 3.1.2 and Spark 3.0.0 as downloaded from the official sites are not compatible out of the box, because Hive 3.1.2 supports Spark 2.4.5. Hive 3.1.2 therefore has to be recompiled.
Build steps: download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If it compiles, package it and take the jars. If it fails, fix the affected methods as the errors indicate until the build passes, then package.
2) Deploy Spark on the Hive node
If Spark is already deployed, this step can be skipped, but check that the SPARK_HOME environment variable is configured correctly.
(1) Spark download page:
Downloads | Apache Spark
(2) Upload and extract spark-3.0.0-bin-hadoop3.2.tgz
[seven@hadoop102 software]$ tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
[seven@hadoop102 software]$ mv /opt/module/spark-3.0.0-bin-hadoop3.2 /opt/module/spark
(3) Configure the SPARK_HOME environment variable
[seven@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
Add the following:
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
Run source to make it take effect:
[seven@hadoop102 software]$ source /etc/profile.d/my_env.sh
3) Create a Spark config file for Hive
[seven@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf
Add the following (tasks will run with these parameters):
spark.master                   yarn
spark.eventLog.enabled         true
spark.eventLog.dir             hdfs://hadoop102:8020/spark-history
spark.executor.memory          1g
spark.driver.memory            1g
Create the following HDFS path for storing history logs:
[seven@hadoop102 software]$ hadoop fs -mkdir /spark-history
4) Upload the "pure" (without-hadoop) Spark jars to HDFS
Note 1: the full Spark 3.0.0 distribution bundles support for Hive 2.3.7 by default, which conflicts with the installed Hive 3.1.2. We therefore use the pure Spark jars, which carry no Hadoop or Hive dependencies, avoiding the conflict.
Note 2: Hive tasks are ultimately executed by Spark, and Spark task resources are scheduled by Yarn, so a task may be placed on any node in the cluster. The Spark dependencies must therefore be uploaded to an HDFS path that every node can reach.
(1) Upload and extract spark-3.0.0-bin-without-hadoop.tgz
[seven@hadoop102 software]$ tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz
(2) Upload the pure Spark jars to HDFS
[seven@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[seven@hadoop102 software]$ hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars
5) Edit the hive-site.xml file
[seven@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml
Add the following:
<!-- Location of the Spark jars (note: port 8020 must match the NameNode port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
3.1.3 Testing Hive on Spark
(1) Start the hive client
[seven@hadoop102 hive]$ bin/hive
(2) Create a test table
hive (default)> create table student(id int, name string);
(3) Test with an insert
hive (default)> insert into table student values(1,'abc');
If the result looks like the following, the configuration works.
3.2 Yarn Configuration
3.2.1 Increase the ApplicationMaster resource ratio
The capacity scheduler limits the resources that concurrently running ApplicationMasters may occupy in each queue, via the yarn.scheduler.capacity.maximum-am-resource-percent parameter. The default is 0.1: ApplicationMasters in a queue may use at most 10% of the queue's total resources. This prevents ApplicationMasters from consuming most of the resources and leaving Map/Reduce tasks unable to run.
In production the default is fine. In a learning environment, however, total cluster resources are small; with only 10% reserved for ApplicationMasters, only one job may be able to run at a time, because a single ApplicationMaster can already hit the 10% cap. So here the value should be raised.
(1) In /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml on hadoop102, change the following parameter
[seven@hadoop102 hadoop]$ vim capacity-scheduler.xml
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>
(2) Distribute the capacity-scheduler.xml file
[seven@hadoop102 hadoop]$ xsync capacity-scheduler.xml
(3) Stop running jobs and restart the yarn cluster
[seven@hadoop103 hadoop-3.1.3]$ sbin/stop-yarn.sh
[seven@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh
3.3 Warehouse Development Environment
Either DBeaver or DataGrip can serve as the warehouse development tool. Both connect to Hive over JDBC, so HiveServer2 must be running.
1. Start HiveServer2
[seven@hadoop102 hive]$ hiveserver2
2. Configure a DataGrip connection
1) Create the connection
2) Configure connection properties
All properties match the Hive beeline client configuration. On first use, you will be prompted to download the missing JDBC driver; follow the prompt.
3. Test it
Create a database named gmall and check that it was created.
1) Create the database
2) List the databases
3) Edit the connection to specify the target database
4) Select gmall as the current database
3.4 Data Preparation
When a company builds a warehouse, the business systems usually already hold some historical data. To simulate a realistic scenario, we prepare some history here. Assume the warehouse goes live on 2020-06-14; details follow.
1. User behavior logs
User behavior logs generally have no history, so only logs for 2020-06-14 need to be prepared:
1) Start the log collection pipeline, including Flume, Kafka, etc.
2) On both log servers (hadoop102, hadoop103), edit /opt/module/applog/application.yml and set the mock.date parameter to 2020-06-14.
3) Run the log generation script lg.sh.
4) Check that the corresponding files appear on HDFS.
2. Business data
Business data generally has history; here we prepare data for 2020-06-10 through 2020-06-14:
1) On hadoop102, edit /opt/module/db_log/application.properties and set the mock.date, mock.clear, and mock.clear.user parameters to the values shown in the figure.
2) Run the command that generates mock business data, producing the first day (2020-06-10) of history.
[seven@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar
3) Edit /opt/module/db_log/application.properties again, setting mock.date, mock.clear, and mock.clear.user to the values shown in the figure.
4) Run the generation command again to produce the second day (2020-06-11) of history.
[seven@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar
5) After that, change only the mock.date parameter in /opt/module/db_log/application.properties to 2020-06-12, 2020-06-13, and 2020-06-14 in turn, generating each day's data.
6) Run the mysql_to_hdfs_init.sh script to sync the generated business data to HDFS.
[seven@hadoop102 bin]$ mysql_to_hdfs_init.sh all 2020-06-14
7) Check that the data appears on HDFS.
Chapter 4: Building the Warehouse — ODS Layer
1) Keep the data exactly as-is, with no modification, so the layer doubles as a backup.
2) Compress the data with LZO to save disk space; 100 GB of data can compress to under 10 GB.
3) Create partitioned tables to avoid later full table scans; partitioned tables are used heavily in real projects.
4) Create external tables. Apart from one's own temporary tables, which may be internal, the vast majority of production tables are external.
4.1 ODS Layer (User Behavior Data)
4.1.1 Create the log table ods_log
1) Create a partitioned table with LZO compression support
(1) DDL
hive (gmall)>
drop table if exists ods_log;
CREATE EXTERNAL TABLE ods_log (`line` string)
PARTITIONED BY (`dt` string) -- partition by date
STORED AS -- storage format; reads use LzoTextInputFormat
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_log' -- HDFS location of the data
;
For Hive's LZO compression, see: LanguageManual LZO - Apache Hive - Apache Software Foundation
(2) Partition plan
2) Load the data
hive (gmall)>
load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');
Note: use the YYYY-MM-DD date format throughout; it is the format Hive supports by default.
3) Build an index for the LZO-compressed files
[seven@hadoop102 bin]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14
4.1.2 Single vs. Double Quotes in Shell
1) Create a test.sh file in /home/seven/bin
[seven@hadoop102 bin]$ vim test.sh
Add the following to the file:
#!/bin/bash
do_date=$1

echo '$do_date'
echo "$do_date"
echo "'$do_date'"
echo '"$do_date"'
echo `date`
2) Run it and check the output
[seven@hadoop102 bin]$ test.sh 2020-06-14
$do_date
2020-06-14
'2020-06-14'
"$do_date"
2020年 06月 18日 星期四 21:02:08 CST
3) Summary:
(1) Single quotes do not expand variables
(2) Double quotes expand variables
(3) Backticks execute the enclosed command
(4) Single quotes nested inside double quotes: the variable is expanded
(5) Double quotes nested inside single quotes: the variable is not expanded
4.1.3 Load Script for the ODS Log Table
1) Write the script
(1) Create the script in /home/seven/bin on hadoop102
[seven@hadoop102 bin]$ vim hdfs_to_ods_log.sh
Write the following into the script:
#!/bin/bash

# define a variable for easy modification
APP=gmall

# use the date argument if given; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo ================== log date: $do_date ==================
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"
hive -e "$sql"

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date
Note 1:
[ -n "$var" ] tests whether the variable's value is non-empty:
-- non-empty: returns true
-- empty: returns false
Note: [ -n ... ] does not quote the value for you; when using [ -n ... ], wrap the variable in double quotes (" ").
Note 2:
See date --help for usage of the date command.
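The two notes above can be demonstrated in a few lines of shell. This is a standalone sketch; the variable name maybe_empty is made up for illustration.

```shell
#!/bin/bash
# Why [ -n ... ] needs quotes: unquoted, an empty variable expands to nothing,
# leaving [ -n ], which is always true. Quoted, the emptiness test works.
maybe_empty=""
if [ -n "$maybe_empty" ]; then echo "non-empty"; else echo "empty"; fi   # prints: empty
if [ -n $maybe_empty ]; then echo "non-empty"; else echo "empty"; fi     # prints: non-empty

# The fallback used in the script above: yesterday in YYYY-MM-DD form.
date -d "-1 day" +%F
```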
(2) Make the script executable
[seven@hadoop102 bin]$ chmod 777 hdfs_to_ods_log.sh
2) Using the script
(1) Run it
[seven@hadoop102 module]$ hdfs_to_ods_log.sh 2020-06-14
(2) Check the imported data
4.2 ODS Layer (Business Data)
The partition plan for the ODS business tables is as follows:
The data-load approach for the ODS business tables is as follows:
4.2.1 Activity Info Table
DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(
    `id` STRING COMMENT 'id',
    `activity_name` STRING COMMENT 'activity name',
    `activity_type` STRING COMMENT 'activity type',
    `start_time` STRING COMMENT 'start time',
    `end_time` STRING COMMENT 'end time',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'activity info table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';
4.2.2 Activity Rule Table
DROP TABLE IF EXISTS ods_activity_rule;
CREATE EXTERNAL TABLE ods_activity_rule(
    `id` STRING COMMENT 'id',
    `activity_id` STRING COMMENT 'activity id',
    `activity_type` STRING COMMENT 'activity type',
    `condition_amount` DECIMAL(16,2) COMMENT 'spend-threshold amount',
    `condition_num` BIGINT COMMENT 'quantity threshold',
    `benefit_amount` DECIMAL(16,2) COMMENT 'discount amount',
    `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
    `benefit_level` STRING COMMENT 'benefit level'
) COMMENT 'activity rule table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_rule/';
4.2.3 Level-1 Category Table
DROP TABLE IF EXISTS ods_base_category1;
CREATE EXTERNAL TABLE ods_base_category1(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'name'
) COMMENT 'level-1 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category1/';
4.2.4 Level-2 Category Table
DROP TABLE IF EXISTS ods_base_category2;
CREATE EXTERNAL TABLE ods_base_category2(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'name',
    `category1_id` STRING COMMENT 'level-1 category id'
) COMMENT 'level-2 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category2/';
4.2.5 Level-3 Category Table
DROP TABLE IF EXISTS ods_base_category3;
CREATE EXTERNAL TABLE ods_base_category3(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'name',
    `category2_id` STRING COMMENT 'level-2 category id'
) COMMENT 'level-3 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category3/';
4.2.6 Dictionary Table
DROP TABLE IF EXISTS ods_base_dic;
CREATE EXTERNAL TABLE ods_base_dic(
    `dic_code` STRING COMMENT 'code',
    `dic_name` STRING COMMENT 'code name',
    `parent_code` STRING COMMENT 'parent code',
    `create_time` STRING COMMENT 'creation date',
    `operate_time` STRING COMMENT 'operation date'
) COMMENT 'dictionary table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_dic/';
4.2.7 Province Table
DROP TABLE IF EXISTS ods_base_province;
CREATE EXTERNAL TABLE ods_base_province (
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'province name',
    `region_id` STRING COMMENT 'region id',
    `area_code` STRING COMMENT 'area code',
    `iso_code` STRING COMMENT 'ISO-3166 code, for visualization',
    `iso_3166_2` STRING COMMENT 'ISO-3166-2 code, for visualization'
) COMMENT 'province table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_province/';
4.2.8 Region Table
DROP TABLE IF EXISTS ods_base_region;
CREATE EXTERNAL TABLE ods_base_region (
    `id` STRING COMMENT 'id',
    `region_name` STRING COMMENT 'region name'
) COMMENT 'region table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_region/';
4.2.9 Brand Table
DROP TABLE IF EXISTS ods_base_trademark;
CREATE EXTERNAL TABLE ods_base_trademark (
    `id` STRING COMMENT 'id',
    `tm_name` STRING COMMENT 'brand name'
) COMMENT 'brand table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_trademark/';
4.2.10 Cart Table
DROP TABLE IF EXISTS ods_cart_info;
CREATE EXTERNAL TABLE ods_cart_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `cart_price` DECIMAL(16,2) COMMENT 'price when added to cart',
    `sku_num` BIGINT COMMENT 'quantity',
    `sku_name` STRING COMMENT 'sku name (redundant)',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'modification time',
    `is_ordered` STRING COMMENT 'whether already ordered',
    `order_time` STRING COMMENT 'order time',
    `source_type` STRING COMMENT 'source type',
    `source_id` STRING COMMENT 'source id'
) COMMENT 'cart table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_cart_info/';
4.2.11 Review Table
DROP TABLE IF EXISTS ods_comment_info;
CREATE EXTERNAL TABLE ods_comment_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'product sku',
    `spu_id` STRING COMMENT 'product spu',
    `order_id` STRING COMMENT 'order id',
    `appraise` STRING COMMENT 'rating',
    `create_time` STRING COMMENT 'review time'
) COMMENT 'product review table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_comment_info/';
4.2.12 Coupon Info Table
DROP TABLE IF EXISTS ods_coupon_info;
CREATE EXTERNAL TABLE ods_coupon_info(
    `id` STRING COMMENT 'coupon id',
    `coupon_name` STRING COMMENT 'coupon name',
    `coupon_type` STRING COMMENT 'coupon type: 1 cash, 2 discount, 3 spend-threshold rebate, 4 quantity-threshold discount',
    `condition_amount` DECIMAL(16,2) COMMENT 'amount threshold',
    `condition_num` BIGINT COMMENT 'quantity threshold',
    `activity_id` STRING COMMENT 'activity id',
    `benefit_amount` DECIMAL(16,2) COMMENT 'rebate amount',
    `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
    `create_time` STRING COMMENT 'creation time',
    `range_type` STRING COMMENT 'scope type: 1 product, 2 category, 3 brand',
    `limit_num` BIGINT COMMENT 'maximum claims',
    `taken_count` BIGINT COMMENT 'claims so far',
    `start_time` STRING COMMENT 'claim start time',
    `end_time` STRING COMMENT 'claim end time',
    `operate_time` STRING COMMENT 'modification time',
    `expire_time` STRING COMMENT 'expiry time'
) COMMENT 'coupon table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';
4.2.13 Coupon Use Table
DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(
    `id` STRING COMMENT 'id',
    `coupon_id` STRING COMMENT 'coupon id',
    `user_id` STRING COMMENT 'user id',
    `order_id` STRING COMMENT 'order id',
    `coupon_status` STRING COMMENT 'coupon status',
    `get_time` STRING COMMENT 'claim time',
    `using_time` STRING COMMENT 'use time (order placed)',
    `used_time` STRING COMMENT 'use time (paid)',
    `expire_time` STRING COMMENT 'expiry time'
) COMMENT 'coupon use table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';
4.2.14 Favorites Table
DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `spu_id` STRING COMMENT 'spu id',
    `is_cancel` STRING COMMENT 'whether cancelled',
    `create_time` STRING COMMENT 'favorite time',
    `cancel_time` STRING COMMENT 'cancel time'
) COMMENT 'product favorites table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_favor_info/';
4.2.15 Order Detail Table
DROP TABLE IF EXISTS ods_order_detail;
CREATE EXTERNAL TABLE ods_order_detail(
    `id` STRING COMMENT 'id',
    `order_id` STRING COMMENT 'order id',
    `sku_id` STRING COMMENT 'product id',
    `sku_name` STRING COMMENT 'product name',
    `order_price` DECIMAL(16,2) COMMENT 'product price',
    `sku_num` BIGINT COMMENT 'quantity',
    `create_time` STRING COMMENT 'creation time',
    `source_type` STRING COMMENT 'source type',
    `source_id` STRING COMMENT 'source id',
    `split_final_amount` DECIMAL(16,2) COMMENT 'apportioned final amount',
    `split_activity_amount` DECIMAL(16,2) COMMENT 'apportioned activity discount',
    `split_coupon_amount` DECIMAL(16,2) COMMENT 'apportioned coupon discount'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail/';
4.2.16 Order Detail Activity Link Table
DROP TABLE IF EXISTS ods_order_detail_activity;
CREATE EXTERNAL TABLE ods_order_detail_activity(
    `id` STRING COMMENT 'id',
    `order_id` STRING COMMENT 'order id',
    `order_detail_id` STRING COMMENT 'order detail id',
    `activity_id` STRING COMMENT 'activity id',
    `activity_rule_id` STRING COMMENT 'activity rule id',
    `sku_id` BIGINT COMMENT 'product id',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'order detail activity link table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/';
4.2.17 Order Detail Coupon Link Table
DROP TABLE IF EXISTS ods_order_detail_coupon;
CREATE EXTERNAL TABLE ods_order_detail_coupon(
    `id` STRING COMMENT 'id',
    `order_id` STRING COMMENT 'order id',
    `order_detail_id` STRING COMMENT 'order detail id',
    `coupon_id` STRING COMMENT 'coupon id',
    `coupon_use_id` STRING COMMENT 'coupon use record id',
    `sku_id` STRING COMMENT 'product id',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'order detail coupon link table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';
4.2.18 Order Table
DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (
    `id` STRING COMMENT 'order id',
    `final_amount` DECIMAL(16,2) COMMENT 'final order amount',
    `order_status` STRING COMMENT 'order status',
    `user_id` STRING COMMENT 'user id',
    `payment_way` STRING COMMENT 'payment method',
    `delivery_address` STRING COMMENT 'delivery address',
    `out_trade_no` STRING COMMENT 'payment transaction number',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'operation time',
    `expire_time` STRING COMMENT 'expiry time',
    `tracking_no` STRING COMMENT 'tracking number',
    `province_id` STRING COMMENT 'province id',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT 'activity discount amount',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'coupon discount amount',
    `original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `feight_fee` DECIMAL(16,2) COMMENT 'freight fee',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT 'freight fee reduction'
) COMMENT 'order table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';
4.2.19 Order Refund Table
DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `order_id` STRING COMMENT 'order id',
    `sku_id` STRING COMMENT 'product id',
    `refund_type` STRING COMMENT 'refund type',
    `refund_num` BIGINT COMMENT 'refund quantity',
    `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
    `refund_reason_type` STRING COMMENT 'refund reason type',
    -- Refund status should cover buyer applied, seller approved, seller received,
    -- refund complete, etc.; that is not modeled here, so this table is loaded incrementally.
    `refund_status` STRING COMMENT 'refund status',
    `create_time` STRING COMMENT 'refund time'
) COMMENT 'order refund table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info/';
4.2.20 Order Status Log Table
DROP TABLE IF EXISTS ods_order_status_log;
CREATE EXTERNAL TABLE ods_order_status_log (
    `id` STRING COMMENT 'id',
    `order_id` STRING COMMENT 'order id',
    `order_status` STRING COMMENT 'order status',
    `operate_time` STRING COMMENT 'modification time'
) COMMENT 'order status table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_status_log/';
4.2.21 Payment Table
DROP TABLE IF EXISTS ods_payment_info;
CREATE EXTERNAL TABLE ods_payment_info(
    `id` STRING COMMENT 'id',
    `out_trade_no` STRING COMMENT 'external business number',
    `order_id` STRING COMMENT 'order id',
    `user_id` STRING COMMENT 'user id',
    `payment_type` STRING COMMENT 'payment type',
    `trade_no` STRING COMMENT 'transaction number',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `subject` STRING COMMENT 'transaction description',
    `payment_status` STRING COMMENT 'payment status',
    `create_time` STRING COMMENT 'creation time',
    `callback_time` STRING COMMENT 'callback time'
) COMMENT 'payment flow table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_payment_info/';
4.2.22 Refund Payment Table
DROP TABLE IF EXISTS ods_refund_payment;
CREATE EXTERNAL TABLE ods_refund_payment(
    `id` STRING COMMENT 'id',
    `out_trade_no` STRING COMMENT 'external business number',
    `order_id` STRING COMMENT 'order id',
    `sku_id` STRING COMMENT 'SKU id',
    `payment_type` STRING COMMENT 'payment type',
    `trade_no` STRING COMMENT 'transaction number',
    `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
    `subject` STRING COMMENT 'transaction description',
    `refund_status` STRING COMMENT 'refund status',
    `create_time` STRING COMMENT 'creation time',
    `callback_time` STRING COMMENT 'callback time'
) COMMENT 'refund payment table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';
4.2.23 Product Platform Attribute Table
DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(
    `id` STRING COMMENT 'id',
    `attr_id` STRING COMMENT 'platform attribute id',
    `value_id` STRING COMMENT 'platform attribute value id',
    `sku_id` STRING COMMENT 'product id',
    `attr_name` STRING COMMENT 'platform attribute name',
    `value_name` STRING COMMENT 'platform attribute value name'
) COMMENT 'sku platform attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';
4.2.24 Product (SKU) Table
DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(
    `id` STRING COMMENT 'sku id',
    `spu_id` STRING COMMENT 'spu id',
    `price` DECIMAL(16,2) COMMENT 'price',
    `sku_name` STRING COMMENT 'product name',
    `sku_desc` STRING COMMENT 'product description',
    `weight` DECIMAL(16,2) COMMENT 'weight',
    `tm_id` STRING COMMENT 'brand id',
    `category3_id` STRING COMMENT 'category id',
    `is_sale` STRING COMMENT 'whether on sale',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'SKU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/';
4.2.25 Product Sale Attribute Table
DROP TABLE IF EXISTS ods_sku_sale_attr_value;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value(
    `id` STRING COMMENT 'id',
    `sku_id` STRING COMMENT 'sku id',
    `spu_id` STRING COMMENT 'spu id',
    `sale_attr_value_id` STRING COMMENT 'sale attribute value id',
    `sale_attr_id` STRING COMMENT 'sale attribute id',
    `sale_attr_name` STRING COMMENT 'sale attribute name',
    `sale_attr_value_name` STRING COMMENT 'sale attribute value name'
) COMMENT 'sku sale attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/';
4.2.26 Product (SPU) Table
DROP TABLE IF EXISTS ods_spu_info;
CREATE EXTERNAL TABLE ods_spu_info(
    `id` STRING COMMENT 'spu id',
    `spu_name` STRING COMMENT 'spu name',
    `category3_id` STRING COMMENT 'category id',
    `tm_id` STRING COMMENT 'brand id'
) COMMENT 'SPU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_spu_info/';
4.2.27 User Table
DROP TABLE IF EXISTS ods_user_info;
CREATE EXTERNAL TABLE ods_user_info(
    `id` STRING COMMENT 'user id',
    `login_name` STRING COMMENT 'login name',
    `nick_name` STRING COMMENT 'nickname',
    `name` STRING COMMENT 'real name',
    `phone_num` STRING COMMENT 'phone number',
    `email` STRING COMMENT 'email',
    `user_level` STRING COMMENT 'user level',
    `birthday` STRING COMMENT 'birthday',
    `gender` STRING COMMENT 'gender',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'operation time'
) COMMENT 'user table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_user_info/';
4.2.28?ODS層業務表首日數據裝載腳本
1)編寫腳本
(1)在/home/seven/bin目錄下創建腳本hdfs_to_ods_db_init.sh
[seven@hadoop102 bin]$ vim hdfs_to_ods_db_init.sh
在腳本中填寫如下內容
#!/bin/bashAPP=gmallif [ -n "$2" ] ;thendo_date=$2
else echo "請傳入日期參數"exit
fi ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); "ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"

ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"

ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"

ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"

ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"

ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"

ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"

ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"

ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"

ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"

ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"

ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"

ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"

ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"

ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"

ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"

ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"

ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"

ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
"ods_order_info") hive -e "$ods_order_info" ;;
"ods_order_detail") hive -e "$ods_order_detail" ;;
"ods_sku_info") hive -e "$ods_sku_info" ;;
"ods_user_info") hive -e "$ods_user_info" ;;
"ods_payment_info") hive -e "$ods_payment_info" ;;
"ods_base_category1") hive -e "$ods_base_category1" ;;
"ods_base_category2") hive -e "$ods_base_category2" ;;
"ods_base_category3") hive -e "$ods_base_category3" ;;
"ods_base_trademark") hive -e "$ods_base_trademark" ;;
"ods_activity_info") hive -e "$ods_activity_info" ;;
"ods_cart_info") hive -e "$ods_cart_info" ;;
"ods_comment_info") hive -e "$ods_comment_info" ;;
"ods_coupon_info") hive -e "$ods_coupon_info" ;;
"ods_coupon_use") hive -e "$ods_coupon_use" ;;
"ods_favor_info") hive -e "$ods_favor_info" ;;
"ods_order_refund_info") hive -e "$ods_order_refund_info" ;;
"ods_order_status_log") hive -e "$ods_order_status_log" ;;
"ods_spu_info") hive -e "$ods_spu_info" ;;
"ods_activity_rule") hive -e "$ods_activity_rule" ;;
"ods_base_dic") hive -e "$ods_base_dic" ;;
"ods_order_detail_activity") hive -e "$ods_order_detail_activity" ;;
"ods_order_detail_coupon") hive -e "$ods_order_detail_coupon" ;;
"ods_refund_payment") hive -e "$ods_refund_payment" ;;
"ods_sku_attr_value") hive -e "$ods_sku_attr_value" ;;
"ods_sku_sale_attr_value") hive -e "$ods_sku_sale_attr_value" ;;
"ods_base_province") hive -e "$ods_base_province" ;;
"ods_base_region") hive -e "$ods_base_region" ;;
"all") hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region" ;;
esac
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod +x hdfs_to_ods_db_init.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ hdfs_to_ods_db_init.sh all 2020-06-14
(2) Verify that the data was imported successfully
4.2.29 Daily Data Load Script for ODS-Layer Business Tables
1) Write the script
(1) Create the script hdfs_to_ods_db.sh in the /home/seven/bin directory
[seven@hadoop102 bin]$ vim hdfs_to_ods_db.sh
Fill in the following content:
#!/bin/bash
APP=gmall
# If a date is passed in, use it; otherwise default to the day before today
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=$(date -d "-1 day" +%F)
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"

ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"

ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"

ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"

ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"

ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"

ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"

ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"

ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"

ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"

ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"

ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"

ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"

ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"

ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"

ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"

ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"

ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"

ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"

ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"

ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"

ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"

ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"

ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"

ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"

ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"

ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
"ods_order_info") hive -e "$ods_order_info" ;;
"ods_order_detail") hive -e "$ods_order_detail" ;;
"ods_sku_info") hive -e "$ods_sku_info" ;;
"ods_user_info") hive -e "$ods_user_info" ;;
"ods_payment_info") hive -e "$ods_payment_info" ;;
"ods_base_category1") hive -e "$ods_base_category1" ;;
"ods_base_category2") hive -e "$ods_base_category2" ;;
"ods_base_category3") hive -e "$ods_base_category3" ;;
"ods_base_trademark") hive -e "$ods_base_trademark" ;;
"ods_activity_info") hive -e "$ods_activity_info" ;;
"ods_cart_info") hive -e "$ods_cart_info" ;;
"ods_comment_info") hive -e "$ods_comment_info" ;;
"ods_coupon_info") hive -e "$ods_coupon_info" ;;
"ods_coupon_use") hive -e "$ods_coupon_use" ;;
"ods_favor_info") hive -e "$ods_favor_info" ;;
"ods_order_refund_info") hive -e "$ods_order_refund_info" ;;
"ods_order_status_log") hive -e "$ods_order_status_log" ;;
"ods_spu_info") hive -e "$ods_spu_info" ;;
"ods_activity_rule") hive -e "$ods_activity_rule" ;;
"ods_base_dic") hive -e "$ods_base_dic" ;;
"ods_order_detail_activity") hive -e "$ods_order_detail_activity" ;;
"ods_order_detail_coupon") hive -e "$ods_order_detail_coupon" ;;
"ods_refund_payment") hive -e "$ods_refund_payment" ;;
"ods_sku_attr_value") hive -e "$ods_sku_attr_value" ;;
"ods_sku_sale_attr_value") hive -e "$ods_sku_sale_attr_value" ;;
"all") hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value" ;;
esac
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod +x hdfs_to_ods_db.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ hdfs_to_ods_db.sh all 2020-06-14
(2) Verify that the data was imported successfully
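The date-defaulting convention shared by these load scripts ($1 selects the table, $2 is an optional date, and a missing date falls back to yesterday via GNU date) can be exercised in isolation. A minimal sketch, with the logic wrapped in a hypothetical helper function:

```shell
# Mirrors the load scripts' argument convention: arg1 = table selector,
# arg2 = optional load date. With no date, use yesterday (GNU date -d).
resolve_do_date() {
    if [ -n "$2" ]; then
        echo "$2"
    else
        date -d "-1 day" +%F
    fi
}

resolve_do_date all 2020-06-14   # prints 2020-06-14
resolve_do_date all              # prints yesterday's date
```

Note that the scripts test `$2`, not `$1`, so a date can never be mistaken for a table name.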
Chapter 5 Data Warehouse Construction: DIM Layer
5.1 Product Dimension Table (Full)
1. Table creation statement
DROP TABLE IF EXISTS dim_sku_info;
CREATE EXTERNAL TABLE dim_sku_info (
    `id` STRING COMMENT 'product id',
    `price` DECIMAL(16,2) COMMENT 'product price',
    `sku_name` STRING COMMENT 'product name',
    `sku_desc` STRING COMMENT 'product description',
    `weight` DECIMAL(16,2) COMMENT 'weight',
    `is_sale` BOOLEAN COMMENT 'whether on sale',
    `spu_id` STRING COMMENT 'SPU id',
    `spu_name` STRING COMMENT 'SPU name',
    `category3_id` STRING COMMENT 'level-3 category id',
    `category3_name` STRING COMMENT 'level-3 category name',
    `category2_id` STRING COMMENT 'level-2 category id',
    `category2_name` STRING COMMENT 'level-2 category name',
    `category1_id` STRING COMMENT 'level-1 category id',
    `category1_name` STRING COMMENT 'level-1 category name',
    `tm_id` STRING COMMENT 'brand id',
    `tm_name` STRING COMMENT 'brand name',
    `sku_attr_values` ARRAY<STRUCT<attr_id:STRING,value_id:STRING,attr_name:STRING,value_name:STRING>> COMMENT 'platform attributes',
    `sku_sale_attr_values` ARRAY<STRUCT<sale_attr_id:STRING,sale_attr_value_id:STRING,sale_attr_name:STRING,sale_attr_value_name:STRING>> COMMENT 'sale attributes',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'Product dimension table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_sku_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2. Partition planning
3. Data loading
1) Hive index-file reading problem
(1) Query the row count of the same table in two ways
hive (gmall)> select * from ods_log;
Time taken: 0.706 seconds, Fetched: 2955 row(s)
hive (gmall)> select count(*) from ods_log;
2959
(2) The two results differ.
The reason is that select * from ods_log runs no MR job; it reads data directly through the DeprecatedLzoTextInputFormat specified in the ods_log table DDL, which recognizes lzo.index files as index files rather than data.
select count(*) from ods_log does run an MR job, which goes through hive.input.format, whose default is CombineHiveInputFormat. That format treats the index files as small files to be merged, i.e. as ordinary data files. Worse still, it also prevents the LZO files from being split. The default setting is:
hive (gmall)> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
Fix: change CombineHiveInputFormat to HiveInputFormat
hive (gmall)> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
2) Initial (first-day) load
with
sku as
(
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
    from ods_sku_info
    where dt='2020-06-14'
),
spu as
(
    select id, spu_name
    from ods_spu_info
    where dt='2020-06-14'
),
c3 as
(
    select id, name, category2_id
    from ods_base_category3
    where dt='2020-06-14'
),
c2 as
(
    select id, name, category1_id
    from ods_base_category2
    where dt='2020-06-14'
),
c1 as
(
    select id, name
    from ods_base_category1
    where dt='2020-06-14'
),
tm as
(
    select id, tm_name
    from ods_base_trademark
    where dt='2020-06-14'
),
attr as
(
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ods_sku_attr_value
    where dt='2020-06-14'
    group by sku_id
),
sale_attr as
(
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ods_sku_sale_attr_value
    where dt='2020-06-14'
    group by sku_id
)
insert overwrite table dim_sku_info partition(dt='2020-06-14')
select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name,
       sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name,
       attr.attrs, sale_attr.sale_attrs, sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
3) Daily load
with
sku as
(
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
    from ods_sku_info
    where dt='2020-06-15'
),
spu as
(
    select id, spu_name
    from ods_spu_info
    where dt='2020-06-15'
),
c3 as
(
    select id, name, category2_id
    from ods_base_category3
    where dt='2020-06-15'
),
c2 as
(
    select id, name, category1_id
    from ods_base_category2
    where dt='2020-06-15'
),
c1 as
(
    select id, name
    from ods_base_category1
    where dt='2020-06-15'
),
tm as
(
    select id, tm_name
    from ods_base_trademark
    where dt='2020-06-15'
),
attr as
(
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ods_sku_attr_value
    where dt='2020-06-15'
    group by sku_id
),
sale_attr as
(
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ods_sku_sale_attr_value
    where dt='2020-06-15'
    group by sku_id
)
insert overwrite table dim_sku_info partition(dt='2020-06-15')
select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name,
       sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name,
       attr.attrs, sale_attr.sale_attrs, sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
5.2 Coupon Dimension Table (Full)
1. Table creation statement
DROP TABLE IF EXISTS dim_coupon_info;
CREATE EXTERNAL TABLE dim_coupon_info(
    `id` STRING COMMENT 'coupon id',
    `coupon_name` STRING COMMENT 'coupon name',
    `coupon_type` STRING COMMENT 'coupon type: 1 cash voucher, 2 discount voucher, 3 amount-threshold voucher, 4 quantity-threshold discount voucher',
    `condition_amount` DECIMAL(16,2) COMMENT 'amount threshold',
    `condition_num` BIGINT COMMENT 'item-count threshold',
    `activity_id` STRING COMMENT 'activity id',
    `benefit_amount` DECIMAL(16,2) COMMENT 'amount off',
    `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
    `create_time` STRING COMMENT 'creation time',
    `range_type` STRING COMMENT 'scope type: 1 product, 2 category, 3 brand',
    `limit_num` BIGINT COMMENT 'maximum number of claims',
    `taken_count` BIGINT COMMENT 'number of claims so far',
    `start_time` STRING COMMENT 'claim start date',
    `end_time` STRING COMMENT 'claim end date',
    `operate_time` STRING COMMENT 'modification time',
    `expire_time` STRING COMMENT 'expiration time'
) COMMENT 'Coupon dimension table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_coupon_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2. Partition planning
3. Data loading
1) Initial load
insert overwrite table dim_coupon_info partition(dt='2020-06-14')
select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount,
       create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time
from ods_coupon_info
where dt='2020-06-14';
2) Daily load
insert overwrite table dim_coupon_info partition(dt='2020-06-15')
select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount,
       create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time
from ods_coupon_info
where dt='2020-06-15';
5.3 Activity Dimension Table (Full)
1. Table creation statement
DROP TABLE IF EXISTS dim_activity_rule_info;
CREATE EXTERNAL TABLE dim_activity_rule_info(
    `activity_rule_id` STRING COMMENT 'activity rule id',
    `activity_id` STRING COMMENT 'activity id',
    `activity_name` STRING COMMENT 'activity name',
    `activity_type` STRING COMMENT 'activity type',
    `start_time` STRING COMMENT 'start time',
    `end_time` STRING COMMENT 'end time',
    `create_time` STRING COMMENT 'creation time',
    `condition_amount` DECIMAL(16,2) COMMENT 'amount threshold',
    `condition_num` BIGINT COMMENT 'item-count threshold',
    `benefit_amount` DECIMAL(16,2) COMMENT 'amount off',
    `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
    `benefit_level` STRING COMMENT 'benefit level'
) COMMENT 'Activity dimension table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_activity_rule_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2. Partition planning
3. Data loading
1) Initial load
insert overwrite table dim_activity_rule_info partition(dt='2020-06-14')
select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time,
       ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level
    from ods_activity_rule
    where dt='2020-06-14'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ods_activity_info
    where dt='2020-06-14'
) ai
on ar.activity_id=ai.id;
2) Daily load
insert overwrite table dim_activity_rule_info partition(dt='2020-06-15')
select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time,
       ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level
    from ods_activity_rule
    where dt='2020-06-15'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ods_activity_info
    where dt='2020-06-15'
) ai
on ar.activity_id=ai.id;
5.4 Region Dimension Table (Special)
1. Table creation statement
DROP TABLE IF EXISTS dim_base_province;
CREATE EXTERNAL TABLE dim_base_province (
    `id` STRING COMMENT 'id',
    `province_name` STRING COMMENT 'province name',
    `area_code` STRING COMMENT 'area code',
    `iso_code` STRING COMMENT 'ISO-3166 code, for visualization',
    `iso_3166_2` STRING COMMENT 'ISO-3166-2 code, for visualization',
    `region_id` STRING COMMENT 'region id',
    `region_name` STRING COMMENT 'region name'
) COMMENT 'Region dimension table'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_base_province/'
TBLPROPERTIES ("parquet.compression"="lzo");
2. Data loading
The region dimension data is relatively stable and rarely changes, so there is no need to reload it daily.
insert overwrite table dim_base_province
select bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
from ods_base_province bp
join ods_base_region br on bp.region_id = br.id;
5.5 Date Dimension Table (Special)
1. Table creation statement
DROP TABLE IF EXISTS dim_date_info;
CREATE EXTERNAL TABLE dim_date_info(
    `date_id` STRING COMMENT 'date',
    `week_id` STRING COMMENT 'week id',
    `week_day` STRING COMMENT 'day of week',
    `day` STRING COMMENT 'day of month',
    `month` STRING COMMENT 'month',
    `quarter` STRING COMMENT 'quarter',
    `year` STRING COMMENT 'year',
    `is_workday` STRING COMMENT 'whether a workday',
    `holiday_id` STRING COMMENT 'holiday'
) COMMENT 'Date dimension table'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_date_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2. Data loading
The date dimension data normally does not come from the business systems; it is written by hand. Because the data is fully predictable, there is no need for a daily import: a whole year can be loaded in a single pass.
1) Create a temporary table
DROP TABLE IF EXISTS tmp_dim_date_info;
CREATE EXTERNAL TABLE tmp_dim_date_info (
    `date_id` STRING COMMENT 'date',
    `week_id` STRING COMMENT 'week id',
    `week_day` STRING COMMENT 'day of week',
    `day` STRING COMMENT 'day of month',
    `month` STRING COMMENT 'month',
    `quarter` STRING COMMENT 'quarter',
    `year` STRING COMMENT 'year',
    `is_workday` STRING COMMENT 'whether a workday',
    `holiday_id` STRING COMMENT 'holiday'
) COMMENT 'Date dimension table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';
2) Upload the data file to the temporary table's HDFS path /warehouse/gmall/tmp/tmp_dim_date_info/
Its content looks like:
2020-01-01 1 3 1 1 1 2020 0 元旦
2020-01-02 1 4 2 1 1 2020 1 \N
2020-01-03 1 5 3 1 1 2020 1 \N
2020-01-04 1 6 4 1 1 2020 0 \N
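Rather than typing the file by hand, most of its columns can be derived from the date itself. The sketch below is only an illustration (the `gen_dates` helper is hypothetical, and the is_workday/holiday_id columns are hard-coded placeholders of "1" and "\N", since real values require a holiday calendar):

```shell
# Emit tab-separated rows: date_id, week_id, week_day, day, month, quarter,
# year, is_workday (placeholder), holiday_id (placeholder). Requires GNU date.
gen_dates() {
    local d=$1 end=$2
    while [ "$(date -d "$d" +%s)" -le "$(date -d "$end" +%s)" ]; do
        local month quarter
        month=$(date -d "$d" +%-m)
        quarter=$(( (month - 1) / 3 + 1 ))
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t1\t\\N\n' \
            "$d" "$(date -d "$d" +%V)" "$(date -d "$d" +%u)" \
            "$(date -d "$d" +%-d)" "$month" "$quarter" "$(date -d "$d" +%Y)"
        d=$(date -d "$d +1 day" +%F)
    done
}

gen_dates 2020-01-01 2020-01-04 > /tmp/tmp_dim_date_info.tsv
```

Note that `%V` emits zero-padded ISO week numbers (e.g. "01"), which differs slightly from the hand-written sample above; adjust the formats to match whatever convention the warehouse settles on.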
3) Run the following statement to load the data into the date dimension table
insert overwrite table dim_date_info select * from tmp_dim_date_info;
4) Verify that the data was imported successfully
select * from dim_date_info;
5.6 User Dimension Table (Zipper Table)
5.6.1 Zipper Table Overview
1) What a zipper table is
2) Why build a zipper table
3) How to use a zipper table
4) How a zipper table evolves
5.6.2 Building the Zipper Table
1. Table creation statement
DROP TABLE IF EXISTS dim_user_info;
CREATE EXTERNAL TABLE dim_user_info(
    `id` STRING COMMENT 'user id',
    `login_name` STRING COMMENT 'login name',
    `nick_name` STRING COMMENT 'nickname',
    `name` STRING COMMENT 'user name',
    `phone_num` STRING COMMENT 'phone number',
    `email` STRING COMMENT 'email',
    `user_level` STRING COMMENT 'user level',
    `birthday` STRING COMMENT 'birthday',
    `gender` STRING COMMENT 'gender',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'modification time',
    `start_date` STRING COMMENT 'record start date',
    `end_date` STRING COMMENT 'record end date'
) COMMENT 'User dimension table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_user_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2. Partition planning
3. Data loading
1) Initial load
The first-day load of the zipper table is an initialization step: all historical users as of that day are imported in a single pass. The first partition of ods_user_info, i.e. the 2020-06-14 partition, already holds the complete set of historical users, so after some light processing that partition can be loaded directly into the zipper table's 9999-99-99 partition.
insert overwrite table dim_user_info partition(dt='9999-99-99')
select id, login_name, nick_name, md5(name), md5(phone_num), md5(email), user_level, birthday, gender,
       create_time, operate_time, '2020-06-14', '9999-99-99'
from ods_user_info
where dt='2020-06-14';
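The md5() calls above one-way hash the personally identifiable columns (name, phone number, email) before they enter the warehouse, so downstream queries can still join and count on the values without exposing the raw data. The same masking can be reproduced from the shell with coreutils md5sum (a minimal illustration; the `mask` helper is hypothetical):

```shell
# Hash a sensitive value the way md5(col) does in the Hive query above.
# printf '%s' avoids the trailing newline that echo would add, which would
# otherwise change the digest.
mask() { printf '%s' "$1" | md5sum | awk '{print $1}'; }

mask 'hello'   # prints 5d41402abc4b2a76b9719d911017c592
```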
2) Daily load
(1) Approach
(2) SQL
with
tmp as
(
    select
        old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name, old.name old_name,
        old.phone_num old_phone_num, old.email old_email, old.user_level old_user_level, old.birthday old_birthday,
        old.gender old_gender, old.create_time old_create_time, old.operate_time old_operate_time,
        old.start_date old_start_date, old.end_date old_end_date,
        new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name, new.name new_name,
        new.phone_num new_phone_num, new.email new_email, new.user_level new_user_level, new.birthday new_birthday,
        new.gender new_gender, new.create_time new_create_time, new.operate_time new_operate_time,
        new.start_date new_start_date, new.end_date new_end_date
    from
    (
        select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender,
               create_time, operate_time, start_date, end_date
        from dim_user_info
        where dt='9999-99-99'
    ) old
    full outer join
    (
        select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email,
               user_level, birthday, gender, create_time, operate_time,
               '2020-06-15' start_date, '9999-99-99' end_date
        from ods_user_info
        where dt='2020-06-15'
    ) new
    on old.id=new.id
)
insert overwrite table dim_user_info partition(dt)
select nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name),
       nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email),
       nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender),
       nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time),
       nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date),
       nvl(new_end_date,old_end_date) dt
from tmp
union all
select old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email, old_user_level,
       old_birthday, old_gender, old_create_time, old_operate_time, old_start_date,
       cast(date_add('2020-06-15',-1) as string),
       cast(date_add('2020-06-15',-1) as string) dt
from tmp
where new_id is not null and old_id is not null;
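The effect of this full outer join plus nvl() pattern can be seen on toy data (the rows below are entirely made up): an id that appears in the new snapshot gets an open record starting on the load date, its superseded record is closed at load date minus one, and untouched ids keep their open record. A minimal awk sketch of the same merge:

```shell
# Toy SCD2 "zipper" merge. Columns: id, name, start_date, end_date.
do_date=2020-06-15
yesterday=$(date -d "$do_date -1 day" +%F)

printf '1\tAlice\t2020-06-14\t9999-99-99\n2\tBob\t2020-06-14\t9999-99-99\n' > /tmp/zip_old.tsv
printf '2\tBobby\n3\tCarol\n' > /tmp/zip_new.tsv

zipper_merge() {
    awk -F'\t' -v d="$do_date" -v y="$yesterday" '
        NR==FNR { new[$1]=$2; next }                 # first file: the new snapshot
        {
            if ($1 in new) {                         # changed id: close the old row
                print $1, $2, $3, y
                print $1, new[$1], d, "9999-99-99"   # and open a fresh one
                delete new[$1]
            } else print $0                          # untouched id: keep open row
        }
        END { for (id in new) print id, new[id], d, "9999-99-99" }   # brand-new ids
    ' OFS='\t' /tmp/zip_new.tsv /tmp/zip_old.tsv | sort
}
zipper_merge
```

Running it yields four rows: Alice still open, Bob closed on 2020-06-14, Bobby and Carol open from 2020-06-15.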
5.7 DIM-Layer Initial Data Load Script
1) Write the script
(1) Create the script ods_to_dim_db_init.sh in the /home/seven/bin directory
[seven@hadoop102 bin]$ vim ods_to_dim_db_init.sh
Fill in the following content:
#!/bin/bash
APP=gmall
if [ -n "$2" ]; then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit
fi

dim_user_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_user_info partition(dt='9999-99-99')
select id, login_name, nick_name, md5(name), md5(phone_num), md5(email), user_level, birthday, gender,
       create_time, operate_time, '$do_date', '9999-99-99'
from ${APP}.ods_user_info
where dt='$do_date';
"dim_sku_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
sku as
(
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
    from ${APP}.ods_sku_info
    where dt='$do_date'
),
spu as
(
    select id, spu_name
    from ${APP}.ods_spu_info
    where dt='$do_date'
),
c3 as
(
    select id, name, category2_id
    from ${APP}.ods_base_category3
    where dt='$do_date'
),
c2 as
(
    select id, name, category1_id
    from ${APP}.ods_base_category2
    where dt='$do_date'
),
c1 as
(
    select id, name
    from ${APP}.ods_base_category1
    where dt='$do_date'
),
tm as
(
    select id, tm_name
    from ${APP}.ods_base_trademark
    where dt='$do_date'
),
attr as
(
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ${APP}.ods_sku_attr_value
    where dt='$do_date'
    group by sku_id
),
sale_attr as
(
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ${APP}.ods_sku_sale_attr_value
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name,
       sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name,
       attr.attrs, sale_attr.sale_attrs, sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"dim_base_province="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_base_province
select bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
from ${APP}.ods_base_province bp
join ${APP}.ods_base_region br on bp.region_id = br.id;
"dim_coupon_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount,
       create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time
from ${APP}.ods_coupon_info
where dt='$do_date';
"dim_activity_rule_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time,
       ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level
    from ${APP}.ods_activity_rule
    where dt='$do_date'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ${APP}.ods_activity_info
    where dt='$do_date'
) ai
on ar.activity_id=ai.id;
"

case $1 in
"dim_user_info") hive -e "$dim_user_info" ;;
"dim_sku_info") hive -e "$dim_sku_info" ;;
"dim_base_province") hive -e "$dim_base_province" ;;
"dim_coupon_info") hive -e "$dim_coupon_info" ;;
"dim_activity_rule_info") hive -e "$dim_activity_rule_info" ;;
"all") hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info$dim_base_province" ;;
esac
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod +x ods_to_dim_db_init.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ ods_to_dim_db_init.sh all 2020-06-14
Note: this script does not load the date dimension table; that table must be loaded manually, as described in Section 5.5.
(2) Verify that the data was imported successfully
5.8 DIM-Layer Daily Data Load Script
1) Write the script
(1) Create the script ods_to_dim_db.sh in the /home/seven/bin directory
[seven@hadoop102 bin]$ vim ods_to_dim_db.sh
Fill in the following content:
#!/bin/bash
APP=gmall
# If a date is passed in, use it; otherwise default to the day before today
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=$(date -d "-1 day" +%F)
fi

dim_user_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
tmp as
(
    select
        old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name, old.name old_name,
        old.phone_num old_phone_num, old.email old_email, old.user_level old_user_level, old.birthday old_birthday,
        old.gender old_gender, old.create_time old_create_time, old.operate_time old_operate_time,
        old.start_date old_start_date, old.end_date old_end_date,
        new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name, new.name new_name,
        new.phone_num new_phone_num, new.email new_email, new.user_level new_user_level, new.birthday new_birthday,
        new.gender new_gender, new.create_time new_create_time, new.operate_time new_operate_time,
        new.start_date new_start_date, new.end_date new_end_date
    from
    (
        select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender,
               create_time, operate_time, start_date, end_date
        from ${APP}.dim_user_info
        where dt='9999-99-99'
        and start_date<'$do_date'
    ) old
    full outer join
    (
        select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email,
               user_level, birthday, gender, create_time, operate_time,
               '$do_date' start_date, '9999-99-99' end_date
        from ${APP}.ods_user_info
        where dt='$do_date'
    ) new
    on old.id=new.id
)
insert overwrite table ${APP}.dim_user_info partition(dt)
select nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name),
       nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email),
       nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender),
       nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time),
       nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date),
       nvl(new_end_date,old_end_date) dt
from tmp
union all
select old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email, old_user_level,
       old_birthday, old_gender, old_create_time, old_operate_time, old_start_date,
       cast(date_add('$do_date',-1) as string),
       cast(date_add('$do_date',-1) as string) dt
from tmp
where new_id is not null and old_id is not null;
"dim_sku_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
sku as
(
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
    from ${APP}.ods_sku_info
    where dt='$do_date'
),
spu as
(
    select id, spu_name
    from ${APP}.ods_spu_info
    where dt='$do_date'
),
c3 as
(
    select id, name, category2_id
    from ${APP}.ods_base_category3
    where dt='$do_date'
),
c2 as
(
    select id, name, category1_id
    from ${APP}.ods_base_category2
    where dt='$do_date'
),
c1 as
(
    select id, name
    from ${APP}.ods_base_category1
    where dt='$do_date'
),
tm as
(
    select id, tm_name
    from ${APP}.ods_base_trademark
    where dt='$do_date'
),
attr as
(
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ${APP}.ods_sku_attr_value
    where dt='$do_date'
    group by sku_id
),
sale_attr as
(
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ${APP}.ods_sku_sale_attr_value
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name,
       sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name,
       attr.attrs, sale_attr.sale_attrs, sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"dim_base_province="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_base_province
select bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
from ${APP}.ods_base_province bp
join ${APP}.ods_base_region br on bp.region_id = br.id;
"dim_coupon_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount,
       create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time
from ${APP}.ods_coupon_info
where dt='$do_date';
"

dim_activity_rule_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
select
    ar.id,
    ar.activity_id,
    ai.activity_name,
    ar.activity_type,
    ai.start_time,
    ai.end_time,
    ai.create_time,
    ar.condition_amount,
    ar.condition_num,
    ar.benefit_amount,
    ar.benefit_discount,
    ar.benefit_level
from
(
    select
        id,
        activity_id,
        activity_type,
        condition_amount,
        condition_num,
        benefit_amount,
        benefit_discount,
        benefit_level
    from ${APP}.ods_activity_rule
    where dt='$do_date'
)ar
left join
(
    select
        id,
        activity_name,
        start_time,
        end_time,
        create_time
    from ${APP}.ods_activity_info
    where dt='$do_date'
)ai
on ar.activity_id=ai.id;
"

case $1 in
"dim_user_info")
    hive -e "$dim_user_info"
;;
"dim_sku_info")
    hive -e "$dim_sku_info"
;;
"dim_base_province")
    hive -e "$dim_base_province"
;;
"dim_coupon_info")
    hive -e "$dim_coupon_info"
;;
"dim_activity_rule_info")
    hive -e "$dim_activity_rule_info"
;;
"all")
    hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info"
;;
esac
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod +x ods_to_dim_db.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ ods_to_dim_db.sh all 2020-06-14
(2) Verify that the data was loaded successfully
Chapter 6 Building the Data Warehouse: DWD Layer
1) Parse the user behavior log data.
2) Remodel the business data with a dimensional model.
6.1 DWD Layer (User Behavior Logs)
6.1.1 Log Parsing Approach
1) Log structure review
(1) Page tracking logs
(2) App launch logs
2) Parsing approach
6.1.2 Using the get_json_object Function
1) Sample data
[{"name":"大郎","sex":"男","age":"25"},{"name":"西門慶","sex":"男","age":"47"}]
2) Extract the first JSON object
hive (gmall)> select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西門慶","sex":"男","age":"47"}]','$[0]');
Result: {"name":"大郎","sex":"男","age":"25"}
3) Extract the age field of the first JSON object
hive (gmall)> select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西門慶","sex":"男","age":"47"}]',"$[0].age");
Result: 25
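The behavior of the two queries above can be sketched in Python. This is a toy model of the subset of JSON-path syntax used in this chapter (not Hive's implementation): `$[i]` indexes an array and `.field` reads a key; a missing step yields None, mirroring Hive's NULL.

```python
import json
import re

def get_json_object(line, path):
    """Toy model of the JSON-path subset used above: '$[i]' indexes an
    array, '.field' reads a key. Returns None when a step is missing,
    mirroring Hive's NULL. (Hive serializes nested objects back to JSON
    strings; this sketch returns Python values.)"""
    data = json.loads(line)
    for idx, key in re.findall(r"\[(\d+)\]|\.(\w+)", path):
        try:
            data = data[int(idx)] if idx else data[key]
        except (IndexError, KeyError, TypeError):
            return None
    return data

line = '[{"name":"大郎","age":"25"},{"name":"西門慶","age":"47"}]'
print(get_json_object(line, '$[0].age'))  # 25
```

Note that the scalar comes back as the string '25', not a number: the source JSON stores age as a string, which is why the DWD select list can feed such values into BIGINT columns and let Hive cast them.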
6.1.3 App Launch Log Table
Parsing approach: each row of the launch log table corresponds to one app launch, and a launch record should contain the common fields plus the launch-specific fields. First filter out all log lines that contain a start field, then parse each field with get_json_object.
1) Table creation statement
DROP TABLE IF EXISTS dwd_start_log;
CREATE EXTERNAL TABLE dwd_start_log(
    `area_code` STRING COMMENT 'region code',
    `brand` STRING COMMENT 'phone brand',
    `channel` STRING COMMENT 'channel',
    `is_new` STRING COMMENT 'whether first launch',
    `model` STRING COMMENT 'phone model',
    `mid_id` STRING COMMENT 'device id',
    `os` STRING COMMENT 'operating system',
    `user_id` STRING COMMENT 'user id',
    `version_code` STRING COMMENT 'app version',
    `entry` STRING COMMENT 'entry point: icon = app icon, notice = notification, install = first launch after install',
    `loading_time` BIGINT COMMENT 'startup loading time',
    `open_ad_id` STRING COMMENT 'splash ad id',
    `open_ad_ms` BIGINT COMMENT 'total ad play time',
    `open_ad_skip_ms` BIGINT COMMENT 'time at which the user skipped the ad',
    `ts` BIGINT COMMENT 'timestamp'
) COMMENT 'app launch log table'
PARTITIONED BY (`dt` STRING) -- partition by date
STORED AS PARQUET -- Parquet columnar storage
LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- HDFS storage location
TBLPROPERTIES('parquet.compression'='lzo') -- LZO compression
;
2) Load the data
hive (gmall)>
insert overwrite table dwd_start_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.start') is not null;
3) Inspect the data
hive (gmall)> select * from dwd_start_log where dt='2020-06-14' limit 2;
6.1.4 Page Log Table
Parsing approach: each row of the page log table corresponds to one page view, and a page-view record should contain the common fields plus the page-specific fields. First filter out all log lines that contain a page field, then parse each field with get_json_object.
1) Table creation statement
DROP TABLE IF EXISTS dwd_page_log;
CREATE EXTERNAL TABLE dwd_page_log(
    `area_code` STRING COMMENT 'region code',
    `brand` STRING COMMENT 'phone brand',
    `channel` STRING COMMENT 'channel',
    `is_new` STRING COMMENT 'whether first launch',
    `model` STRING COMMENT 'phone model',
    `mid_id` STRING COMMENT 'device id',
    `os` STRING COMMENT 'operating system',
    `user_id` STRING COMMENT 'user id',
    `version_code` STRING COMMENT 'app version',
    `during_time` BIGINT COMMENT 'duration in ms',
    `page_item` STRING COMMENT 'target item id',
    `page_item_type` STRING COMMENT 'target item type',
    `last_page_id` STRING COMMENT 'previous page id',
    `page_id` STRING COMMENT 'page id',
    `source_type` STRING COMMENT 'source type',
    `ts` BIGINT COMMENT 'timestamp'
) COMMENT 'page log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_page_log'
TBLPROPERTIES('parquet.compression'='lzo');
2) Load the data
hive (gmall)>
insert overwrite table dwd_page_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.page') is not null;
3) Inspect the data
hive (gmall)>
select * from dwd_page_log where dt='2020-06-14' limit 2;
6.1.5 Action Log Table
Parsing approach: each row of the action log table corresponds to one user action, and an action record should contain the common fields, the page fields, and the action-specific fields. First filter out the log lines that contain an action field, then use a UDTF to explode the action array (similar in effect to the explode function), and finally parse each field with get_json_object.
1) Table creation statement
DROP TABLE IF EXISTS dwd_action_log;
CREATE EXTERNAL TABLE dwd_action_log(
    `area_code` STRING COMMENT 'region code',
    `brand` STRING COMMENT 'phone brand',
    `channel` STRING COMMENT 'channel',
    `is_new` STRING COMMENT 'whether first launch',
    `model` STRING COMMENT 'phone model',
    `mid_id` STRING COMMENT 'device id',
    `os` STRING COMMENT 'operating system',
    `user_id` STRING COMMENT 'user id',
    `version_code` STRING COMMENT 'app version',
    `during_time` BIGINT COMMENT 'duration in ms',
    `page_item` STRING COMMENT 'target item id',
    `page_item_type` STRING COMMENT 'target item type',
    `last_page_id` STRING COMMENT 'previous page id',
    `page_id` STRING COMMENT 'page id',
    `source_type` STRING COMMENT 'source type',
    `action_id` STRING COMMENT 'action id',
    `item` STRING COMMENT 'target item id',
    `item_type` STRING COMMENT 'target item type',
    `ts` BIGINT COMMENT 'timestamp'
) COMMENT 'action log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_action_log'
TBLPROPERTIES('parquet.compression'='lzo');
2) Create the UDTF: design approach
3) Create the UDTF: write the code
(1) Create a Maven project: hivefunction
(2) Create the package: com.seven.hive.udtf
(3) Add the following dependency
<dependencies>
    <!-- Hive dependency -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>
(4) Write the code
package com.seven.hive.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;

import java.util.ArrayList;
import java.util.List;

public class ExplodeJSONArray extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // 1 Validate the arguments: exactly one argument is expected
        if (argOIs.length != 1) {
            throw new UDFArgumentException("explode_json_array takes exactly one argument");
        }
        // 2 The argument must be a primitive type
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("explode_json_array only accepts primitive arguments");
        }
        // Cast the object inspector to a primitive object inspector
        PrimitiveObjectInspector argumentOI = (PrimitiveObjectInspector) argOIs[0];
        // The primitive must be a string
        if (argumentOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("explode_json_array only accepts string arguments");
        }
        // 3 Define the output column name and type
        List<String> fieldNames = new ArrayList<String>();
        List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("items");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    public void process(Object[] objects) throws HiveException {
        // 1 Get the input value
        String jsonArray = objects[0].toString();
        // 2 Parse the string into a JSON array
        JSONArray actions = new JSONArray(jsonArray);
        // 3 Emit one output row per array element
        for (int i = 0; i < actions.length(); i++) {
            String[] result = new String[1];
            result[0] = actions.getString(i);
            forward(result);
        }
    }

    public void close() throws HiveException {
    }
}
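The UDTF's contract can be restated compactly in Python. This is a sketch of what `process()` above does, not Hive code: one string column holding a JSON array comes in, and one row per array element goes out, each row again a single string column.

```python
import json

def explode_json_array(json_array_str):
    """Sketch of the ExplodeJSONArray UDTF contract: parse the input
    string as a JSON array and emit each element as its own row."""
    rows = []
    for element in json.loads(json_array_str):
        # forward() in the UDTF emits a row; here we collect the
        # serialized element instead
        rows.append(json.dumps(element, ensure_ascii=False))
    return rows

actions = '[{"action_id":"cart_add","ts":1},{"action_id":"favor_add","ts":2}]'
for row in explode_json_array(actions):
    print(row)
```

Two input actions produce two output rows, which is exactly the fan-out `lateral view explode_json_array(...)` relies on in the load statement below.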
4) Create the function
(1) Package the jar
(2) Upload hivefunction-1.0-SNAPSHOT.jar to /opt/module on hadoop102, then upload the jar to the /user/hive/jars path on HDFS
[seven@hadoop102 module]$ hadoop fs -mkdir -p /user/hive/jars
[seven@hadoop102 module]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars
(3) Create a permanent function and associate it with the compiled Java class
create function explode_json_array as 'com.seven.hive.udtf.ExplodeJSONArray' using jar 'hdfs://hadoop102:8020/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';
(4) Note: what if you modify the function and rebuild the jar? Simply replace the old jar on the HDFS path and restart the Hive client.
5) Load the data
insert overwrite table dwd_action_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(action,'$.action_id'),
    get_json_object(action,'$.item'),
    get_json_object(action,'$.item_type'),
    get_json_object(action,'$.ts')
from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='2020-06-14'
and get_json_object(line,'$.actions') is not null;
6) Inspect the data
select * from dwd_action_log where dt='2020-06-14' limit 2;
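The `lateral view` in the load statement can be pictured as a row-level cross product. The sketch below (plain Python, not Hive) pairs each source row with every element of its exploded array column and drops rows whose array is missing, mirroring the `is not null` filter.

```python
import json

def lateral_view_explode(lines, array_field):
    """Sketch of `lateral view explode_json_array(...)` plus the
    `... is not null` filter: each source row is paired with every
    element of its array column; rows without the array emit nothing."""
    for line in lines:
        record = json.loads(line)
        elements = record.get(array_field)
        if not elements:  # mirrors: get_json_object(line,'$.actions') is not null
            continue
        for element in elements:
            yield record, element  # (original row, one array element)

logs = [
    '{"common":{"mid":"m1"},"actions":[{"action_id":"a1"},{"action_id":"a2"}]}',
    '{"common":{"mid":"m2"}}',
]
pairs = list(lateral_view_explode(logs, "actions"))
print(len(pairs))  # 2
```

Device m1 contributes two output rows (one per action) and m2 contributes none, which is why dwd_action_log can hold more rows than ods_log for the same day.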
6.1.6 Display Log Table
Parsing approach: each row of the display log table corresponds to one display (impression) record, which should contain the common fields, the page fields, and the display-specific fields. First filter out the log lines that contain a display field, then use the UDTF to explode the display array (similar in effect to the explode function), and finally parse each field with get_json_object.
1) Table creation statement
DROP TABLE IF EXISTS dwd_display_log;
CREATE EXTERNAL TABLE dwd_display_log(
    `area_code` STRING COMMENT 'region code',
    `brand` STRING COMMENT 'phone brand',
    `channel` STRING COMMENT 'channel',
    `is_new` STRING COMMENT 'whether first launch',
    `model` STRING COMMENT 'phone model',
    `mid_id` STRING COMMENT 'device id',
    `os` STRING COMMENT 'operating system',
    `user_id` STRING COMMENT 'user id',
    `version_code` STRING COMMENT 'app version',
    `during_time` BIGINT COMMENT 'duration in ms',
    `page_item` STRING COMMENT 'target item id',
    `page_item_type` STRING COMMENT 'target item type',
    `last_page_id` STRING COMMENT 'previous page id',
    `page_id` STRING COMMENT 'page id',
    `source_type` STRING COMMENT 'source type',
    `ts` BIGINT COMMENT 'timestamp',
    `display_type` STRING COMMENT 'display type',
    `item` STRING COMMENT 'displayed item id',
    `item_type` STRING COMMENT 'displayed item type',
    `order` BIGINT COMMENT 'display order',
    `pos_id` BIGINT COMMENT 'display position'
) COMMENT 'display log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_display_log'
TBLPROPERTIES('parquet.compression'='lzo');
2) Load the data
insert overwrite table dwd_display_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts'),
    get_json_object(display,'$.display_type'),
    get_json_object(display,'$.item'),
    get_json_object(display,'$.item_type'),
    get_json_object(display,'$.order'),
    get_json_object(display,'$.pos_id')
from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='2020-06-14'
and get_json_object(line,'$.displays') is not null;
3) Inspect the data
select * from dwd_display_log where dt='2020-06-14' limit 2;
6.1.7 Error Log Table
Parsing approach: each row of the error log table corresponds to one error record. To make errors easy to locate, an error record should contain the corresponding common fields, page fields, display fields, action fields, launch fields, and the error fields. First filter out the log lines that contain an err field, then parse all fields with get_json_object.
1) Table creation statement
DROP TABLE IF EXISTS dwd_error_log;
CREATE EXTERNAL TABLE dwd_error_log(
    `area_code` STRING COMMENT 'region code',
    `brand` STRING COMMENT 'phone brand',
    `channel` STRING COMMENT 'channel',
    `is_new` STRING COMMENT 'whether first launch',
    `model` STRING COMMENT 'phone model',
    `mid_id` STRING COMMENT 'device id',
    `os` STRING COMMENT 'operating system',
    `user_id` STRING COMMENT 'user id',
    `version_code` STRING COMMENT 'app version',
    `page_item` STRING COMMENT 'target item id',
    `page_item_type` STRING COMMENT 'target item type',
    `last_page_id` STRING COMMENT 'previous page id',
    `page_id` STRING COMMENT 'page id',
    `source_type` STRING COMMENT 'source type',
    `entry` STRING COMMENT 'entry point: icon = app icon, notice = notification, install = first launch after install',
    `loading_time` STRING COMMENT 'startup loading time',
    `open_ad_id` STRING COMMENT 'splash ad id',
    `open_ad_ms` STRING COMMENT 'total ad play time',
    `open_ad_skip_ms` STRING COMMENT 'time at which the user skipped the ad',
    `actions` STRING COMMENT 'actions',
    `displays` STRING COMMENT 'displays',
    `ts` STRING COMMENT 'timestamp',
    `error_code` STRING COMMENT 'error code',
    `msg` STRING COMMENT 'error message'
) COMMENT 'error log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_error_log'
TBLPROPERTIES('parquet.compression'='lzo');
Note: the action and display arrays are not processed here. If you need to correlate errors with individual actions or displays, first explode the arrays with the explode_json_array function, then extract the specific fields with get_json_object.
2) Load the data
insert overwrite table dwd_error_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.actions'),
    get_json_object(line,'$.displays'),
    get_json_object(line,'$.ts'),
    get_json_object(line,'$.err.error_code'),
    get_json_object(line,'$.err.msg')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.err') is not null;
3) Inspect the data
hive (gmall)>
select * from dwd_error_log where dt='2020-06-14' limit 2;
6.1.8 DWD Layer User Behavior Data Load Script
1) Write the script
(1) Create the script in the /home/seven/bin directory on hadoop102
[seven@hadoop102 bin]$ vim ods_to_dwd_log.sh
Write the following content into the script
#!/bin/bash

APP=gmall
# If a date argument is given, use it; otherwise default to yesterday
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dwd_start_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.start') is not null;"

dwd_page_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.page') is not null;"

dwd_action_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(action,'$.action_id'),
    get_json_object(action,'$.item'),
    get_json_object(action,'$.item_type'),
    get_json_object(action,'$.ts')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='$do_date'
and get_json_object(line,'$.actions') is not null;"

dwd_display_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts'),
    get_json_object(display,'$.display_type'),
    get_json_object(display,'$.item'),
    get_json_object(display,'$.item_type'),
    get_json_object(display,'$.order'),
    get_json_object(display,'$.pos_id')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='$do_date'
and get_json_object(line,'$.displays') is not null;"

dwd_error_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.actions'),
    get_json_object(line,'$.displays'),
    get_json_object(line,'$.ts'),
    get_json_object(line,'$.err.error_code'),
    get_json_object(line,'$.err.msg')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.err') is not null;"

case $1 in
dwd_start_log )
    hive -e "$dwd_start_log"
;;
dwd_page_log )
    hive -e "$dwd_page_log"
;;
dwd_action_log )
    hive -e "$dwd_action_log"
;;
dwd_display_log )
    hive -e "$dwd_display_log"
;;
dwd_error_log )
    hive -e "$dwd_error_log"
;;
all )
    hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log"
;;
esac
(2) Grant execute permission on the script
[seven@hadoop102 bin]$ chmod 777 ods_to_dwd_log.sh
2) Script usage
(1) Run the script
[seven@hadoop102 module]$ ods_to_dwd_log.sh all 2020-06-14
(2) Check the load results
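The script's argument handling is worth spelling out: `$1` selects which table to load, while `$2` is the business date, defaulting to yesterday when absent. A sketch in Python of that date-defaulting logic (mirroring `date -d "-1 day" +%F`):

```python
from datetime import date, timedelta

def resolve_do_date(argv):
    """Sketch of the shell script's date handling: use the second CLI
    argument if present, otherwise default to yesterday's date in
    yyyy-mm-dd form, matching `date -d "-1 day" +%F`."""
    if len(argv) > 2 and argv[2]:
        return argv[2]
    return (date.today() - timedelta(days=1)).isoformat()

print(resolve_do_date(["ods_to_dwd_log.sh", "all", "2020-06-14"]))  # 2020-06-14
```

This default matters in production: a nightly scheduler can invoke `ods_to_dwd_log.sh all` with no date and always process the previous day's partition.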
6.2 DWD Layer (Business Data)
For business data, the key point in building the DWD layer is dimensional modeling.
6.2.1 Comment Fact Table (Transactional Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_comment_info;
CREATE EXTERNAL TABLE dwd_comment_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `spu_id` STRING COMMENT 'spu id',
    `order_id` STRING COMMENT 'order id',
    `appraise` STRING COMMENT 'appraise (good / medium / bad / default)',
    `create_time` STRING COMMENT 'comment time'
) COMMENT 'comment fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_comment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_comment_info partition (dt)
select
    id,
    user_id,
    sku_id,
    spu_id,
    order_id,
    appraise,
    create_time,
    date_format(create_time,'yyyy-MM-dd')
from ods_comment_info
where dt='2020-06-14';
(2) Daily load
insert overwrite table dwd_comment_info partition(dt='2020-06-15')
select
    id,
    user_id,
    sku_id,
    spu_id,
    order_id,
    appraise,
    create_time
from ods_comment_info
where dt='2020-06-15';
6.2.2 Order Detail Fact Table (Transactional Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_order_detail;
CREATE EXTERNAL TABLE dwd_order_detail (
    `id` STRING COMMENT 'order detail id',
    `order_id` STRING COMMENT 'order id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `province_id` STRING COMMENT 'province id',
    `activity_id` STRING COMMENT 'activity id',
    `activity_rule_id` STRING COMMENT 'activity rule id',
    `coupon_id` STRING COMMENT 'coupon id',
    `create_time` STRING COMMENT 'create time',
    `source_type` STRING COMMENT 'source type',
    `source_id` STRING COMMENT 'source id',
    `sku_num` BIGINT COMMENT 'quantity',
    `original_amount` DECIMAL(16,2) COMMENT 'original amount',
    `split_activity_amount` DECIMAL(16,2) COMMENT 'allocated activity discount',
    `split_coupon_amount` DECIMAL(16,2) COMMENT 'allocated coupon discount',
    `split_final_amount` DECIMAL(16,2) COMMENT 'allocated final amount'
) COMMENT 'order detail fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_detail/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_order_detail partition(dt)
select
    od.id,
    od.order_id,
    oi.user_id,
    od.sku_id,
    oi.province_id,
    oda.activity_id,
    oda.activity_rule_id,
    odc.coupon_id,
    od.create_time,
    od.source_type,
    od.source_id,
    od.sku_num,
    od.order_price*od.sku_num,
    od.split_activity_amount,
    od.split_coupon_amount,
    od.split_final_amount,
    date_format(create_time,'yyyy-MM-dd')
from
(
    select *
    from ods_order_detail
    where dt='2020-06-14'
)od
left join
(
    select
        id,
        user_id,
        province_id
    from ods_order_info
    where dt='2020-06-14'
)oi
on od.order_id=oi.id
left join
(
    select
        order_detail_id,
        activity_id,
        activity_rule_id
    from ods_order_detail_activity
    where dt='2020-06-14'
)oda
on od.id=oda.order_detail_id
left join
(
    select
        order_detail_id,
        coupon_id
    from ods_order_detail_coupon
    where dt='2020-06-14'
)odc
on od.id=odc.order_detail_id;
(2) Daily load
insert overwrite table dwd_order_detail partition(dt='2020-06-15')
select
    od.id,
    od.order_id,
    oi.user_id,
    od.sku_id,
    oi.province_id,
    oda.activity_id,
    oda.activity_rule_id,
    odc.coupon_id,
    od.create_time,
    od.source_type,
    od.source_id,
    od.sku_num,
    od.order_price*od.sku_num,
    od.split_activity_amount,
    od.split_coupon_amount,
    od.split_final_amount
from
(
    select *
    from ods_order_detail
    where dt='2020-06-15'
)od
left join
(
    select
        id,
        user_id,
        province_id
    from ods_order_info
    where dt='2020-06-15'
)oi
on od.order_id=oi.id
left join
(
    select
        order_detail_id,
        activity_id,
        activity_rule_id
    from ods_order_detail_activity
    where dt='2020-06-15'
)oda
on od.id=oda.order_detail_id
left join
(
    select
        order_detail_id,
        coupon_id
    from ods_order_detail_coupon
    where dt='2020-06-15'
)odc
on od.id=odc.order_detail_id;
6.2.3 Order Refund Fact Table (Transactional Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_order_refund_info;
CREATE EXTERNAL TABLE dwd_order_refund_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `order_id` STRING COMMENT 'order id',
    `sku_id` STRING COMMENT 'sku id',
    `province_id` STRING COMMENT 'province id',
    `refund_type` STRING COMMENT 'refund type',
    `refund_num` BIGINT COMMENT 'refunded quantity',
    `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
    `refund_reason_type` STRING COMMENT 'refund reason type',
    `create_time` STRING COMMENT 'refund time'
) COMMENT 'order refund fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_order_refund_info partition(dt)
select
    ri.id,
    ri.user_id,
    ri.order_id,
    ri.sku_id,
    oi.province_id,
    ri.refund_type,
    ri.refund_num,
    ri.refund_amount,
    ri.refund_reason_type,
    ri.create_time,
    date_format(ri.create_time,'yyyy-MM-dd')
from
(
    select * from ods_order_refund_info where dt='2020-06-14'
)ri
left join
(
    select id,province_id from ods_order_info where dt='2020-06-14'
)oi
on ri.order_id=oi.id;
(2) Daily load
insert overwrite table dwd_order_refund_info partition(dt='2020-06-15')
select
    ri.id,
    ri.user_id,
    ri.order_id,
    ri.sku_id,
    oi.province_id,
    ri.refund_type,
    ri.refund_num,
    ri.refund_amount,
    ri.refund_reason_type,
    ri.create_time
from
(
    select * from ods_order_refund_info where dt='2020-06-15'
)ri
left join
(
    select id,province_id from ods_order_info where dt='2020-06-15'
)oi
on ri.order_id=oi.id;
3) Check the load results
6.2.4 Cart Fact Table (Periodic Snapshot Fact Table, Daily Snapshot)
1) Table creation statement
DROP TABLE IF EXISTS dwd_cart_info;
CREATE EXTERNAL TABLE dwd_cart_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `source_type` STRING COMMENT 'source type',
    `source_id` STRING COMMENT 'source id',
    `cart_price` DECIMAL(16,2) COMMENT 'price when added to cart',
    `is_ordered` STRING COMMENT 'whether already ordered',
    `create_time` STRING COMMENT 'create time',
    `operate_time` STRING COMMENT 'modify time',
    `order_time` STRING COMMENT 'order time',
    `sku_num` BIGINT COMMENT 'quantity added to cart'
) COMMENT 'cart fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_cart_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_cart_info partition(dt='2020-06-14')
select
    id,
    user_id,
    sku_id,
    source_type,
    source_id,
    cart_price,
    is_ordered,
    create_time,
    operate_time,
    order_time,
    sku_num
from ods_cart_info
where dt='2020-06-14';
(2) Daily load
insert overwrite table dwd_cart_info partition(dt='2020-06-15')
select
    id,
    user_id,
    sku_id,
    source_type,
    source_id,
    cart_price,
    is_ordered,
    create_time,
    operate_time,
    order_time,
    sku_num
from ods_cart_info
where dt='2020-06-15';
6.2.5 Favorites Fact Table (Periodic Snapshot Fact Table, Daily Snapshot)
1) Table creation statement
DROP TABLE IF EXISTS dwd_favor_info;
CREATE EXTERNAL TABLE dwd_favor_info(
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `spu_id` STRING COMMENT 'spu id',
    `is_cancel` STRING COMMENT 'whether cancelled',
    `create_time` STRING COMMENT 'favorite time',
    `cancel_time` STRING COMMENT 'cancel time'
) COMMENT 'favorites fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_favor_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_favor_info partition(dt='2020-06-14')
select
    id,
    user_id,
    sku_id,
    spu_id,
    is_cancel,
    create_time,
    cancel_time
from ods_favor_info
where dt='2020-06-14';
(2) Daily load
insert overwrite table dwd_favor_info partition(dt='2020-06-15')
select
    id,
    user_id,
    sku_id,
    spu_id,
    is_cancel,
    create_time,
    cancel_time
from ods_favor_info
where dt='2020-06-15';
6.2.6 Coupon Redemption Fact Table (Accumulating Snapshot Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_coupon_use;
CREATE EXTERNAL TABLE dwd_coupon_use(
    `id` STRING COMMENT 'id',
    `coupon_id` STRING COMMENT 'coupon id',
    `user_id` STRING COMMENT 'user id',
    `order_id` STRING COMMENT 'order id',
    `coupon_status` STRING COMMENT 'coupon status',
    `get_time` STRING COMMENT 'claim time',
    `using_time` STRING COMMENT 'use time (order placed)',
    `used_time` STRING COMMENT 'use time (paid)',
    `expire_time` STRING COMMENT 'expiration time'
) COMMENT 'coupon redemption fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_coupon_use partition(dt)
select
    id,
    coupon_id,
    user_id,
    order_id,
    coupon_status,
    get_time,
    using_time,
    used_time,
    expire_time,
    coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ods_coupon_use
where dt='2020-06-14';
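The coalesce expression above decides which partition each record lands in: records whose lifecycle has finished go to the partition of the day they closed, while still-open records go to the special '9999-99-99' partition. A Python sketch of that routing rule:

```python
def coupon_partition(used_time, expire_time):
    """Sketch of coalesce(date_format(used_time,'yyyy-MM-dd'),
    date_format(expire_time,'yyyy-MM-dd'), '9999-99-99'): a closed
    record lands in the partition of the day it closed; a still-open
    record lands in the open partition '9999-99-99'."""
    for t in (used_time, expire_time):
        if t:
            return t[:10]  # date_format(..., 'yyyy-MM-dd')
    return "9999-99-99"

print(coupon_partition(None, None))                   # 9999-99-99
print(coupon_partition(None, "2020-06-20 00:00:00"))  # 2020-06-20
```

Keeping all open records together in '9999-99-99' is what lets the daily load below read exactly one partition to find every record that might still change.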
(2) Daily load
a. Load logic
b. Load statement
insert overwrite table dwd_coupon_use partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.coupon_id,old.coupon_id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.coupon_status,old.coupon_status),
    nvl(new.get_time,old.get_time),
    nvl(new.using_time,old.using_time),
    nvl(new.used_time,old.used_time),
    nvl(new.expire_time,old.expire_time),
    coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id,
        coupon_id,
        user_id,
        order_id,
        coupon_status,
        get_time,
        using_time,
        used_time,
        expire_time
    from dwd_coupon_use
    where dt='9999-99-99'
)old
full outer join
(
    select
        id,
        coupon_id,
        user_id,
        order_id,
        coupon_status,
        get_time,
        using_time,
        used_time,
        expire_time
    from ods_coupon_use
    where dt='2020-06-15'
)new
on old.id=new.id;
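The full outer join plus nvl(new.x, old.x) pattern is the core of every accumulating-snapshot daily load in this chapter. A Python sketch of the merge semantics (not Hive code): rows are keyed by id, non-null fields from today's increment override yesterday's open record, and brand-new ids pass straight through.

```python
def merge_accumulating(old_rows, new_rows):
    """Sketch of the full outer join + nvl(new.x, old.x) merge: key rows
    by id; any non-null field from the daily increment overrides the
    previous open record, and unmatched new ids are added as-is."""
    merged = {row["id"]: dict(row) for row in old_rows}
    for new in new_rows:
        target = merged.setdefault(new["id"], {})
        for field, value in new.items():
            if value is not None:  # nvl(new.value, old.value)
                target[field] = value
    return list(merged.values())

old = [{"id": "1", "get_time": "2020-06-14", "used_time": None}]
new = [{"id": "1", "get_time": "2020-06-14", "used_time": "2020-06-15"}]
print(merge_accumulating(old, new)[0]["used_time"])  # 2020-06-15
```

Coupon 1 was claimed on 2020-06-14 with used_time still NULL; after merging the 2020-06-15 increment its used_time is filled in, so the record migrates from the '9999-99-99' partition to the 2020-06-15 partition.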
6.2.7 Payment Fact Table (Accumulating Snapshot Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_payment_info;
CREATE EXTERNAL TABLE dwd_payment_info (
    `id` STRING COMMENT 'id',
    `order_id` STRING COMMENT 'order id',
    `user_id` STRING COMMENT 'user id',
    `province_id` STRING COMMENT 'province id',
    `trade_no` STRING COMMENT 'trade number',
    `out_trade_no` STRING COMMENT 'external trade number',
    `payment_type` STRING COMMENT 'payment type',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `payment_status` STRING COMMENT 'payment status',
    `create_time` STRING COMMENT 'create time', -- time the third-party payment API was called
    `callback_time` STRING COMMENT 'completion time' -- payment completion time, i.e. the success-callback time
) COMMENT 'payment fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_payment_info partition(dt)
select
    pi.id,
    pi.order_id,
    pi.user_id,
    oi.province_id,
    pi.trade_no,
    pi.out_trade_no,
    pi.payment_type,
    pi.payment_amount,
    pi.payment_status,
    pi.create_time,
    pi.callback_time,
    nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select * from ods_payment_info where dt='2020-06-14'
)pi
left join
(
    select id,province_id from ods_order_info where dt='2020-06-14'
)oi
on pi.order_id=oi.id;
(2) Daily load
insert overwrite table dwd_payment_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_id,old.order_id),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.payment_amount,old.payment_amount),
    nvl(new.payment_status,old.payment_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id,
        order_id,
        user_id,
        province_id,
        trade_no,
        out_trade_no,
        payment_type,
        payment_amount,
        payment_status,
        create_time,
        callback_time
    from dwd_payment_info
    where dt = '9999-99-99'
)old
full outer join
(
    select
        pi.id,
        pi.out_trade_no,
        pi.order_id,
        pi.user_id,
        oi.province_id,
        pi.payment_type,
        pi.trade_no,
        pi.payment_amount,
        pi.payment_status,
        pi.create_time,
        pi.callback_time
    from
    (
        select * from ods_payment_info where dt='2020-06-15'
    )pi
    left join
    (
        select id,province_id from ods_order_info where dt='2020-06-15'
    )oi
    on pi.order_id=oi.id
)new
on old.id=new.id;
6.2.8 Refund Payment Fact Table (Accumulating Snapshot Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_refund_payment;
CREATE EXTERNAL TABLE dwd_refund_payment (
    `id` STRING COMMENT 'id',
    `user_id` STRING COMMENT 'user id',
    `order_id` STRING COMMENT 'order id',
    `sku_id` STRING COMMENT 'sku id',
    `province_id` STRING COMMENT 'province id',
    `trade_no` STRING COMMENT 'trade number',
    `out_trade_no` STRING COMMENT 'external trade number',
    `payment_type` STRING COMMENT 'payment type',
    `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
    `refund_status` STRING COMMENT 'refund status',
    `create_time` STRING COMMENT 'create time', -- time the third-party payment API was called
    `callback_time` STRING COMMENT 'callback time' -- success-callback time of the payment API, i.e. refund completion time
) COMMENT 'refund payment fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Load the data
(1) First-day load
insert overwrite table dwd_refund_payment partition(dt)
selectrp.id,user_id,order_id,sku_id,province_id,trade_no,out_trade_no,payment_type,refund_amount,refund_status,create_time,callback_time,nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, out_trade_no, order_id, sku_id, payment_type, trade_no,
        refund_amount, refund_status, create_time, callback_time
    from ods_refund_payment
    where dt='2020-06-14'
)rp
left join
(
    select id, user_id, province_id
    from ods_order_info
    where dt='2020-06-14'
)oi
on rp.order_id=oi.id;
(2) Daily load
insert overwrite table dwd_refund_payment partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.sku_id,old.sku_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.refund_amount,old.refund_amount),
    nvl(new.refund_status,old.refund_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no,
        payment_type, refund_amount, refund_status, create_time, callback_time
    from dwd_refund_payment
    where dt='9999-99-99'
)old
full outer join
(
    select
        rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no,
        payment_type, refund_amount, refund_status, create_time, callback_time
    from
    (
        select
            id, out_trade_no, order_id, sku_id, payment_type, trade_no,
            refund_amount, refund_status, create_time, callback_time
        from ods_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select id, user_id, province_id
        from ods_order_info
        where dt='2020-06-15'
    )oi
    on rp.order_id=oi.id
)new
on old.id=new.id;
3) Verify the loaded data
6.2.9 Order Fact Table (Accumulating Snapshot Fact Table)
1) Table creation statement
DROP TABLE IF EXISTS dwd_order_info;
CREATE EXTERNAL TABLE dwd_order_info(
    `id` STRING COMMENT 'ID',
    `order_status` STRING COMMENT 'Order status',
    `user_id` STRING COMMENT 'User ID',
    `province_id` STRING COMMENT 'Province ID',
    `payment_way` STRING COMMENT 'Payment method',
    `delivery_address` STRING COMMENT 'Delivery address',
    `out_trade_no` STRING COMMENT 'External trade number',
    `tracking_no` STRING COMMENT 'Tracking number',
    `create_time` STRING COMMENT 'Creation time (unpaid status)',
    `payment_time` STRING COMMENT 'Payment time (paid status)',
    `cancel_time` STRING COMMENT 'Cancellation time (cancelled status)',
    `finish_time` STRING COMMENT 'Completion time (completed status)',
    `refund_time` STRING COMMENT 'Refund time (refunding status)',
    `refund_finish_time` STRING COMMENT 'Refund completion time (refund-completed status)',
    `expire_time` STRING COMMENT 'Expiry time',
    `feight_fee` DECIMAL(16,2) COMMENT 'Freight fee',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT 'Freight fee discount',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT 'Promotion discount',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Coupon discount',
    `original_amount` DECIMAL(16,2) COMMENT 'Original order amount',
    `final_amount` DECIMAL(16,2) COMMENT 'Final order amount'
) COMMENT 'Order fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Partition planning
3) Data loading
(1) First-day load
insert overwrite table dwd_order_info partition(dt)
select
    oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way,
    oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount,
    coupon_reduce_amount, original_amount, final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select *
    from ods_order_info
    where dt='2020-06-14'
)oi
left join
(
    select
        order_id,
        str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ods_order_status_log
    where dt='2020-06-14'
    group by order_id
)times
on oi.id=times.order_id;
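The `times` subquery pivots the order-status log (one row per status change) into a single map per order with `collect_set` + `concat_ws` + `str_to_map`, so each status code can be looked up as `times.ts['1002']`. A small Python sketch of the same pivot, using made-up status codes from the load above:

```python
def pivot_status_log(rows):
    """Collapse (order_id, order_status, operate_time) rows into
    {order_id: {status_code: operate_time}}, mirroring
    str_to_map(concat_ws(',', collect_set(concat(status,'=',time))))."""
    ts = {}
    for order_id, status, operate_time in rows:
        ts.setdefault(order_id, {})[status] = operate_time
    return ts

# sample rows from one day's ods_order_status_log
log = [
    ('1001', '1001', '2020-06-14 10:00:00'),  # created
    ('1001', '1002', '2020-06-14 10:05:00'),  # paid
]
times = pivot_status_log(log)
```

Statuses that never occurred simply have no key in the map, which is why the outer query's `times.ts['1003']` can be null and feed straight into the `case` expression.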
(2) Daily load
insert overwrite table dwd_order_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_status,old.order_status),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.payment_way,old.payment_way),
    nvl(new.delivery_address,old.delivery_address),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.tracking_no,old.tracking_no),
    nvl(new.create_time,old.create_time),
    nvl(new.payment_time,old.payment_time),
    nvl(new.cancel_time,old.cancel_time),
    nvl(new.finish_time,old.finish_time),
    nvl(new.refund_time,old.refund_time),
    nvl(new.refund_finish_time,old.refund_finish_time),
    nvl(new.expire_time,old.expire_time),
    nvl(new.feight_fee,old.feight_fee),
    nvl(new.feight_fee_reduce,old.feight_fee_reduce),
    nvl(new.activity_reduce_amount,old.activity_reduce_amount),
    nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
    nvl(new.original_amount,old.original_amount),
    nvl(new.final_amount,old.final_amount),
    case
        when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
        when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
        when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
        when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select
        id, order_status, user_id, province_id, payment_way, delivery_address,
        out_trade_no, tracking_no, create_time, payment_time, cancel_time,
        finish_time, refund_time, refund_finish_time, expire_time, feight_fee,
        feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount,
        original_amount, final_amount
    from dwd_order_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way,
        oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount,
        coupon_reduce_amount, original_amount, final_amount
    from
    (
        select *
        from ods_order_info
        where dt='2020-06-15'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ods_order_status_log
        where dt='2020-06-15'
        group by order_id
    )times
    on oi.id=times.order_id
)new
on old.id=new.id;
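The final `case` expression decides which dt partition a merged order lands in: a cancelled, refund-completed, or expired order closes on that event's date; a finished order with no refund in progress closes 7 days after completion; anything else stays in the open 9999-99-99 partition. A hedged Python sketch of this decision for the daily load (timestamps as strings, as in the table):

```python
from datetime import datetime, timedelta

OPEN = '9999-99-99'

def order_partition(cancel_time, finish_time, refund_time,
                    refund_finish_time, expire_time, do_date):
    """Mirror the CASE that derives the dt partition for dwd_order_info
    in the daily load; do_date is the load date, e.g. '2020-06-15'."""
    day = lambda ts: ts[:10]  # 'yyyy-MM-dd HH:mm:ss' -> 'yyyy-MM-dd'
    if cancel_time:
        return day(cancel_time)
    if finish_time and refund_time is None:
        closed = (datetime.strptime(day(finish_time), '%Y-%m-%d')
                  + timedelta(days=7)).strftime('%Y-%m-%d')
        if closed == do_date:  # daily load: close exactly 7 days after finish
            return do_date
    if refund_finish_time:
        return day(refund_finish_time)
    if expire_time:
        return day(expire_time)
    return OPEN
```

Note the first-day load uses `<= '2020-06-14'` instead of `=` in the finish branch, because the initial import must also close orders that finished more than 7 days earlier.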
6.2.10 DWD Layer Business Data First-Day Load Script
1) Write the script
(1) Create the script ods_to_dwd_db_init.sh in the /home/seven/bin directory
[seven@hadoop102 bin]$ vim ods_to_dwd_db_init.sh
Add the following content to the script:
#!/bin/bash
APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit
fi

dwd_order_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
    oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way,
    oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount,
    coupon_reduce_amount, original_amount, final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select *
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
left join
(
    select
        order_id,
        str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ${APP}.ods_order_status_log
    where dt='$do_date'
    group by order_id
)times
on oi.id=times.order_id;"

dwd_order_detail="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt)
select
    od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id,
    oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type,
    od.source_id, od.sku_num, od.order_price*od.sku_num,
    od.split_activity_amount, od.split_coupon_amount, od.split_final_amount,
    date_format(create_time,'yyyy-MM-dd')
from
(
    select *
    from ${APP}.ods_order_detail
    where dt='$do_date'
)od
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id
    from ${APP}.ods_order_detail_activity
    where dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id
    from ${APP}.ods_order_detail_coupon
    where dt='$do_date'
)odc
on od.id=odc.order_detail_id;"

dwd_payment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
    pi.id, pi.order_id, pi.user_id, oi.province_id, pi.trade_no,
    pi.out_trade_no, pi.payment_type, pi.payment_amount, pi.payment_status,
    pi.create_time, pi.callback_time,
    nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select * from ${APP}.ods_payment_info where dt='$do_date'
)pi
left join
(
    select id, province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on pi.order_id=oi.id;"

dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select
    id, user_id, sku_id, source_type, source_id, cart_price, is_ordered,
    create_time, operate_time, order_time, sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"

dwd_comment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt)
select
    id, user_id, sku_id, spu_id, order_id, appraise, create_time,
    date_format(create_time,'yyyy-MM-dd')
from ${APP}.ods_comment_info
where dt='$do_date';"

dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select
    id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"

dwd_coupon_use="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
    id, coupon_id, user_id, order_id, coupon_status, get_time, using_time,
    used_time, expire_time,
    coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ${APP}.ods_coupon_use
where dt='$do_date';"

dwd_order_refund_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt)
select
    ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type,
    ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time,
    date_format(ri.create_time,'yyyy-MM-dd')
from
(
    select * from ${APP}.ods_order_refund_info where dt='$do_date'
)ri
left join
(
    select id, province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on ri.order_id=oi.id;"

dwd_refund_payment="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
    rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no,
    payment_type, refund_amount, refund_status, create_time, callback_time,
    nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, out_trade_no, order_id, sku_id, payment_type, trade_no,
        refund_amount, refund_status, create_time, callback_time
    from ${APP}.ods_refund_payment
    where dt='$do_date'
)rp
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on rp.order_id=oi.id;"

case $1 in
    dwd_order_info )
        hive -e "$dwd_order_info"
    ;;
    dwd_order_detail )
        hive -e "$dwd_order_detail"
    ;;
    dwd_payment_info )
        hive -e "$dwd_payment_info"
    ;;
    dwd_cart_info )
        hive -e "$dwd_cart_info"
    ;;
    dwd_comment_info )
        hive -e "$dwd_comment_info"
    ;;
    dwd_favor_info )
        hive -e "$dwd_favor_info"
    ;;
    dwd_coupon_use )
        hive -e "$dwd_coupon_use"
    ;;
    dwd_order_refund_info )
        hive -e "$dwd_order_refund_info"
    ;;
    dwd_refund_payment )
        hive -e "$dwd_refund_payment"
    ;;
    all )
        hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
    ;;
esac
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod +x ods_to_dwd_db_init.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ ods_to_dwd_db_init.sh all 2020-06-14
(2) Check that the data was loaded successfully
6.2.11 DWD Layer Business Data Daily Load Script
1) Write the script
(1) Create the script ods_to_dwd_db.sh in the /home/seven/bin directory
[seven@hadoop102 bin]$ vim ods_to_dwd_db.sh
Add the following content to the script:
#!/bin/bash

APP=gmall

# Use the given date if one is passed in; otherwise default to yesterday
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

# If, for an accumulating snapshot fact table, every business record happens to
# complete on a given day, nothing is written to the 9999-99-99 partition that
# day, so its stale data is not overwritten and would duplicate the rows now in
# closed partitions. This function checks the last modification time of the
# 9999-99-99 partition; if it was not overwritten today, it is cleared manually.
clear_data(){
    current_date=`date +%F`
    current_date_timestamp=`date -d "$current_date" +%s`
    last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'`
    last_modified_date_timestamp=`date -d "$last_modified_date" +%s`
    if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; then
        echo "clear table $1 partition(dt=9999-99-99)"
        hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/*
    fi
}

dwd_order_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_status,old.order_status),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.payment_way,old.payment_way),
    nvl(new.delivery_address,old.delivery_address),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.tracking_no,old.tracking_no),
    nvl(new.create_time,old.create_time),
    nvl(new.payment_time,old.payment_time),
    nvl(new.cancel_time,old.cancel_time),
    nvl(new.finish_time,old.finish_time),
    nvl(new.refund_time,old.refund_time),
    nvl(new.refund_finish_time,old.refund_finish_time),
    nvl(new.expire_time,old.expire_time),
    nvl(new.feight_fee,old.feight_fee),
    nvl(new.feight_fee_reduce,old.feight_fee_reduce),
    nvl(new.activity_reduce_amount,old.activity_reduce_amount),
    nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
    nvl(new.original_amount,old.original_amount),
    nvl(new.final_amount,old.final_amount),
    case
        when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
        when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date'
        when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
        when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select
        id, order_status, user_id, province_id, payment_way, delivery_address,
        out_trade_no, tracking_no, create_time, payment_time, cancel_time,
        finish_time, refund_time, refund_finish_time, expire_time, feight_fee,
        feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount,
        original_amount, final_amount
    from ${APP}.dwd_order_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way,
        oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount,
        coupon_reduce_amount, original_amount, final_amount
    from
    (
        select *
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ${APP}.ods_order_status_log
        where dt='$do_date'
        group by order_id
    )times
    on oi.id=times.order_id
)new
on old.id=new.id;"

dwd_order_detail="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt='$do_date')
select
    od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id,
    oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type,
    od.source_id, od.sku_num, od.order_price*od.sku_num,
    od.split_activity_amount, od.split_coupon_amount, od.split_final_amount
from
(
    select *
    from ${APP}.ods_order_detail
    where dt='$do_date'
)od
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id
    from ${APP}.ods_order_detail_activity
    where dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id
    from ${APP}.ods_order_detail_coupon
    where dt='$do_date'
)odc
on od.id=odc.order_detail_id;"

dwd_payment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_id,old.order_id),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.payment_amount,old.payment_amount),
    nvl(new.payment_status,old.payment_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, order_id, user_id, province_id, trade_no, out_trade_no,
        payment_type, payment_amount, payment_status, create_time, callback_time
    from ${APP}.dwd_payment_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        pi.id, pi.out_trade_no, pi.order_id, pi.user_id, oi.province_id,
        pi.payment_type, pi.trade_no, pi.payment_amount, pi.payment_status,
        pi.create_time, pi.callback_time
    from
    (
        select * from ${APP}.ods_payment_info where dt='$do_date'
    )pi
    left join
    (
        select id, province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on pi.order_id=oi.id
)new
on old.id=new.id;"

dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select
    id, user_id, sku_id, source_type, source_id, cart_price, is_ordered,
    create_time, operate_time, order_time, sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"

dwd_comment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date')
select
    id, user_id, sku_id, spu_id, order_id, appraise, create_time
from ${APP}.ods_comment_info
where dt='$do_date';"

dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select
    id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"

dwd_coupon_use="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.coupon_id,old.coupon_id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.coupon_status,old.coupon_status),
    nvl(new.get_time,old.get_time),
    nvl(new.using_time,old.using_time),
    nvl(new.used_time,old.used_time),
    nvl(new.expire_time,old.expire_time),
    coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, coupon_id, user_id, order_id, coupon_status, get_time, using_time,
        used_time, expire_time
    from ${APP}.dwd_coupon_use
    where dt='9999-99-99'
)old
full outer join
(
    select
        id, coupon_id, user_id, order_id, coupon_status, get_time, using_time,
        used_time, expire_time
    from ${APP}.ods_coupon_use
    where dt='$do_date'
)new
on old.id=new.id;"

dwd_order_refund_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date')
select
    ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type,
    ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time
from
(
    select * from ${APP}.ods_order_refund_info where dt='$do_date'
)ri
left join
(
    select id, province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on ri.order_id=oi.id;"

dwd_refund_payment="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.sku_id,old.sku_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.refund_amount,old.refund_amount),
    nvl(new.refund_status,old.refund_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no,
        payment_type, refund_amount, refund_status, create_time, callback_time
    from ${APP}.dwd_refund_payment
    where dt='9999-99-99'
)old
full outer join
(
    select
        rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no,
        payment_type, refund_amount, refund_status, create_time, callback_time
    from
    (
        select
            id, out_trade_no, order_id, sku_id, payment_type, trade_no,
            refund_amount, refund_status, create_time, callback_time
        from ${APP}.ods_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select id, user_id, province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on rp.order_id=oi.id
)new
on old.id=new.id;"

case $1 in
    dwd_order_info )
        hive -e "$dwd_order_info"
        clear_data dwd_order_info
    ;;
    dwd_order_detail )
        hive -e "$dwd_order_detail"
    ;;
    dwd_payment_info )
        hive -e "$dwd_payment_info"
        clear_data dwd_payment_info
    ;;
    dwd_cart_info )
        hive -e "$dwd_cart_info"
    ;;
    dwd_comment_info )
        hive -e "$dwd_comment_info"
    ;;
    dwd_favor_info )
        hive -e "$dwd_favor_info"
    ;;
    dwd_coupon_use )
        hive -e "$dwd_coupon_use"
        clear_data dwd_coupon_use
    ;;
    dwd_order_refund_info )
        hive -e "$dwd_order_refund_info"
    ;;
    dwd_refund_payment )
        hive -e "$dwd_refund_payment"
        clear_data dwd_refund_payment
    ;;
    all )
        hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
        clear_data dwd_order_info
        clear_data dwd_payment_info
        clear_data dwd_coupon_use
        clear_data dwd_refund_payment
    ;;
esac
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod 777 ods_to_dwd_db.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ ods_to_dwd_db.sh all 2020-06-15
(2) Check that the data was loaded successfully
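The `clear_data` function in the daily script boils down to a date comparison: the open 9999-99-99 partition must be cleared manually only when today's load did not rewrite it, i.e. when the partition directory's last-modified date predates today. A minimal Python sketch of just that check (the HDFS listing and deletion are left out):

```python
from datetime import date

def should_clear(last_modified: str, today: str) -> bool:
    """Return True when the open partition was not rewritten today and
    therefore still holds stale (now duplicated) records.
    Dates are 'yyyy-MM-dd' strings, as printed by `hadoop fs -ls`."""
    return date.fromisoformat(last_modified) < date.fromisoformat(today)
```

This mirrors the shell comparison of the two `date +%s` timestamps with `-lt`; the actual cleanup is the `hadoop fs -rm -r -f .../dt=9999-99-99/*` call in the script.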
Chapter 7 Data Warehouse Construction: DWS Layer
7.1 System Functions
7.1.1 The nvl Function
1) Basic syntax
NVL(expr1, expr2)
If expr1 is null, NVL returns the value of expr2; otherwise it returns the value of expr1.
The purpose of this function is to convert a null value into an actual value. The expressions may be numeric, character, or date typed, but expr1 and expr2 must be of the same type.
2) Examples
hive (gmall)> select nvl(1,0);
1
hive (gmall)> select nvl(null,"hello");
hello
7.1.2 Date Functions
1) date_format function (formats a date)
hive (gmall)> select date_format('2020-06-14','yyyy-MM');
2020-06
2) date_add function (adds or subtracts days)
hive (gmall)> select date_add('2020-06-14',-1);
2020-06-13
hive (gmall)> select date_add('2020-06-14',1);
2020-06-15
3) next_day function
(1) Get the next Monday after the given date
hive (gmall)> select next_day('2020-06-14','MO');
2020-06-15
Note: the day-of-week argument uses the English day names (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) or their abbreviations, e.g. 'MO'.
(2) Get the Monday of the current week
hive (gmall)> select date_add(next_day('2020-06-14','MO'),-7);
2020-06-08
4) last_day function (returns the last day of the month)
hive (gmall)> select last_day('2020-06-14');
2020-06-30
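These four Hive date functions map onto plain datetime arithmetic. A Python sketch reproducing each example above (the `next_day` helper is our own illustration, written to match Hive's "strictly after the given date" behavior):

```python
from datetime import date, timedelta
import calendar

d = date.fromisoformat('2020-06-14')

# date_format('2020-06-14','yyyy-MM')
formatted = d.strftime('%Y-%m')            # '2020-06'

# date_add('2020-06-14', -1) and date_add('2020-06-14', 1)
yesterday = d + timedelta(days=-1)         # 2020-06-13
tomorrow = d + timedelta(days=1)           # 2020-06-15

def next_day(day, weekday):
    """First date strictly after `day` that falls on `weekday` (0=Monday),
    matching Hive's next_day(date,'MO')."""
    return day + timedelta(days=(weekday - day.weekday() - 1) % 7 + 1)

# next_day('2020-06-14','MO') -> the Monday after Sunday 2020-06-14
monday = next_day(d, 0)                    # 2020-06-15

# last_day('2020-06-14'): last day of the month
last = d.replace(day=calendar.monthrange(d.year, d.month)[1])  # 2020-06-30
```

The "Monday of the current week" idiom is then `next_day(d, 0) - timedelta(days=7)`, exactly the `date_add(next_day(...), -7)` trick shown above.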
7.1.3 Complex Data Type Definitions
1) map definition
        map<string,string>
2) array definition
        array<string>
3) struct definition
        struct<id:int,name:string,age:int>
4) array of struct (nested) definition
        array<struct<id:int,name:string,age:int>>
7.2 DWS Layer
7.2.1 Visitor Topic
1) Table creation statement
DROP TABLE IF EXISTS dws_visitor_action_daycount;
CREATE EXTERNAL TABLE dws_visitor_action_daycount
(
    `mid_id` STRING COMMENT 'Device ID',
    `brand` STRING COMMENT 'Device brand',
    `model` STRING COMMENT 'Device model',
    `is_new` STRING COMMENT 'Whether this is a first visit',
    `channel` ARRAY<STRING> COMMENT 'Channels',
    `os` ARRAY<STRING> COMMENT 'Operating systems',
    `area_code` ARRAY<STRING> COMMENT 'Area IDs',
    `version_code` ARRAY<STRING> COMMENT 'App versions',
    `visit_count` BIGINT COMMENT 'Number of visits',
    `page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT 'Page visit statistics'
) COMMENT 'Daily device activity table'
PARTITIONED BY(`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_visitor_action_daycount'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Data loading
insert overwrite table dws_visitor_action_daycount partition(dt='2020-06-14')
select
    t1.mid_id,
    t1.brand,
    t1.model,
    t1.is_new,
    t1.channel,
    t1.os,
    t1.area_code,
    t1.version_code,
    t1.visit_count,
    t3.page_stats
from
(
    select
        mid_id,
        brand,
        model,
        -- In ods_page_log, the is_new flag of one device within a single day
        -- may be all 1s, all 0s, or a mix of 0s and 1s (uninstall and
        -- reinstall), hence this treatment
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from dwd_page_log
    where dt='2020-06-14'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id,
        brand,
        model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            page_id,
            count(*) page_count,
            sum(during_time) during_time
        from dwd_page_log
        where dt='2020-06-14'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
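Conceptually this load makes two passes over the page log per device: one counting session entries (rows with no last_page_id) for visit_count, and one aggregating per-page view counts into the page_stats array of structs. A minimal Python sketch of the same aggregation, with rows as plain dicts (an illustration, not the warehouse code):

```python
from collections import Counter, defaultdict

def visitor_daycount(page_log):
    """Aggregate one day's page log per device:
    visit_count = session entries (rows with no last_page_id),
    page_stats  = per-page view counts, mirroring the
    collect_set(named_struct(...)) array in the Hive load."""
    visits = Counter()
    page_stats = defaultdict(Counter)
    for row in page_log:
        if row['last_page_id'] is None:
            visits[row['mid_id']] += 1
        page_stats[row['mid_id']][row['page_id']] += 1
    return {mid: {'visit_count': visits[mid],
                  'page_stats': [{'page_id': p, 'page_count': c}
                                 for p, c in stats.items()]}
            for mid, stats in page_stats.items()}
```

In the Hive version the two passes are the t1 and t2/t3 subqueries, joined back together on (mid_id, brand, model).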
3) Verify the loaded data
7.2.2 User Topic
1) Table creation statement
DROP TABLE IF EXISTS dws_user_action_daycount;
CREATE EXTERNAL TABLE dws_user_action_daycount
(
    `user_id` STRING COMMENT 'User ID',
    `login_count` BIGINT COMMENT 'Number of logins',
    `cart_count` BIGINT COMMENT 'Number of add-to-cart actions',
    `favor_count` BIGINT COMMENT 'Number of favorites',
    `order_count` BIGINT COMMENT 'Number of orders',
    `order_activity_count` BIGINT COMMENT 'Number of orders joining a promotion',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount (promotions)',
    `order_coupon_count` BIGINT COMMENT 'Number of orders using a coupon',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupons)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Final order amount',
    `payment_count` BIGINT COMMENT 'Number of payments',
    `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
    `refund_order_count` BIGINT COMMENT 'Number of refund requests',
    `refund_order_num` BIGINT COMMENT 'Number of items in refund requests',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'Refund request amount',
    `refund_payment_count` BIGINT COMMENT 'Number of refund payments',
    `refund_payment_num` BIGINT COMMENT 'Number of items refunded',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'Refund payment amount',
    `coupon_get_count` BIGINT COMMENT 'Number of coupons claimed',
    `coupon_using_count` BIGINT COMMENT 'Number of coupons used (ordered)',
    `coupon_used_count` BIGINT COMMENT 'Number of coupons used (paid)',
    `appraise_good_count` BIGINT COMMENT 'Number of positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'Number of neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'Number of negative reviews',
    `appraise_default_count` BIGINT COMMENT 'Number of default reviews',
    `order_detail_stats` array<struct<sku_id:string,sku_num:bigint,order_count:bigint,activity_reduce_amount:decimal(16,2),coupon_reduce_amount:decimal(16,2),original_amount:decimal(16,2),final_amount:decimal(16,2)>> COMMENT 'Order detail statistics'
) COMMENT 'Daily user activity table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_user_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Data loading
(1) First-day load
with
tmp_login as
(
    select
        dt,
        user_id,
        count(*) login_count
    from dwd_page_log
    where user_id is not null
    and last_page_id is null
    group by dt,user_id
),
tmp_cf as
(
    select
        dt,
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where user_id is not null
    and action_id in ('cart_add','favor_add')
    group by dt,user_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),user_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_amount,
            callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_num
        from dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
),
tmp_coupon as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
        coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
        nvl(coupon_get_count,0) coupon_get_count,
        nvl(coupon_using_count,0) coupon_using_count,
        nvl(coupon_used_count,0) coupon_used_count
    from
    (
        select
            date_format(get_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_get_count
        from dwd_coupon_use
        where get_time is not null
        group by user_id,date_format(get_time,'yyyy-MM-dd')
    )coupon_get
    full outer join
    (
        select
            date_format(using_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_using_count
        from dwd_coupon_use
        where using_time is not null
        group by user_id,date_format(using_time,'yyyy-MM-dd')
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.user_id=coupon_using.user_id
    full outer join
    (
        select
            date_format(used_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_used_count
        from dwd_coupon_use
        where used_time is not null
        group by user_id,date_format(used_time,'yyyy-MM-dd')
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_od as
(
    select
        dt,
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            date_format(create_time,'yyyy-MM-dd') dt,
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from dwd_order_detail
        group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
    )t1
    group by dt,user_id
)
insert overwrite table dws_user_action_daycount partition(dt)
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats,
    coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
and tmp_login.dt=tmp_cf.dt
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
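The chained full outer join above keeps a user's row even when the user appears in only one of the nine metric tables: `coalesce()` over all the `user_id` (and `dt`) columns seen so far recovers a usable join key at every step, and `nvl(...,0)` zero-fills the metrics the user had no activity for. A minimal sketch of this full-outer-join-plus-coalesce pattern in plain Python (hypothetical sample data, not part of the warehouse):

```python
# Sketch of the FULL OUTER JOIN + coalesce/nvl pattern used above,
# on two tiny {user_id: metrics} dicts with made-up values.

def full_outer_join(left, right, left_metrics, right_metrics):
    """Full outer join two {user_id: metrics} dicts, zero-filling gaps (nvl)."""
    out = {}
    for uid in set(left) | set(right):          # union of keys = FULL OUTER JOIN
        row = {m: 0 for m in left_metrics + right_metrics}  # nvl(metric, 0)
        row.update(left.get(uid, {}))
        row.update(right.get(uid, {}))
        out[uid] = row
    return out

tmp_login = {"u1": {"login_count": 3}, "u2": {"login_count": 1}}
tmp_order = {"u2": {"order_count": 2}, "u3": {"order_count": 5}}

joined = full_outer_join(tmp_login, tmp_order, ["login_count"], ["order_count"])
# u1 keeps its logins with orders zero-filled; u3 the reverse; u2 has both.
```

In the SQL, the same effect is repeated once per additional metric table, which is why each `on` clause coalesces over every table joined so far.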
(2) Daily load
with
tmp_login as
(
    select
        user_id,
        count(*) login_count
    from dwd_page_log
    where dt='2020-06-15'
    and user_id is not null
    and last_page_id is null
    group by user_id
),
tmp_cf as
(
    select
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and user_id is not null
    and action_id in ('cart_add','favor_add')
    group by user_id
),
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    where (dt='2020-06-15' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
    group by user_id
),
tmp_pay as
(
    select
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    where dt='2020-06-15'
    group by user_id
),
tmp_ri as
(
    select
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by user_id
),
tmp_rp as
(
    select
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.user_id
),
tmp_coupon as
(
    select
        user_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_using_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_used_count
    from dwd_coupon_use
    where (dt='2020-06-15' or dt='9999-99-99')
    and (date_format(get_time,'yyyy-MM-dd')='2020-06-15'
        or date_format(using_time,'yyyy-MM-dd')='2020-06-15'
        or date_format(used_time,'yyyy-MM-dd')='2020-06-15')
    group by user_id
),
tmp_comment as
(
    select
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by user_id
),
tmp_od as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from dwd_order_detail
        where dt='2020-06-15'
        group by user_id,sku_id
    )t1
    group by user_id
)
insert overwrite table dws_user_action_daycount partition(dt='2020-06-15')
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats
from tmp_login
full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id
full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
3) Query the loaded results
7.2.3 Product Topic
1) Table creation statement
DROP TABLE IF EXISTS dws_sku_action_daycount;
CREATE EXTERNAL TABLE dws_sku_action_daycount
(
    `sku_id` STRING COMMENT 'sku_id',
    `order_count` BIGINT COMMENT 'times ordered',
    `order_num` BIGINT COMMENT 'units ordered',
    `order_activity_count` BIGINT COMMENT 'times ordered in a promotion',
    `order_coupon_count` BIGINT COMMENT 'times ordered with a coupon',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (promotions)',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (coupons)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final order amount',
    `payment_count` BIGINT COMMENT 'times paid',
    `payment_num` BIGINT COMMENT 'units paid',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `refund_order_count` BIGINT COMMENT 'refund-request count',
    `refund_order_num` BIGINT COMMENT 'refund-request units',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'refund-request amount',
    `refund_payment_count` BIGINT COMMENT 'refund-payment count',
    `refund_payment_num` BIGINT COMMENT 'refund-payment units',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund-payment amount',
    `cart_count` BIGINT COMMENT 'times added to cart',
    `favor_count` BIGINT COMMENT 'times favorited',
    `appraise_good_count` BIGINT COMMENT 'positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'negative reviews',
    `appraise_default_count` BIGINT COMMENT 'default reviews'
) COMMENT 'daily product actions'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_sku_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Data loading
(1) First-day load
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail od
    join
    (
        select
            order_id,
            callback_time
        from dwd_payment_info
        where callback_time is not null
    )pi on pi.order_id=od.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),sku_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select
            order_id,
            sku_id,
            refund_amount,
            callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select
            order_id,
            sku_id,
            refund_num
        from dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
),
tmp_cf as
(
    select
        dt,
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where action_id in ('cart_add','favor_add')
    group by dt,item
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt)
select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_activity_count),
    sum(order_coupon_count),
    sum(order_activity_reduce_amount),
    sum(order_coupon_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_num),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_num),
    sum(refund_payment_amount),
    sum(cart_count),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count),
    dt
from
(
    select dt,sku_id,order_count,order_num,order_activity_count,order_coupon_count,order_activity_reduce_amount,order_coupon_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_order
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_num,payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_pay
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,refund_order_count,refund_order_num,refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_ri
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,refund_payment_count,refund_payment_num,refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_rp
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,cart_count,favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_cf
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count
    from tmp_comment
)t1
group by dt,sku_id;
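The insert above uses a common pivot idiom: each metric source is padded with literal `0`s for every column it does not own, the branches are stacked with `union all`, and the outer `group by ... sum(...)` collapses them into one wide row per sku and day. The same idea in a small plain-Python sketch (hypothetical data, three metrics instead of twenty-four):

```python
# Sketch of the UNION ALL + zero-padding + GROUP BY SUM pattern.
from collections import defaultdict

METRICS = ["order_count", "payment_count", "cart_count"]

def pad(rows, metric):
    """Emit rows with every metric zeroed except the one this source owns."""
    for key, value in rows:
        padded = {m: 0 for m in METRICS}
        padded[metric] = value
        yield key, padded

def union_all_groupby(*sources):
    """Stack all padded sources and sum each metric per key."""
    acc = defaultdict(lambda: {m: 0 for m in METRICS})
    for src in sources:
        for key, row in src:
            for m in METRICS:
                acc[key][m] += row[m]           # sum(metric) ... group by key
    return dict(acc)

orders   = pad([("sku1", 4), ("sku2", 1)], "order_count")
payments = pad([("sku1", 3)], "payment_count")
carts    = pad([("sku2", 7)], "cart_count")

wide = union_all_groupby(orders, payments, carts)
```

Because every branch contributes at most one non-zero metric, the sums simply select each source's value; the padding exists only so that all branches share the same column list, which `union all` requires.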
(2) Daily load
with
tmp_order as
(
    select
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    group by sku_id
),
tmp_pay as
(
    select
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and order_id in
    (select order_id from dwd_payment_info where dt='2020-06-15')
    group by sku_id
),
tmp_ri as
(
    select
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by sku_id
),
tmp_rp as
(
    select
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select
            order_id,
            sku_id,
            refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select
            order_id,
            sku_id,
            refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt='2020-06-15')
select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_activity_count),
    sum(order_coupon_count),
    sum(order_activity_reduce_amount),
    sum(order_coupon_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_num),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_num),
    sum(refund_payment_amount),
    sum(cart_count),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
from
(
    select sku_id,order_count,order_num,order_activity_count,order_coupon_count,order_activity_reduce_amount,order_coupon_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_order
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_num,payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_pay
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,refund_order_count,refund_order_num,refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_ri
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,refund_payment_count,refund_payment_num,refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_rp
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,cart_count,favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_cf
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count
    from tmp_comment
)t1
group by sku_id;
3) Query the loaded results
7.2.4 Coupon Topic
1) Table creation statement
DROP TABLE IF EXISTS dws_coupon_info_daycount;
CREATE EXTERNAL TABLE dws_coupon_info_daycount
(
    `coupon_id` STRING COMMENT 'coupon ID',
    `get_count` BIGINT COMMENT 'times claimed',
    `order_count` BIGINT COMMENT 'times used (ordered)',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount on coupon orders',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original amount of coupon orders',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final amount of coupon orders',
    `payment_count` BIGINT COMMENT 'times used (paid)',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount on coupon payments',
    `payment_amount` DECIMAL(16,2) COMMENT 'total amount of coupon payments',
    `expire_count` BIGINT COMMENT 'times expired'
) COMMENT 'daily coupon statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_coupon_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Data loading
(1) First-day load
with
tmp_cu as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_expire.dt) dt,
        coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_expire.coupon_id) coupon_id,
        nvl(get_count,0) get_count,
        nvl(order_count,0) order_count,
        nvl(payment_count,0) payment_count,
        nvl(expire_count,0) expire_count
    from
    (
        select
            date_format(get_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) get_count
        from dwd_coupon_use
        group by date_format(get_time,'yyyy-MM-dd'),coupon_id
    )coupon_get
    full outer join
    (
        select
            date_format(using_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) order_count
        from dwd_coupon_use
        where using_time is not null
        group by date_format(using_time,'yyyy-MM-dd'),coupon_id
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.coupon_id=coupon_using.coupon_id
    full outer join
    (
        select
            date_format(used_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) payment_count
        from dwd_coupon_use
        where used_time is not null
        group by date_format(used_time,'yyyy-MM-dd'),coupon_id
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
    full outer join
    (
        select
            date_format(expire_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) expire_count
        from dwd_coupon_use
        where expire_time is not null
        group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
    )coupon_expire
    on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_expire.dt
    and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_expire.coupon_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where coupon_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),coupon_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select
            order_id,
            coupon_id,
            split_coupon_amount,
            split_final_amount
        from dwd_order_detail
        where coupon_id is not null
    )od
    join
    (
        select
            order_id,
            callback_time
        from dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
)
insert overwrite table dws_coupon_info_daycount partition(dt)
select
    coupon_id,
    sum(get_count),
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    sum(expire_count),
    dt
from
(
    select dt,coupon_id,get_count,order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,0 payment_reduce_amount,0 payment_amount,expire_count
    from tmp_cu
    union all
    select dt,coupon_id,0 get_count,0 order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount,0 expire_count
    from tmp_order
    union all
    select dt,coupon_id,0 get_count,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,payment_reduce_amount,payment_amount,0 expire_count
    from tmp_pay
)t1
group by dt,coupon_id;
(2) Daily load
with
tmp_cu as
(
    select
        coupon_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) order_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) payment_count,
        sum(if(date_format(expire_time,'yyyy-MM-dd')='2020-06-15',1,0)) expire_count
    from dwd_coupon_use
    where dt='9999-99-99'
    or dt='2020-06-15'
    group by coupon_id
),
tmp_order as
(
    select
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    and coupon_id is not null
    group by coupon_id
),
tmp_pay as
(
    select
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and coupon_id is not null
    and order_id in
    (select order_id from dwd_payment_info where dt='2020-06-15')
    group by coupon_id
)
insert overwrite table dws_coupon_info_daycount partition(dt='2020-06-15')
select
    coupon_id,
    sum(get_count),
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    sum(expire_count)
from
(
    select coupon_id,get_count,order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,0 payment_reduce_amount,0 payment_amount,expire_count
    from tmp_cu
    union all
    select coupon_id,0 get_count,0 order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount,0 expire_count
    from tmp_order
    union all
    select coupon_id,0 get_count,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,payment_reduce_amount,payment_amount,0 expire_count
    from tmp_pay
)t1
group by coupon_id;
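The daily tmp_cu above computes four counters in a single scan with the `sum(if(condition,1,0))` idiom: instead of four separately filtered queries over dwd_coupon_use, one pass increments each counter only when its own date predicate matches. The same idea in a small plain-Python sketch (hypothetical coupon-use rows):

```python
# Sketch of the sum(if(condition,1,0)) conditional-count idiom:
# one pass over the rows produces several counts, each guarded by
# its own predicate.

rows = [
    {"coupon_id": "c1", "get_time": "2020-06-15", "using_time": None},
    {"coupon_id": "c1", "get_time": "2020-06-14", "using_time": "2020-06-15"},
    {"coupon_id": "c1", "get_time": "2020-06-15", "using_time": "2020-06-15"},
]

def daycount(rows, day):
    # sum(if(pred,1,0)) == add 1 for matching rows, 0 otherwise
    get_count = sum(1 if r["get_time"] == day else 0 for r in rows)
    order_count = sum(1 if r["using_time"] == day else 0 for r in rows)
    return {"get_count": get_count, "order_count": order_count}

stats = daycount(rows, "2020-06-15")
# {'get_count': 2, 'order_count': 2}
```

The SQL version extends this to used_time and expire_time the same way; each counter is independent because each `if()` tests only its own column.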
3) Query the loaded results
7.2.5 Activity Topic
1) Table creation statement
DROP TABLE IF EXISTS dws_activity_info_daycount;
CREATE EXTERNAL TABLE dws_activity_info_daycount
(
    `activity_rule_id` STRING COMMENT 'activity rule ID',
    `activity_id` STRING COMMENT 'activity ID',
    `order_count` BIGINT COMMENT 'orders placed under this activity rule',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'order discount amount under this activity rule',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount under this activity rule',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final order amount under this activity rule',
    `payment_count` BIGINT COMMENT 'payments made under this activity rule',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'payment discount amount under this activity rule',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount under this activity rule'
) COMMENT 'daily activity statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_activity_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Data loading
(1) First-day load
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where activity_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select
            activity_rule_id,
            activity_id,
            order_id,
            split_activity_amount,
            split_final_amount
        from dwd_order_detail
        where activity_id is not null
    )od
    join
    (
        select
            order_id,
            callback_time
        from dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
)
insert overwrite table dws_activity_info_daycount partition(dt)
select
    activity_rule_id,
    activity_id,
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    dt
from
(
    select dt,activity_rule_id,activity_id,order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount
    from tmp_order
    union all
    select dt,activity_rule_id,activity_id,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_reduce_amount,payment_amount
    from tmp_pay
)t1
group by dt,activity_rule_id,activity_id;
(2) Daily load
with
tmp_order as
(
    select
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    and activity_id is not null
    group by activity_rule_id,activity_id
),
tmp_pay as
(
    select
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and activity_id is not null
    and order_id in
    (select order_id from dwd_payment_info where dt='2020-06-15')
    group by activity_rule_id,activity_id
)
insert overwrite table dws_activity_info_daycount partition(dt='2020-06-15')
select
    activity_rule_id,
    activity_id,
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount)
from
(
    select activity_rule_id,activity_id,order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount
    from tmp_order
    union all
    select activity_rule_id,activity_id,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_reduce_amount,payment_amount
    from tmp_pay
)t1
group by activity_rule_id,activity_id;
3) Query the loaded results
7.2.6 Region Topic
1) Table creation statement
DROP TABLE IF EXISTS dws_area_stats_daycount;
CREATE EXTERNAL TABLE dws_area_stats_daycount
(
    `province_id` STRING COMMENT 'region ID',
    `visit_count` BIGINT COMMENT 'visit count',
    `login_count` BIGINT COMMENT 'login count',
    `visitor_count` BIGINT COMMENT 'number of visitors',
    `user_count` BIGINT COMMENT 'number of users',
    `order_count` BIGINT COMMENT 'order count',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final order amount',
    `payment_count` BIGINT COMMENT 'payment count',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `refund_order_count` BIGINT COMMENT 'refund-request count',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'refund-request amount',
    `refund_payment_count` BIGINT COMMENT 'refund-payment count',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund-payment amount'
) COMMENT 'daily region statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_area_stats_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
2) Data loading
(1) First-day load
with
tmp_vu as
(
    select
        dt,
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            dt,
            area_code,
            count(*) visit_count, -- visit count
            count(user_id) login_count, -- visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- number of distinct visitors
            count(distinct(user_id)) user_count -- number of distinct users
        from dwd_page_log
        where last_page_id is null
        group by dt,area_code
    )tmp
    left join dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
),
tmp_ro as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from dwd_refund_payment
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
)
insert overwrite table dws_area_stats_daycount partition(dt)
select
    province_id,
    sum(visit_count),
    sum(login_count),
    sum(visitor_count),
    sum(user_count),
    sum(order_count),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_amount),
    dt
from
(
    select dt,province_id,visit_count,login_count,visitor_count,user_count,0 order_count,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_amount,0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_vu
    union all
    select dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,order_count,order_original_amount,order_final_amount,0 payment_count,0 payment_amount,0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_order
    union all
    select dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,0 order_count,0 order_original_amount,0 order_final_amount,payment_count,payment_amount,0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_pay
    union all
    select dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,0 order_count,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_amount,refund_order_count,refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_ro
    union all
    select dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,0 order_count,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_amount,0 refund_order_count,0 refund_order_amount,refund_payment_count,refund_payment_amount
    from tmp_rp
)t1
group by dt,province_id;
(2) Daily load
with
tmp_vu as
(select id province_id, visit_count, login_count, visitor_count, user_count
 from
 (select area_code,
     count(*) visit_count, -- number of visits by all visitors
     count(user_id) login_count, -- number of visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
     count(distinct(mid_id)) visitor_count, -- number of distinct visitors
     count(distinct(user_id)) user_count -- number of distinct logged-in users
  from dwd_page_log
  where dt='2020-06-15'
  and last_page_id is null
  group by area_code
 )tmp
 left join dim_base_province area
 on tmp.area_code=area.area_code
),
tmp_order as
(select province_id, count(*) order_count, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount
 from dwd_order_info
 where (dt='2020-06-15' or dt='9999-99-99')
 and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
 group by province_id
),
tmp_pay as
(select province_id, count(*) payment_count, sum(payment_amount) payment_amount
 from dwd_payment_info
 where dt='2020-06-15'
 group by province_id
),
tmp_ro as
(select province_id, count(*) refund_order_count, sum(refund_amount) refund_order_amount
 from dwd_order_refund_info
 where dt='2020-06-15'
 group by province_id
),
tmp_rp as
(select province_id, count(*) refund_payment_count, sum(refund_amount) refund_payment_amount
 from dwd_refund_payment
 where dt='2020-06-15'
 group by province_id
)
insert overwrite table dws_area_stats_daycount partition(dt='2020-06-15')
select province_id, sum(visit_count), sum(login_count), sum(visitor_count), sum(user_count), sum(order_count), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_amount), sum(refund_order_count), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_amount)
from
(select province_id, visit_count, login_count, visitor_count, user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_vu
 union all
 select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, order_count, order_original_amount, order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_order
 union all
 select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, payment_count, payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_pay
 union all
 select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, refund_order_count, refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_ro
 union all
 select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, refund_payment_count, refund_payment_amount
 from tmp_rp
)t1
group by province_id;
3) Query the load results
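For example, a quick sanity check can be run in Hive against the partition just written (table name and date taken from the daily load above):

```sql
-- Sample a few rows from the newly loaded 2020-06-15 partition
select * from dws_area_stats_daycount where dt='2020-06-15' limit 10;
```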
7.2.7 DWS Layer First-Day Data Load Script
1) Write the script
(1) Create the script dwd_to_dws_init.sh in the /home/seven/bin directory
#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in a date parameter"
    exit
fi

dws_visitor_action_daycount="
insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
select t1.mid_id, t1.brand, t1.model, t1.is_new, t1.channel, t1.os, t1.area_code, t1.version_code, t1.visit_count, t3.page_stats
from
(select mid_id, brand, model,
    -- In ods_page_log, the is_new values of one device within one day may be all 1, all 0,
    -- or a mix of 0 and 1 (uninstall and reinstall), hence this handling
    if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
    collect_set(channel) channel,
    collect_set(os) os,
    collect_set(area_code) area_code,
    collect_set(version_code) version_code,
    sum(if(last_page_id is null,1,0)) visit_count
 from ${APP}.dwd_page_log
 where dt='$do_date'
 and last_page_id is null
 group by mid_id, model, brand
)t1
join
(select mid_id, brand, model, collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
 from
 (select mid_id, brand, model, page_id, count(*) page_count, sum(during_time) during_time
  from ${APP}.dwd_page_log
  where dt='$do_date'
  group by mid_id, model, brand, page_id
 )t2
 group by mid_id, model, brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
"

dws_area_stats_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_vu as
(select dt, id province_id, visit_count, login_count, visitor_count, user_count
 from
 (select dt, area_code,
     count(*) visit_count, -- number of visits by all visitors
     count(user_id) login_count, -- number of visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
     count(distinct(mid_id)) visitor_count, -- number of distinct visitors
     count(distinct(user_id)) user_count -- number of distinct logged-in users
  from ${APP}.dwd_page_log
  where last_page_id is null
  group by dt, area_code
 )tmp
 left join ${APP}.dim_base_province area
 on tmp.area_code=area.area_code
),
tmp_order as
(select date_format(create_time,'yyyy-MM-dd') dt, province_id, count(*) order_count, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount
 from ${APP}.dwd_order_info
 group by date_format(create_time,'yyyy-MM-dd'), province_id
),
tmp_pay as
(select date_format(callback_time,'yyyy-MM-dd') dt, province_id, count(*) payment_count, sum(payment_amount) payment_amount
 from ${APP}.dwd_payment_info
 group by date_format(callback_time,'yyyy-MM-dd'), province_id
),
tmp_ro as
(select date_format(create_time,'yyyy-MM-dd') dt, province_id, count(*) refund_order_count, sum(refund_amount) refund_order_amount
 from ${APP}.dwd_order_refund_info
 group by date_format(create_time,'yyyy-MM-dd'), province_id
),
tmp_rp as
(select date_format(callback_time,'yyyy-MM-dd') dt, province_id, count(*) refund_payment_count, sum(refund_amount) refund_payment_amount
 from ${APP}.dwd_refund_payment
 group by date_format(callback_time,'yyyy-MM-dd'), province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt)
select province_id, sum(visit_count), sum(login_count), sum(visitor_count), sum(user_count), sum(order_count), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_amount), sum(refund_order_count), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_amount), dt
from
(select dt, province_id, visit_count, login_count, visitor_count, user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_vu
 union all
 select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, order_count, order_original_amount, order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_order
 union all
 select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, payment_count, payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_pay
 union all
 select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, refund_order_count, refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
 from tmp_ro
 union all
 select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, refund_payment_count, refund_payment_amount
 from tmp_rp
)t1
group by dt,province_id;
"

dws_user_action_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_login as
(select dt, user_id, count(*) login_count
 from ${APP}.dwd_page_log
 where user_id is not null
 and last_page_id is null
 group by dt, user_id
),
tmp_cf as
(select dt, user_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count
 from ${APP}.dwd_action_log
 where user_id is not null
 and action_id in ('cart_add','favor_add')
 group by dt, user_id
),
tmp_order as
(select date_format(create_time,'yyyy-MM-dd') dt, user_id, count(*) order_count, sum(if(activity_reduce_amount>0,1,0)) order_activity_count, sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count, sum(activity_reduce_amount) order_activity_reduce_amount, sum(coupon_reduce_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount
 from ${APP}.dwd_order_info
 group by date_format(create_time,'yyyy-MM-dd'), user_id
),
tmp_pay as
(select date_format(callback_time,'yyyy-MM-dd') dt, user_id, count(*) payment_count, sum(payment_amount) payment_amount
 from ${APP}.dwd_payment_info
 group by date_format(callback_time,'yyyy-MM-dd'), user_id
),
tmp_ri as
(select date_format(create_time,'yyyy-MM-dd') dt, user_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount
 from ${APP}.dwd_order_refund_info
 group by date_format(create_time,'yyyy-MM-dd'), user_id
),
tmp_rp as
(select date_format(callback_time,'yyyy-MM-dd') dt, rp.user_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(rp.refund_amount) refund_payment_amount
 from
 (select user_id, order_id, sku_id, refund_amount, callback_time
  from ${APP}.dwd_refund_payment
 )rp
 left join
 (select user_id, order_id, sku_id, refund_num
  from ${APP}.dwd_order_refund_info
 )ri
 on rp.order_id=ri.order_id
 and rp.sku_id=ri.sku_id
 group by date_format(callback_time,'yyyy-MM-dd'), rp.user_id
),
tmp_coupon as
(select
    coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
    coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
    nvl(coupon_get_count,0) coupon_get_count,
    nvl(coupon_using_count,0) coupon_using_count,
    nvl(coupon_used_count,0) coupon_used_count
 from
 (select date_format(get_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_get_count
  from ${APP}.dwd_coupon_use
  where get_time is not null
  group by user_id, date_format(get_time,'yyyy-MM-dd')
 )coupon_get
 full outer join
 (select date_format(using_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_using_count
  from ${APP}.dwd_coupon_use
  where using_time is not null
  group by user_id, date_format(using_time,'yyyy-MM-dd')
 )coupon_using
 on coupon_get.dt=coupon_using.dt
 and coupon_get.user_id=coupon_using.user_id
 full outer join
 (select date_format(used_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_used_count
  from ${APP}.dwd_coupon_use
  where used_time is not null
  group by user_id, date_format(used_time,'yyyy-MM-dd')
 )coupon_used
 on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
 and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
),
tmp_comment as
(select date_format(create_time,'yyyy-MM-dd') dt, user_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count
 from ${APP}.dwd_comment_info
 group by date_format(create_time,'yyyy-MM-dd'), user_id
),
tmp_od as
(select dt, user_id, collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
 from
 (select date_format(create_time,'yyyy-MM-dd') dt, user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount, cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount, cast(sum(original_amount) as decimal(16,2)) original_amount, cast(sum(split_final_amount) as decimal(16,2)) final_amount
  from ${APP}.dwd_order_detail
  group by date_format(create_time,'yyyy-MM-dd'), user_id, sku_id
 )t1
 group by dt, user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt)
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0), nvl(cart_count,0), nvl(favor_count,0),
    nvl(order_count,0), nvl(order_activity_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_count,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0),
    nvl(payment_count,0), nvl(payment_amount,0),
    nvl(refund_order_count,0), nvl(refund_order_num,0), nvl(refund_order_amount,0),
    nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0), nvl(coupon_using_count,0), nvl(coupon_used_count,0),
    nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0),
    order_detail_stats,
    coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
and tmp_login.dt=tmp_cf.dt
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
"

dws_activity_info_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_order as
(select date_format(create_time,'yyyy-MM-dd') dt, activity_rule_id, activity_id, count(*) order_count, sum(split_activity_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount
 from ${APP}.dwd_order_detail
 where activity_id is not null
 group by date_format(create_time,'yyyy-MM-dd'), activity_rule_id, activity_id
),
tmp_pay as
(select date_format(callback_time,'yyyy-MM-dd') dt, activity_rule_id, activity_id, count(*) payment_count, sum(split_activity_amount) payment_reduce_amount, sum(split_final_amount) payment_amount
 from
 (select activity_rule_id, activity_id, order_id, split_activity_amount, split_final_amount
  from ${APP}.dwd_order_detail
  where activity_id is not null
 )od
 join
 (select order_id, callback_time
  from ${APP}.dwd_payment_info
 )pi
 on od.order_id=pi.order_id
 group by date_format(callback_time,'yyyy-MM-dd'), activity_rule_id, activity_id
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt)
select activity_rule_id, activity_id, sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), dt
from
(select dt, activity_rule_id, activity_id, order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount
 from tmp_order
 union all
 select dt, activity_rule_id, activity_id, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_reduce_amount, payment_amount
 from tmp_pay
)t1
group by dt, activity_rule_id, activity_id;
"

dws_sku_action_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_order as
(select date_format(create_time,'yyyy-MM-dd') dt, sku_id, count(*) order_count, sum(sku_num) order_num, sum(if(split_activity_amount>0,1,0)) order_activity_count, sum(if(split_coupon_amount>0,1,0)) order_coupon_count, sum(split_activity_amount) order_activity_reduce_amount, sum(split_coupon_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount
 from ${APP}.dwd_order_detail
 group by date_format(create_time,'yyyy-MM-dd'), sku_id
),
tmp_pay as
(select date_format(callback_time,'yyyy-MM-dd') dt, sku_id, count(*) payment_count, sum(sku_num) payment_num, sum(split_final_amount) payment_amount
 from ${APP}.dwd_order_detail od
 join
 (select order_id, callback_time
  from ${APP}.dwd_payment_info
  where callback_time is not null
 )pi on pi.order_id=od.order_id
 group by date_format(callback_time,'yyyy-MM-dd'), sku_id
),
tmp_ri as
(select date_format(create_time,'yyyy-MM-dd') dt, sku_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount
 from ${APP}.dwd_order_refund_info
 group by date_format(create_time,'yyyy-MM-dd'), sku_id
),
tmp_rp as
(select date_format(callback_time,'yyyy-MM-dd') dt, rp.sku_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(refund_amount) refund_payment_amount
 from
 (select order_id, sku_id, refund_amount, callback_time
  from ${APP}.dwd_refund_payment
 )rp
 left join
 (select order_id, sku_id, refund_num
  from ${APP}.dwd_order_refund_info
 )ri
 on rp.order_id=ri.order_id
 and rp.sku_id=ri.sku_id
 group by date_format(callback_time,'yyyy-MM-dd'), rp.sku_id
),
tmp_cf as
(select dt, item sku_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count
 from ${APP}.dwd_action_log
 where action_id in ('cart_add','favor_add')
 group by dt, item
),
tmp_comment as
(select date_format(create_time,'yyyy-MM-dd') dt, sku_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count
 from ${APP}.dwd_comment_info
 group by date_format(create_time,'yyyy-MM-dd'), sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt)
select sku_id, sum(order_count), sum(order_num), sum(order_activity_count), sum(order_coupon_count), sum(order_activity_reduce_amount), sum(order_coupon_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_num), sum(payment_amount), sum(refund_order_count), sum(refund_order_num), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_num), sum(refund_payment_amount), sum(cart_count), sum(favor_count), sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count), dt
from
(select dt, sku_id, order_count, order_num, order_activity_count, order_coupon_count, order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
 from tmp_order
 union all
 select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_num, payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
 from tmp_pay
 union all
 select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, refund_order_count, refund_order_num, refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
 from tmp_ri
 union all
 select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, refund_payment_count, refund_payment_num, refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
 from tmp_rp
 union all
 select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, cart_count, favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
 from tmp_cf
 union all
 select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
 from tmp_comment
)t1
group by dt, sku_id;
"

dws_coupon_info_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_cu as
(select
    coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_exprie.dt) dt,
    coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_exprie.coupon_id) coupon_id,
    nvl(get_count,0) get_count,
    nvl(order_count,0) order_count,
    nvl(payment_count,0) payment_count,
    nvl(expire_count,0) expire_count
 from
 (select date_format(get_time,'yyyy-MM-dd') dt, coupon_id, count(*) get_count
  from ${APP}.dwd_coupon_use
  group by date_format(get_time,'yyyy-MM-dd'), coupon_id
 )coupon_get
 full outer join
 (select date_format(using_time,'yyyy-MM-dd') dt, coupon_id, count(*) order_count
  from ${APP}.dwd_coupon_use
  where using_time is not null
  group by date_format(using_time,'yyyy-MM-dd'), coupon_id
 )coupon_using
 on coupon_get.dt=coupon_using.dt
 and coupon_get.coupon_id=coupon_using.coupon_id
 full outer join
 (select date_format(used_time,'yyyy-MM-dd') dt, coupon_id, count(*) payment_count
  from ${APP}.dwd_coupon_use
  where used_time is not null
  group by date_format(used_time,'yyyy-MM-dd'), coupon_id
 )coupon_used
 on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
 and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
 full outer join
 (select date_format(expire_time,'yyyy-MM-dd') dt, coupon_id, count(*) expire_count
  from ${APP}.dwd_coupon_use
  where expire_time is not null
  group by date_format(expire_time,'yyyy-MM-dd'), coupon_id
 )coupon_exprie
 on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_exprie.dt
 and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_exprie.coupon_id
),
tmp_order as
(select date_format(create_time,'yyyy-MM-dd') dt, coupon_id, sum(split_coupon_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount
 from ${APP}.dwd_order_detail
 where coupon_id is not null
 group by date_format(create_time,'yyyy-MM-dd'), coupon_id
),
tmp_pay as
(select date_format(callback_time,'yyyy-MM-dd') dt, coupon_id, sum(split_coupon_amount) payment_reduce_amount, sum(split_final_amount) payment_amount
 from
 (select order_id, coupon_id, split_coupon_amount, split_final_amount
  from ${APP}.dwd_order_detail
  where coupon_id is not null
 )od
 join
 (select order_id, callback_time
  from ${APP}.dwd_payment_info
 )pi
 on od.order_id=pi.order_id
 group by date_format(callback_time,'yyyy-MM-dd'), coupon_id
)
insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt)
select coupon_id, sum(get_count), sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), sum(expire_count), dt
from
(select dt, coupon_id, get_count, order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, 0 payment_reduce_amount, 0 payment_amount, expire_count
 from tmp_cu
 union all
 select dt, coupon_id, 0 get_count, 0 order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount, 0 expire_count
 from tmp_order
 union all
 select dt, coupon_id, 0 get_count, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, payment_reduce_amount, payment_amount, 0 expire_count
 from tmp_pay
)t1
group by dt,coupon_id;
"

case $1 in
    "dws_visitor_action_daycount" )
        hive -e "$dws_visitor_action_daycount"
    ;;
    "dws_user_action_daycount" )
        hive -e "$dws_user_action_daycount"
    ;;
    "dws_activity_info_daycount" )
        hive -e "$dws_activity_info_daycount"
    ;;
    "dws_area_stats_daycount" )
        hive -e "$dws_area_stats_daycount"
    ;;
    "dws_sku_action_daycount" )
        hive -e "$dws_sku_action_daycount"
    ;;
    "dws_coupon_info_daycount" )
        hive -e "$dws_coupon_info_daycount"
    ;;
    "all" )
        hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
    ;;
esac
(2) Add execute permission
[seven@hadoop102 bin]$ chmod +x dwd_to_dws_init.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ dwd_to_dws_init.sh all 2020-06-14
(2) Check whether the data was loaded successfully
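For example, one way to check is to list the partitions of a DWS table written by the first-day load and count the rows for the load date (table names are the ones defined in the script above):

```sql
-- Each DWS table should now contain one partition per historical day
show partitions dws_user_action_daycount;
-- Row count for the load date passed to the script
select count(*) from dws_user_action_daycount where dt='2020-06-14';
```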
7.2.8 DWS Layer Daily Data Load Script
1) Write the script
(1) Create the script dwd_to_dws.sh in the /home/seven/bin directory
#!/bin/bash

APP=gmall

# If a date was passed in, use it; otherwise use the day before the current date
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dws_visitor_action_daycount="
insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
select t1.mid_id, t1.brand, t1.model, t1.is_new, t1.channel, t1.os, t1.area_code, t1.version_code, t1.visit_count, t3.page_stats
from
(select mid_id, brand, model,
    -- In ods_page_log, the is_new values of one device within one day may be all 1, all 0,
    -- or a mix of 0 and 1 (uninstall and reinstall), hence this handling
    if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
    collect_set(channel) channel,
    collect_set(os) os,
    collect_set(area_code) area_code,
    collect_set(version_code) version_code,
    sum(if(last_page_id is null,1,0)) visit_count
 from ${APP}.dwd_page_log
 where dt='$do_date'
 and last_page_id is null
 group by mid_id, model, brand
)t1
join
(select mid_id, brand, model, collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
 from
 (select mid_id, brand, model, page_id, count(*) page_count, sum(during_time) during_time
  from ${APP}.dwd_page_log
  where dt='$do_date'
  group by mid_id, model, brand, page_id
 )t2
 group by mid_id, model, brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
"

dws_user_action_daycount="
with
tmp_login as
(select user_id, count(*) login_count
 from ${APP}.dwd_page_log
 where dt='$do_date'
 and user_id is not null
 and last_page_id is null
 group by user_id
),
tmp_cf as
(select user_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count
 from ${APP}.dwd_action_log
 where dt='$do_date'
 and user_id is not null
 and action_id in ('cart_add','favor_add')
 group by user_id
),
tmp_order as
(select user_id, count(*) order_count, sum(if(activity_reduce_amount>0,1,0)) order_activity_count, sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count, sum(activity_reduce_amount) order_activity_reduce_amount, sum(coupon_reduce_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount
 from ${APP}.dwd_order_info
 where (dt='$do_date' or dt='9999-99-99')
 and date_format(create_time,'yyyy-MM-dd')='$do_date'
 group by user_id
),
tmp_pay as
(select user_id, count(*) payment_count, sum(payment_amount) payment_amount
 from ${APP}.dwd_payment_info
 where dt='$do_date'
 group by user_id
),
tmp_ri as
(select user_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount
 from ${APP}.dwd_order_refund_info
 where dt='$do_date'
 group by user_id
),
tmp_rp as
(select rp.user_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(rp.refund_amount) refund_payment_amount
 from
 (select user_id, order_id, sku_id, refund_amount
  from ${APP}.dwd_refund_payment
  where dt='$do_date'
 )rp
 left join
 (select user_id, order_id, sku_id, refund_num
  from ${APP}.dwd_order_refund_info
  where dt>=date_add('$do_date',-15)
 )ri
 on rp.order_id=ri.order_id
 and rp.sku_id=ri.sku_id
 group by rp.user_id
),
tmp_coupon as
(select user_id, sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_get_count, sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_using_count, sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_used_count
 from ${APP}.dwd_coupon_use
 where (dt='$do_date' or dt='9999-99-99')
 and (date_format(get_time,'yyyy-MM-dd')='$do_date'
 or date_format(using_time,'yyyy-MM-dd')='$do_date'
 or date_format(used_time,'yyyy-MM-dd')='$do_date')
 group by user_id
),
tmp_comment as
(select user_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count
 from ${APP}.dwd_comment_info
 where dt='$do_date'
 group by user_id
),
tmp_od as
(select user_id, collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
 from
 (select user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount, cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount, cast(sum(original_amount) as decimal(16,2)) original_amount, cast(sum(split_final_amount) as decimal(16,2)) final_amount
  from ${APP}.dwd_order_detail
  where dt='$do_date'
  group by user_id, sku_id
 )t1
 group by user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0), nvl(cart_count,0), nvl(favor_count,0),
    nvl(order_count,0), nvl(order_activity_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_count,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0),
    nvl(payment_count,0), nvl(payment_amount,0),
    nvl(refund_order_count,0), nvl(refund_order_num,0), nvl(refund_order_amount,0),
    nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0), nvl(coupon_using_count,0), nvl(coupon_used_count,0),
    nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0),
    order_detail_stats
from tmp_login
full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id
full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
"

dws_activity_info_daycount="
with
tmp_order as
(
    select
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    and activity_id is not null
    group by activity_rule_id,activity_id
),
tmp_pay as
(
    select
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and activity_id is not null
    and order_id in
    (select order_id from ${APP}.dwd_payment_info where dt='$do_date')
    group by activity_rule_id,activity_id
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date')
select
    activity_rule_id,
    activity_id,
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount)
from
(
    select
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        0 payment_count,
        0 payment_reduce_amount,
        0 payment_amount
    from tmp_order
    union all
    select
        activity_rule_id,
        activity_id,
        0 order_count,
        0 order_reduce_amount,
        0 order_original_amount,
        0 order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from tmp_pay
)t1
group by activity_rule_id,activity_id;
"

dws_sku_action_daycount="
with
tmp_order as
(
    select
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    group by sku_id
),
tmp_pay as
(
    select
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and order_id in
    (select order_id from ${APP}.dwd_payment_info where dt='$do_date')
    group by sku_id
),
tmp_ri as
(
    select
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by sku_id
),
tmp_rp as
(
    select
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select
            order_id,
            sku_id,
            refund_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select
            order_id,
            sku_id,
            refund_num
        from ${APP}.dwd_order_refund_info
        where dt>=date_add('$do_date',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where dt='$do_date'
    and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_activity_count),
    sum(order_coupon_count),
    sum(order_activity_reduce_amount),
    sum(order_coupon_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_num),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_num),
    sum(refund_payment_amount),
    sum(cart_count),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
from
(
    select
        sku_id,
        order_count,order_num,order_activity_count,order_coupon_count,
        order_activity_reduce_amount,order_coupon_reduce_amount,
        order_original_amount,order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_order
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,
        0 order_original_amount,0 order_final_amount,
        payment_count,payment_num,payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_pay
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,
        0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        refund_order_count,refund_order_num,refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_ri
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,
        0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        refund_payment_count,refund_payment_num,refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_rp
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,
        0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        cart_count,favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_cf
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,
        0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count
    from tmp_comment
)t1
group by sku_id;
"

dws_coupon_info_daycount="
with
tmp_cu as
(
    select
        coupon_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) order_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count,
        sum(if(date_format(expire_time,'yyyy-MM-dd')='$do_date',1,0)) expire_count
    from ${APP}.dwd_coupon_use
    where dt='9999-99-99'
    or dt='$do_date'
    group by coupon_id
),
tmp_order as
(
    select
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    and coupon_id is not null
    group by coupon_id
),
tmp_pay as
(
    select
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and coupon_id is not null
    and order_id in
    (select order_id from ${APP}.dwd_payment_info where dt='$do_date')
    group by coupon_id
)
insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt='$do_date')
select
    coupon_id,
    sum(get_count),
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    sum(expire_count)
from
(
    select
        coupon_id,
        get_count,
        order_count,
        0 order_reduce_amount,
        0 order_original_amount,
        0 order_final_amount,
        payment_count,
        0 payment_reduce_amount,
        0 payment_amount,
        expire_count
    from tmp_cu
    union all
    select
        coupon_id,
        0 get_count,
        0 order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        0 payment_count,
        0 payment_reduce_amount,
        0 payment_amount,
        0 expire_count
    from tmp_order
    union all
    select
        coupon_id,
        0 get_count,
        0 order_count,
        0 order_reduce_amount,
        0 order_original_amount,
        0 order_final_amount,
        0 payment_count,
        payment_reduce_amount,
        payment_amount,
        0 expire_count
    from tmp_pay
)t1
group by coupon_id;
"

dws_area_stats_daycount="
with
tmp_vu as
(
    select
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            area_code,
            count(*) visit_count, -- visits by all visitors
            count(user_id) login_count, -- visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- distinct visitors
            count(distinct(user_id)) user_count -- distinct logged-in users
        from ${APP}.dwd_page_log
        where dt='$do_date'
        and last_page_id is null
        group by area_code
    )tmp
    left join ${APP}.dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where dt='$do_date'
    or (dt='9999-99-99' and date_format(create_time,'yyyy-MM-dd')='$do_date')
    group by province_id
),
tmp_pay as
(
    select
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by province_id
),
tmp_ro as
(
    select
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by province_id
),
tmp_rp as
(
    select
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from ${APP}.dwd_refund_payment
    where dt='$do_date'
    group by province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt='$do_date')
select
    province_id,
    sum(visit_count),
    sum(login_count),
    sum(visitor_count),
    sum(user_count),
    sum(order_count),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_amount)
from
(
    select
        province_id,
        visit_count,login_count,visitor_count,user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_amount
    from tmp_vu
    union all
    select
        province_id,
        0 visit_count,0 login_count,0 visitor_count,0 user_count,
        order_count,order_original_amount,order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_amount
    from tmp_order
    union all
    select
        province_id,
        0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        payment_count,payment_amount,
        0 refund_order_count,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_amount
    from tmp_pay
    union all
    select
        province_id,
        0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        refund_order_count,refund_order_amount,
        0 refund_payment_count,0 refund_payment_amount
    from tmp_ro
    union all
    select
        province_id,
        0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,
        refund_payment_count,refund_payment_amount
    from tmp_rp
)t1
group by province_id;
"

case $1 in
"dws_visitor_action_daycount" )
    hive -e "$dws_visitor_action_daycount"
;;
"dws_user_action_daycount" )
    hive -e "$dws_user_action_daycount"
;;
"dws_activity_info_daycount" )
    hive -e "$dws_activity_info_daycount"
;;
"dws_area_stats_daycount" )
    hive -e "$dws_area_stats_daycount"
;;
"dws_sku_action_daycount" )
    hive -e "$dws_sku_action_daycount"
;;
"dws_coupon_info_daycount" )
    hive -e "$dws_coupon_info_daycount"
;;
"all" )
    hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
;;
esac
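The dws_user_action_daycount query above chains full outer joins and wraps the join keys in coalesce(...), so each join condition uses the first non-null user_id among the tables joined so far; together with nvl(metric, 0), every user that appears in any source table gets exactly one output row, with zeros for the metrics it lacks. A minimal Python sketch of the same merge semantics (the dicts and names here are illustrative stand-ins for the tmp_* tables, not part of the script):

```python
def full_outer_merge(sources):
    """Full-outer-join several {user_id: metric} maps into one row per
    user_id, defaulting missing metrics to 0 (the SQL nvl(metric, 0))."""
    merged = {}
    for name, table in sources.items():
        for user_id, value in table.items():
            # setdefault plays the role of coalesce: reuse the row if any
            # earlier source already produced this user_id
            row = merged.setdefault(user_id, {n: 0 for n in sources})
            row[name] = value
    return merged

tmp_login = {"u1": 3, "u2": 1}   # login_count per user
tmp_order = {"u2": 5, "u3": 2}   # order_count per user

merged = full_outer_merge({"login_count": tmp_login, "order_count": tmp_order})
# u1 appears only in tmp_login and u3 only in tmp_order, yet both survive
print(merged["u1"])  # {'login_count': 3, 'order_count': 0}
print(merged["u3"])  # {'login_count': 0, 'order_count': 2}
```

The same zero-filling idea also underlies the union all branches in the other queries: each source contributes its own metrics and literal zeros for the rest, and the outer sum(...) group by collapses them into one row per key.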
(2) Grant execute permission
[seven@hadoop102 bin]$ chmod +x dwd_to_dws.sh
2) Script usage
(1) Run the script
[seven@hadoop102 bin]$ dwd_to_dws.sh all 2020-06-14
(2) Verify that the data was loaded successfully