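The source system uploads one CSV data file per day to a designated directory on the data platform (數據中臺), which uses Hive tables for its ETL work.
First, create an external partitioned table: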
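-- External table: dropping it removes only the metadata, the CSV files stay in place.
-- LOCATION must come before TBLPROPERTIES; skip.header.line.count=1 skips the header row of each file.
create external table tmp_lease_contract
(
contract_id string,
vin string,
amount float
)
partitioned by (dt string)
row format delimited
fields terminated by ","
stored as textfile
location "/dmp/tmp/sales/lease_contract"
TBLPROPERTIES ('skip.header.line.count'='1');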
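Each day's file is placed, following the naming convention, in its own subdirectory such as ./dt=20250718. Then register the partition so that Hive can see the new data: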
alter table tmp_lease_contract add if not exists partition(dt='20250718');
select * from tmp_lease_contract where dt='20250718';
An example directory layout:
/dmp/tmp/sales/lease_contract/
|-- dt=20250716
| |-- lease_contract_20250716.csv
|-- dt=20250715
| |-- lease_contract_20250715.csv
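
The daily step described above (put the file under its dt= subdirectory, then register the partition) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the standard `hdfs dfs` and `hive -e` command-line tools are available on the host, and the function names, the `DRY_RUN` switch and the local file name are hypothetical, not part of the original setup. By default it only prints the commands it would run.

```shell
# Sketch of the daily load step. TABLE and BASE_DIR are taken from the DDL above;
# the helper names and the DRY_RUN switch are hypothetical.
set -eu

TABLE="tmp_lease_contract"
BASE_DIR="/dmp/tmp/sales/lease_contract"

# HDFS directory for one day's partition, e.g. .../dt=20250718
partition_dir() {
  echo "${BASE_DIR}/dt=$1"
}

# HiveQL statement that registers the partition in the metastore.
add_partition_sql() {
  echo "alter table ${TABLE} add if not exists partition(dt='$1');"
}

# Print every command; execute it only when DRY_RUN=0 is set.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Upload one local CSV file and register its partition.
load_day() {
  dt="$1"
  csv="$2"
  run hdfs dfs -mkdir -p "$(partition_dir "$dt")"
  run hdfs dfs -put -f "$csv" "$(partition_dir "$dt")/"
  run hive -e "$(add_partition_sql "$dt")"
}

# Example call (dry run unless DRY_RUN=0); the file name is illustrative.
load_day 20250718 lease_contract_20250718.csv
```

Running it with `DRY_RUN=0` would execute the commands instead of only printing them. Because the partition is added with `if not exists` and the upload uses `-put -f`, re-running the script for the same day should be harmless.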