Paper: [2401.10134] Spatial-Temporal Large Language Model for Traffic Prediction
Code: GitHub - ChenxiLiu-HNU/ST-LLM: Official implementation of the paper "Spatial-Temporal Large Language Model for Traffic Prediction"
The English here is all hand-typed, my own summarizing and paraphrasing of the original paper. Spelling and grammar mistakes are hard to avoid entirely; if you spot any, feel free to point them out in the comments. This post reads as personal notes, so take it with a grain of salt.
Table of Contents
1. Thoughts
2. Section-by-Section Reading
2.1. Abstract
2.2. Introduction
2.3. Related Work
2.3.1. Large Language Models for Time Series Analysis
2.3.2. Traffic Prediction
2.4. Problem Definition
2.5. Methodology
2.5.1.?Overview
2.5.2. Spatial-Temporal Embedding and Fusion
2.5.3. Partially Frozen Attention (PFA) LLM
2.6. Experiments
2.6.1. Datasets
2.6.2. Baselines
2.6.3. Implementations
2.6.4. Evaluation Metrics
2.6.5. Main Results
2.6.6. Performance of ST-LLM and Ablation Studies
2.6.7. Parameter Analysis
2.6.8. Inference Time Analysis
2.6.9. Few-Shot Prediction
2.6.10. Zero-Shot Prediction
2.7. Conclusion
3. Reference
1. Thoughts
(1) Even though the paper I have to submit in a few days has not been started yet, here I am munching on crackers and writing reading notes. Sigh. Everyone moves too fast these days.
(2) Compared with math-heavy papers, an LLM paper goes well with a cup of milk tea: relaxed and pleasant throughout. This one boils down to three separate convolutions → fuse them together → LLM (with some modules partially unfrozen) → done.
2. Section-by-Section Reading
2.1. Abstract
        ① They proposed the Spatial-Temporal Large Language Model (ST-LLM) to predict traffic (nothing especially noteworthy to record: the abstract introduces the method and says earlier approaches were not accurate enough; see the framework figure below for the specifics)
2.2. Introduction
        ① Traditional CNNs and RNNs cannot capture complex, long-range spatial and temporal dependencies. GNNs are prone to overfitting, so researchers mainly rely on attention mechanisms.
        ② Existing traffic prediction methods mainly focus on temporal features rather than spatial ones
        ③ For better long-term prediction, they proposed partially frozen attention (PFA)
2.3. Related Work
2.3.1.?Large Language Models for Time Series Analysis
        ① The authors list TEMPO-GPT, TIME-LLM, OFA, TEST, and LLM-TIME, all of which utilize temporal features only. Conversely, GATGPT introduces spatial features but ignores temporal dependencies.
imputation  n. attribution (of blame or cause); in this context, filling in missing values
2.3.2.?Traffic Prediction
        ① Filtering is a common and classic method for processing traffic data
        ② Irregular city road networks make CNNs hard to apply for extracting spatial features
2.4. Problem Definition
        ① Input traffic data: $\mathcal{X} \in \mathbb{R}^{T \times N \times C}$, where $T$ denotes the number of timesteps, $N$ denotes the number of spatial stations, and $C$ denotes the number of features
        ② Task: given only the historical traffic data $\mathcal{X}_P = [X_{t-P+1}, \ldots, X_t] \in \mathbb{R}^{P \times N \times C}$ of $P$ timesteps, learn a function $f(\cdot)$ with parameters $\theta$ to predict the future $S$ timesteps:

$$[X_{t-P+1}, \ldots, X_t] \xrightarrow{f_{\theta}} [X_{t+1}, \ldots, X_{t+S}] \in \mathbb{R}^{S \times N \times C}$$
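To make the tensor shapes concrete, here is a minimal NumPy sketch; the values $P = S = 12$, $N = 266$, and $C = 2$ are illustrative assumptions borrowed from the experimental setup below, not part of the definition itself.

```python
import numpy as np

P, S, N, C = 12, 12, 266, 2        # hypothetical sizes for illustration

X_hist = np.random.rand(P, N, C)   # [X_{t-P+1}, ..., X_t], the model input
Y_true = np.random.rand(S, N, C)   # [X_{t+1}, ..., X_{t+S}], what f_theta must predict

print(X_hist.shape, "->", Y_true.shape)   # (12, 266, 2) -> (12, 266, 2)
```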
2.5. Methodology
2.5.1.?Overview
        ① Overall framework of ST-LLM (the architecture figure in the paper): the spatial-temporal embedding layer extracts the token embedding $E_P$, spatial embedding $E_S$, and temporal embedding $E_T$ of the historical $P$ timesteps. These three are then fused into $E_F$. The PFA LLM freezes its first $F$ layers and partially unfreezes its last $U$ layers, yielding the output $H^{F+U}$. Lastly, a regression convolution converts it into the prediction $\hat{\mathcal{Y}} \in \mathbb{R}^{S \times N \times C}$.
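Before the detailed equations, a shape-only walkthrough may help. This is my own PyTorch sketch with every stage stubbed by a linear layer; the real model uses pointwise/fusion convolutions and a pretrained LLM, and the batch size $B$ and hidden width $D$ are assumed values.

```python
import torch
import torch.nn as nn

B, P, S, N, C, D = 8, 12, 12, 266, 2, 64
x = torch.randn(B, P, N, C)                                   # historical window

token = nn.Linear(P * C, D)                                   # stand-in for the pointwise conv (E_P)
e_p = token(x.permute(0, 2, 1, 3).reshape(B, N, P * C))       # (B, N, D)
e_s = nn.Parameter(torch.randn(N, D)).expand(B, N, D)         # spatial embedding E_S
e_t = torch.randn(B, N, D)                                    # temporal embedding E_T (stubbed)

fuse = nn.Linear(3 * D, 3 * D)                                # stand-in for the fusion conv
e_f = fuse(torch.cat([e_p, e_s, e_t], dim=-1))                # fused tokens (B, N, 3D)

# ... the PFA LLM runs over the N station tokens here ...
rconv = nn.Linear(3 * D, S * C)                               # stand-in for the regression conv
y_hat = rconv(e_f).reshape(B, N, S, C).permute(0, 2, 1, 3)    # (B, S, N, C)
print(y_hat.shape)                                            # torch.Size([8, 12, 266, 2])
```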
2.5.2.?Spatial-Temporal Embedding and Fusion
        ① They get the token embedding $E_P \in \mathbb{R}^{N \times D}$ by a pointwise convolution over the historical input:

$$E_P = \mathrm{PConv}(\mathcal{X}_P)$$

        ② They apply linear layers to encode the input timestamps into a day embedding $E_D$ and a week embedding $E_W$:

$$E_D = W_D\, t_d, \qquad E_W = W_W\, t_w$$

where $W_D$ and $W_W$ are learnable parameters, and the output is the temporal embedding $E_T = E_D + E_W \in \mathbb{R}^{N \times D}$
        ③ They extract spatial correlations with a learnable spatial embedding $E_S \in \mathbb{R}^{N \times D}$
        ④ Fusion convolution (a code sketch follows this list):

$$E_F = \mathrm{FConv}(E_P \,\|\, E_S \,\|\, E_T)$$

where $\|$ denotes concatenation along the feature dimension, so $E_F \in \mathbb{R}^{N \times 3D}$
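Here is a minimal PyTorch sketch of this embedding-and-fusion stage as I read it. The 1×1 convolutions for PConv/FConv, the 48 half-hour slots per day, the 7 weekdays, and $D = 64$ are my assumptions layered on the description above; the official repository may differ.

```python
import torch
import torch.nn as nn

class STEmbeddingFusion(nn.Module):
    """Sketch of E_P (token), E_T = E_D + E_W (temporal), E_S (spatial), and FConv."""
    def __init__(self, P=12, C=2, N=266, D=64):
        super().__init__()
        self.token = nn.Conv2d(P * C, D, kernel_size=1)       # PConv: pointwise conv -> E_P
        self.day   = nn.Embedding(48, D)                      # E_D: 48 half-hour slots per day
        self.week  = nn.Embedding(7, D)                       # E_W: 7 days of the week
        self.space = nn.Parameter(torch.randn(N, D))          # E_S: learnable station embedding
        self.fuse  = nn.Conv2d(3 * D, 3 * D, kernel_size=1)   # FConv over concatenated features

    def forward(self, x, tod, dow):
        # x: (B, P, N, C); tod, dow: (B,) integer indices of the current timestep
        B, P, N, C = x.shape
        e_p = self.token(x.permute(0, 1, 3, 2).reshape(B, P * C, N, 1))          # (B, D, N, 1)
        e_t = (self.day(tod) + self.week(dow))[:, :, None, None].expand(-1, -1, N, 1)
        e_s = self.space.t()[None, :, :, None].expand(B, -1, -1, -1)             # (B, D, N, 1)
        return self.fuse(torch.cat([e_p, e_s, e_t], dim=1))                      # (B, 3D, N, 1)

# Usage: fused tokens for a batch of 8 windows
emb = STEmbeddingFusion()
x = torch.randn(8, 12, 266, 2)
out = emb(x, torch.zeros(8, dtype=torch.long), torch.zeros(8, dtype=torch.long))
print(out.shape)   # torch.Size([8, 192, 266, 1])
```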
2.5.3.?Partially Frozen Attention (PFA) LLM
        ① They freeze the first $F$ layers (including the multi-head attention and feed-forward sublayers), which contain important pretrained knowledge. With $H^0 = E_F + PE$, for $i = 1, \ldots, F$:

$$\tilde{H}^i = H^{i-1} + \overline{\mathrm{MHA}}\big(\mathrm{LN}_1(H^{i-1})\big), \qquad H^i = \tilde{H}^i + \overline{\mathrm{FFN}}\big(\mathrm{LN}_2(\tilde{H}^i)\big)$$

where the overline marks frozen components, $PE$ denotes the learnable positional encoding, $\tilde{H}^i$ represents the intermediate representation of the $i$-th layer after applying the frozen multi-head attention (MHA) and the first unfrozen layer normalization (LN), $H^i$ symbolizes the final representation after applying the unfrozen LN and the frozen feed-forward network (FFN), and:

$$\mathrm{MHA}(H) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O, \qquad \mathrm{head}_j = \mathrm{softmax}\Big(\tfrac{Q_j K_j^{\top}}{\sqrt{d}}\Big) V_j$$

        ② Unfreezing the multi-head attention in the last $U$ layers while keeping their FFNs frozen (see the freezing sketch after this list), for $i = F+1, \ldots, F+U$:

$$\tilde{H}^i = H^{i-1} + \mathrm{MHA}\big(\mathrm{LN}_1(H^{i-1})\big), \qquad H^i = \tilde{H}^i + \overline{\mathrm{FFN}}\big(\mathrm{LN}_2(\tilde{H}^i)\big)$$
        ③ The final regression convolution (RConv) maps the LLM output to the prediction:

$$\hat{\mathcal{Y}} = \mathrm{RConv}\big(H^{F+U}\big) \in \mathbb{R}^{S \times N \times C}$$

        ④ Loss function (mean absolute error):

$$\mathcal{L}(\theta) = \big\|\hat{\mathcal{Y}} - \mathcal{Y}\big\|_1$$

where $\mathcal{Y}$ is the ground truth
        ⑤ Algorithm: see the training procedure in the paper; a minimal freezing sketch follows
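To make the partial-freezing recipe concrete, below is a sketch on Hugging Face's GPT2Model (h, ln_1, ln_2, attn, wpe, and ln_f are the actual GPT-2 submodule names in transformers); the split U = 2 and the L1 objective at the end are my assumptions for illustration, not the paper's tuned setting.

```python
import torch
from transformers import GPT2Model

U = 2                                        # last U blocks get unfrozen attention (a choice for illustration)
gpt2 = GPT2Model.from_pretrained("gpt2")

for p in gpt2.parameters():                  # start with every parameter frozen
    p.requires_grad = False

for block in gpt2.h:                         # layer norms stay trainable in every block
    for p in list(block.ln_1.parameters()) + list(block.ln_2.parameters()):
        p.requires_grad = True

for block in gpt2.h[-U:]:                    # PFA: unfreeze multi-head attention in the last U blocks
    for p in block.attn.parameters():
        p.requires_grad = True

for p in gpt2.wpe.parameters():              # learnable positional encoding
    p.requires_grad = True
for p in gpt2.ln_f.parameters():             # final layer norm
    p.requires_grad = True

trainable = sum(p.numel() for p in gpt2.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")

loss_fn = torch.nn.L1Loss()                  # MAE-style objective, matching the loss above
```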
2.6. Experiments
2.6.1. Datasets
        ① Statistics of the datasets:
        ② NYCTaxi: 266 virtual stations and 4,368 timesteps (each timestep is half an hour)
        ③ CHBike: 250 sites and 4,368 timesteps (also half-hour timesteps)
2.6.2. Baselines
        ① GNN-based baselines: DCRNN, STGCN, GWN, AGCRN, STGNCDE, DGCRN
        ② Attention-based baselines: ASTGCN, GMAN, ASTGNN
        ③ LLM-based baselines: OFA, GATGPT, GCNGPT, LLAMA2
2.6.3. Implementations
        ① Data split: 6:2:2 for training, validation, and test
        ② Historical and future timesteps: $P = S = 12$
        ③
        ④ Optimization: Ranger21 with learning rate 0.001 for the LLM-based models; Adam with learning rate 0.001 for the GCN-based and attention-based models
        ⑤ LLMs: GPT2 and LLAMA2 7B
        ⑥ Layers: 6 for GPT2 and 8 for LLAMA2
        ⑦ Epochs: 100
        ⑧ Batch size: 64
2.6.4. Evaluation Metrics
        ① Metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE); a code sketch follows
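The four metrics in plain NumPy. The zero-masking in MAPE is a common convention in traffic prediction and an assumption here, since these notes do not record the paper's exact masking.

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y_hat - y))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mape(y, y_hat, eps=1e-8):
    mask = np.abs(y) > eps                   # skip zero-demand entries (assumed convention)
    return np.mean(np.abs((y_hat[mask] - y[mask]) / y[mask])) * 100

def wape(y, y_hat):
    return np.sum(np.abs(y_hat - y)) / np.sum(np.abs(y)) * 100
```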
2.6.5. Main Results
        ① Performance comparison table:
2.6.6.?Performance of ST-LLM and Ablation Studies
        ① Module ablation:
        ② Ablation on the freezing strategy:
2.6.7.?Parameter Analysis
        ① Ablation on the hyperparameter $U$ (the number of unfrozen layers):
2.6.8.?Inference Time Analysis
        ① Inference time table:
2.6.9. Few-Shot Prediction
        ① Few-shot prediction with only 10% of the training samples:
2.6.10.?Zero-Shot Prediction
        ① Performance:
2.7. Conclusion
? ? ? ? ~
3. Reference
@inproceedings{liu2024spatial,
? title={Spatial-Temporal Large Language Model for Traffic Prediction},
? author={Liu, Chenxi and Yang, Sun and Xu, Qianxiong and Li, Zhishuai and Long, Cheng and Li, Ziyue and Zhao, Rui},
? booktitle={MDM},
? year={2024}
}