Table of Contents
1. prompt-tuning background
2. Prompt Tuning Models
2.1 2021 Prefix-tuning
2.2 2021 P-tuning v1
2.3 2021 Parameter-efficient prompt tuning (PET)
2.4 2022 P-tuning v2
2.5 2019 Adapter
2.6 2021 LoRA (Low-Rank Adaptation)
2.7 2024 DoRA (Weight-Decomposed Low-Rank Adaptation)
3. LoRA Implementation
3.1 LoRA Reproduction 01: MiniLoRA
3.1.1 core codes: torch.nn.utils.parametrize.register_parametrization, the parametrization registration function
3.2 LoRA Reproduction 02: LoRA from Scratch on MNIST
3.2.1 core codes: the Lightning deep-learning framework
3.3 LoRA Reproduction 03: Torch tutorial with torchtune
3.3.1 core codes: introduction to the torchtune package
3.4 LoRA Reproduction 04: peft implementation
3.4.1 core codes: introduction to AutoModelForSeq2SeqLM
3.4.2 core codes: introduction to the peft package
3.5 *LoRA 05: Explanation
3.6 *LoRA 06: Huanhuan Chat
1. prompt-tuning background
problem: previous fine-tuning / model-tuning re-trains the large model for each downstream task, i.e. it updates the whole set of model parameters. Because LLMs have so many parameters, this kind of fine-tuning requires large amounts of data and compute to update them, which makes it impractical.
solution: prompt-tuning (p-tuning) optimizes a generative pre-trained model (e.g. GPT) through prompt tokens. It improves performance on a specific task by tuning the prompts instead of the full set of model parameters, saving compute and resources while maintaining or even improving model performance.
In chronological order, prompt tuning evolved as follows: prefix-tuning, p-tuning v1, parameter-efficient prompt tuning, and p-tuning v2.
2. Prompt Tuning Models
2.1 2021 Prefix-tuning
Prefix-tuning (paper: Prefix-Tuning: Optimizing Continuous Prompts for Generation) prepends a few task-specific tokens to the input tokens and trains their embeddings separately.
Note: the prefix is not spliced into the text itself. The original input tokens still get their embeddings from the Transformer, whose parameters stay frozen. The prefix embeddings h_i are drawn from a trainable matrix (reparameterized through an MLP during training), while the remaining tokens' embeddings are computed by the Transformer as usual.
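To make this concrete, here is a minimal PyTorch sketch of the idea (illustrative only; the class name PrefixEncoder, the dimensions, and the frozen embedding table are assumptions, not the paper's code): a small trainable matrix, reparameterized through an MLP, produces the prefix embeddings h_i, which are prepended to the frozen embeddings of the real input tokens.

import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):  # hypothetical name, for illustration only
    """Generates trainable prefix embeddings via a small MLP reparameterization."""
    def __init__(self, prefix_len=10, hidden_dim=768, bottleneck=512):
        super().__init__()
        self.prefix_ids = nn.Parameter(torch.randn(prefix_len, bottleneck))  # trainable "virtual token" matrix
        self.mlp = nn.Sequential(nn.Linear(bottleneck, bottleneck), nn.Tanh(),
                                 nn.Linear(bottleneck, hidden_dim))          # MLP reparameterization

    def forward(self, batch_size):
        prefix = self.mlp(self.prefix_ids)                     # (prefix_len, hidden_dim)
        return prefix.unsqueeze(0).expand(batch_size, -1, -1)  # (batch, prefix_len, hidden_dim)

# Frozen pre-trained pieces (stand-ins): only the prefix encoder is updated during training.
embed = nn.Embedding(30522, 768)   # frozen token embedding table
for p in embed.parameters():
    p.requires_grad = False

prefix_encoder = PrefixEncoder()
input_ids = torch.randint(0, 30522, (2, 16))              # a dummy batch of token ids
h_prefix = prefix_encoder(batch_size=input_ids.size(0))   # trainable prefix embeddings h_i
h_tokens = embed(input_ids)                               # frozen embeddings of the original tokens
hidden = torch.cat([h_prefix, h_tokens], dim=1)           # the prefix is prepended, not merged into the text
print(hidden.shape)  # torch.Size([2, 26, 768])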
- Pros: simple to implement, efficient to train, consistent with the task.
- Cons: limited applicability; prefix-tuning is weaker than p-tuning on some tasks. E.g. context limitations: because the prefix embeddings always sit at the front of the sequence, they may not fully exploit the contextual information of the input.
2.2 2021 P-tuning v1
P-tuning v1 (paper: GPT Understands, Too) improves model performance by inserting trainable prompt token embeddings at fixed positions of a prompt template in the input layer.
problem: previous prompting methods live in a discrete space: a word v_i is picked from the vocabulary V as the prompt token for the i-th slot of the template, and a prompt generator produces the prompt embeddings. Such fixed prompts are called hard prompts, and improving beyond them requires fine-tuning all of the pre-trained model's parameters.
solution: p-tuning v1 works in a continuous space: a prompt encoder generates trainable, parameterized prompt embeddings that replace the vocabulary words v_i in the input layer. These generated trainable prompts are called soft prompts (see the sketch after the list below).
- Initialize prompts: <T1> <T2> The movie was fantastic <T3> <T4>. -> train / optimize -> inference (no backpropagation at this stage).
- Pros: few parameters, better performance, broadly applicable.
- Cons: training is more complex; performance depends on the prompt positions.
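Here is the promised sketch (illustrative assumptions: an LSTM+MLP prompt encoder and a frozen embedding table standing in for the real model; this is not the paper's code). The prompt encoder turns a few trainable pseudo-token vectors into continuous soft prompts that fill the template slots <T1> ... <T4>.

import torch
import torch.nn as nn

class PromptEncoder(nn.Module):  # hypothetical; p-tuning uses an LSTM/MLP encoder over pseudo-token embeddings
    def __init__(self, num_prompts=4, hidden_dim=768):
        super().__init__()
        self.pseudo_tokens = nn.Parameter(torch.randn(num_prompts, hidden_dim))
        self.lstm = nn.LSTM(hidden_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim))

    def forward(self):
        out, _ = self.lstm(self.pseudo_tokens.unsqueeze(0))  # (1, num_prompts, hidden_dim)
        return self.mlp(out).squeeze(0)                      # continuous soft prompt embeddings

embed = nn.Embedding(30522, 768)          # frozen pre-trained embedding table (stand-in)
embed.weight.requires_grad = False

prompt_encoder = PromptEncoder()
soft_prompts = prompt_encoder()           # embeddings for <T1> <T2> <T3> <T4>
text_ids = torch.randint(0, 30522, (5,))  # ids for "The movie was fantastic ."
text_emb = embed(text_ids)

# Fill the template <T1> <T2> [text] <T3> <T4> with soft prompts instead of vocabulary words.
inputs_embeds = torch.cat([soft_prompts[:2], text_emb, soft_prompts[2:]], dim=0)
print(inputs_embeds.shape)  # torch.Size([9, 768])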
2.3 2021 Parameter-efficient prompt tuning (PET)
Parameter-efficient prompt tuning (paper: The Power of Scale for Parameter-Efficient Prompt Tuning) can insert trainable prompt embeddings at arbitrary positions of the input sequence.
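The peft library ships an implementation of this soft-prompt approach (it prepends the virtual tokens to the input). A hedged sketch, assuming peft's PromptTuningConfig API and using the small gpt2 checkpoint purely for illustration:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,  # causal LM task
    num_virtual_tokens=8,          # 8 trainable prompt embeddings added to every input
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()  # only the virtual-token embeddings are trainable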
2.4 2022 P-tuning v2
P-tuning v2 (paper: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks) uses multi-layer prompts: prefix prompt embeddings are added at every layer.
problem: for models with fewer than 10B parameters, prompt tuning performs worse than fine-tuning.
solution: p-tuning v2 adds prefix prompt embeddings at every layer; different tasks can share the same network parameters, which supports multi-task learning (a sketch follows the list below).
- Pros: captures and exploits contextual information better, further improving performance; better generalization; more flexible.
- Cons: more complex to implement; higher computational cost.
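A minimal sketch of the per-layer prefix idea (illustrative; the class DeepPrefix and its shapes are assumptions, not the official implementation): every Transformer layer gets its own trainable prefix key/value vectors, which would be concatenated with that layer's keys and values inside attention.

import torch
import torch.nn as nn

class DeepPrefix(nn.Module):  # hypothetical container for per-layer prefixes
    """One trainable prefix (key/value pair) per Transformer layer, as in deep prefix tuning / p-tuning v2."""
    def __init__(self, num_layers=12, prefix_len=16, num_heads=12, head_dim=64):
        super().__init__()
        # shape: (num_layers, 2 for key/value, prefix_len, num_heads, head_dim)
        self.prefix = nn.Parameter(torch.randn(num_layers, 2, prefix_len, num_heads, head_dim) * 0.02)

    def forward(self, layer_idx, batch_size):
        k, v = self.prefix[layer_idx]                      # each: (prefix_len, num_heads, head_dim)
        expand = lambda t: t.unsqueeze(0).expand(batch_size, -1, -1, -1)
        return expand(k), expand(v)                        # to be prepended to this layer's K and V

deep_prefix = DeepPrefix()
prefix_k, prefix_v = deep_prefix(layer_idx=0, batch_size=2)
print(prefix_k.shape)  # torch.Size([2, 16, 12, 64]); inside attention these are concatenated with K/V along the sequence axis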
2.5 2019 Adapter
paper: Parameter-Efficient Transfer Learning for NLP.
2.6 2021 LoRA (Low-Rank Adaptation)
paper: Low-Rank Adaptation of Large Language Models.
LoRA keeps the pre-trained model parameters frozen and only adds new parameters alongside the original weight matrix, and these added parameters are far fewer than those of the original matrix.
problem: if we construct a new matrix with the same n×m dimensions as W_orig to fine-tune the model, performance does not improve and the parameter count doubles!
solution: this is where the low-rank idea comes in. Pick a small rank r with r << n and r << m, and build the update from a low-rank matrix product: ΔW = B·A, with B of shape n×r and A of shape r×m. The product B·A has the same dimensions as W_orig but is made of far fewer parameters. Because we want the increment to be zero at the start of training, so that fine-tuning starts off behaving exactly like the original model, B is usually initialized as a zero matrix while A is initialized with random values (e.g. from a normal distribution).
For example, with input dim = 1024, the original W has 1024 × 1024 ≈ 1M parameters, while the low-rank pair with r = 4 has only 1024 × 4 + 4 × 1024 ≈ 8K.
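The same arithmetic in a few lines of PyTorch (a minimal sketch mirroring the initialization described above):

import torch
import torch.nn as nn

n, m, r = 1024, 1024, 4
W_orig = torch.randn(n, m)            # frozen pre-trained weight: 1024 * 1024 ≈ 1M parameters
B = nn.Parameter(torch.zeros(n, r))   # initialized to zero so the update starts at 0
A = nn.Parameter(torch.randn(r, m))   # initialized from a normal distribution

delta_W = B @ A                       # same shape as W_orig, built from 1024*4 + 4*1024 ≈ 8K parameters
W_adapted = W_orig + delta_W          # at initialization this equals W_orig exactly

print(W_orig.numel())                  # 1048576
print(B.numel() + A.numel())           # 8192
print(torch.equal(W_adapted, W_orig))  # True, because B is zero at the start of training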
Pros:
- Efficient: uses far fewer trainable parameters.
- Better generalization <-- restricting model capacity helps prevent overfitting.
- Can be integrated seamlessly into existing neural networks.
2.7 2024 DoRA (Weight-Decomposed Low-Rank Adaptation)
Core idea: DoRA decomposes each pre-trained weight matrix W into a magnitude component m and a direction component, and applies the low-rank update B·A only to the direction. It can be written as: W' = m · (W_orig + B·A) / ||W_orig + B·A||_c, where ||·||_c is the column-wise norm; only m, B, and A are trained.
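A rough PyTorch sketch of this decomposition (a simplification for intuition, not the DoRA reference implementation; shapes and initialization are assumptions):

import torch
import torch.nn as nn

n, m, r = 64, 64, 4
W0 = torch.randn(n, m)                                        # frozen pre-trained weight
B = nn.Parameter(torch.zeros(n, r))                           # LoRA update applied to the direction component
A = nn.Parameter(torch.randn(r, m))
magnitude = nn.Parameter(W0.norm(p=2, dim=0, keepdim=True))   # trainable per-column magnitude, initialized from W0

V = W0 + B @ A                                   # updated direction (un-normalized)
V_norm = V / V.norm(p=2, dim=0, keepdim=True)    # column-wise normalization
W_adapted = magnitude * V_norm                   # DoRA: magnitude * normalized direction
print(W_adapted.shape)                           # torch.Size([64, 64]); equals W0 at init since B is zero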
3. LoRA Implementation
LoRA implementation formula: W' = W_orig + ΔW = W_orig + B·A, where B has shape n×r, A has shape r×m, and r << min(n, m).
my github link: GitHub - yuyongsheng1990/LLM_Prompts
3.1 LoRA Reproduction 01: MiniLoRA
Simple, accessible, easy to understand, and powerful.
reference: minLoRA/demo.ipynb at main · cccntu/minLoRA · GitHub
3.1.1 core codes: torch.nn.utils.parametrize.register_parametrization, the parametrization registration function
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize
from functools import partial  # partial fixes some arguments of a function, producing a new function that remembers those fixed arguments and uses them when called.
'''
simple example: torch.nn.utils.parametrize.register_parametrization
output: the original parameter (weight or bias) is replaced by a parameter generated through the registered parametrization module, e.g.
Linear(
  (weight): ParametrizationList((0): MyParametrization())
  (bias): Parameter containing: [torch.FloatTensor of size 5]
)
'''
# -----------------single lora parameters---------------
linear = nn.Linear(5, 5)
print(linear)

class LowRankParametrization(nn.Module):
    def __init__(self, original_weight, rank=4):
        super().__init__()
        self.rank = rank
        self.U = nn.Parameter(torch.randn(original_weight.size(0), rank))
        self.V = nn.Parameter(torch.randn(rank, original_weight.size(1)))

    def forward(self, x):
        # x is the original weight handed in by the parametrization machinery;
        # add the low-rank update to it (LoRA-style) instead of discarding it.
        return x + self.U @ self.V

# Register the low-rank parametrization
'''
torch.nn.utils.parametrize.register_parametrization registers a new parametrization on a model parameter.
It lets you apply a transformation (here LowRankParametrization) to an existing parameter such as layer.weight,
which is exactly the hook LoRA needs.
'''
parametrize.register_parametrization(linear, 'weight', LowRankParametrization(linear.weight))
# ----------------multiple lora parameters-------------------
# Several parametrizations can be applied sequentially; just keep registering them <-- this is the stacking pattern DoRA builds on
# Define a second parametrization (despite its name, the original demo gives it the same low-rank form as the first)
class MultiplyByTwoParametrization(nn.Module):
    def __init__(self, original_weight, rank=4):
        super().__init__()
        self.rank = rank
        self.U = nn.Parameter(torch.randn(original_weight.size(0), rank))
        self.V = nn.Parameter(torch.randn(rank, original_weight.size(1)))

    def forward(self, x):
        return x + self.U @ self.V

parametrize.register_parametrization(linear, 'weight', MultiplyByTwoParametrization(linear.weight, rank=3))

# Print the linear layer to inspect the parametrized result
print(linear)
'''
output:
Linear(in_features=5, out_features=5, bias=True)   # the original linear layer
-------------------------------------------------
ParametrizedLinear(                                 # the parametrized linear layer that replaces it
  in_features=5, out_features=5, bias=True          # the layer's original configuration
  (parametrizations): ModuleDict(                   # registered parametrizations live in a ModuleDict, a dict-like module container
    (weight): ParametrizationList(                  # the original weight is now transformed by one or more parametrizations
      (0): LowRankParametrization()                 # (0) is the first registered parametrization
      (1): MultiplyByTwoParametrization()           # when several are registered, they are applied to weight in order
    )
  )
)
'''
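As a quick follow-up (a small sketch, not part of the original notebook): reading linear.weight now recomputes the weight through the registered parametrizations, the untouched original is kept under parametrizations.weight.original, and remove_parametrizations can merge everything back into a plain parameter.

print(linear.weight.shape)                            # still torch.Size([5, 5]), but the value is computed through the parametrizations
print(linear.parametrizations.weight.original.shape)  # the frozen original weight is stored separately

# Merge ("bake in") the parametrized weight and drop the parametrization machinery.
parametrize.remove_parametrizations(linear, 'weight', leave_parametrized=True)
print(linear)                                         # back to a plain Linear with the merged weight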
3.2 LoRA Reproduction 02: LoRA from Scratch on MNIST
reference: lora_from_scratch/lora_on_mnist.ipynb at main · sunildkumar/lora_from_scratch · GitHub
3.2.1 core codes: the Lightning deep-learning framework
import lightning as L  # Lightning is a high-level deep-learning framework built on top of PyTorch; it simplifies and speeds up model development and training.
from lightning.pytorch.loggers import CSVLogger  # logs training metrics to a CSV file for later analysis and visualization.
from lightning.pytorch.callbacks import LearningRateFinder  # searches for a good learning rate during training to improve model performance.
from lightning.pytorch.callbacks.early_stopping import EarlyStopping  # stops training early when the validation loss stops improving, to prevent overfitting.
from pytorch_lightning import Callback  # base class for custom callbacks that run at specific points of training, e.g. logging, saving the model, or adjusting the learning rate.
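To show how these pieces fit together, here is a minimal, hypothetical Lightning training setup (the module TinyMNISTClassifier and the dataloader names are made up for illustration; the referenced notebook wires LoRA layers into an MNIST classifier in the same style):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMNISTClassifier(L.LightningModule):  # hypothetical module name; reuses the lightning import above
    def __init__(self, lr=1e-3):
        super().__init__()
        self.lr = lr
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", F.cross_entropy(self.net(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Trainer wiring: CSV logging plus early stopping on the validation loss.
trainer = L.Trainer(
    max_epochs=5,
    logger=CSVLogger("logs"),
    callbacks=[EarlyStopping(monitor="val_loss", patience=3)],
)
# trainer.fit(TinyMNISTClassifier(), train_dataloaders=train_loader, val_dataloaders=val_loader)  # dataloaders assumed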
3.3 LoRA Reproduction 03: Torch tutorial with torchtune
reference: Finetuning Llama2 with LoRA — TorchTune documentation
3.3.1 core codes: introduction to the torchtune package
from torchtune.models.llama2 import llama2_7b, lora_llama2_7b  # torchtune is a PyTorch library for easily authoring, fine-tuning, and experimenting with LLMs.
'''
torchtune, https://pytorch.org/torchtune/stable/index.html
- Llama3 in torchtune
- Finetuning with LoRA in torchtune
- Understanding QLoRA in TorchTune
- End-to-End Workflow with torchtune
'''
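A short sketch of how the tutorial uses these constructors (a sketch only; the keyword arguments follow the torchtune LoRA tutorial, but check the docs for the full signatures). lora_llama2_7b builds Llama2-7B with LoRA injected into the chosen attention projections, and the relative size of the LoRA update can be checked by parameter name:

from torchtune.models.llama2 import lora_llama2_7b

# LoRA-wrapped Llama2-7B with LoRA on the Q/V attention projections.
lora_model = lora_llama2_7b(
    lora_attn_modules=["q_proj", "v_proj"],  # which attention projections receive LoRA
    lora_rank=8,                             # rank r of the low-rank update
    lora_alpha=16,                           # scaling factor
)

# Rough check of how small the LoRA update is: count parameters whose name contains "lora".
total = sum(p.numel() for p in lora_model.parameters())
lora_only = sum(p.numel() for name, p in lora_model.named_parameters() if "lora" in name)
print(f"total params: {total:,}  LoRA params: {lora_only:,}  ({100 * lora_only / total:.2f}%)")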
3.4 LoRA Reproduction 04: peft implementation
reference: LoRA-Implementation/prepare_data.py at main · hahuyhoang411/LoRA-Implementation · GitHub
3.4.1 core codes: introduction to AutoModelForSeq2SeqLM
'''
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Specify the model name or path
model_name = "t5-small"

# Load the pre-trained model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Input text
input_text = "Translate English to French: How are you?"

# Encode the text into the model's expected input format
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the output
outputs = model.generate(**inputs)

# Decode the output text
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Input: {input_text}")
print(f"Output: {output_text}")
'''
3.4.2 core codes: introduction to the peft package
'''
peft (Parameter-Efficient Fine-Tuning) package introduction:
Fine-tuning large pretrained models is often prohibitively costly due to their scale. PEFT methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models. PEFT is integrated with Transformers for easy model training and inference.

peft simplifies model configuration and loading for LLM fine-tuning, especially with techniques such as LoRA:
- LoraConfig: configures the LoRA parameters.
- TaskType: defines the task type, e.g. task_type = TaskType.SEQ_2_SEQ_LM
- get_peft_config: returns a peft configuration.
- get_peft_model: wraps a pre-trained model into a peft model.
'''
'''
----------------peft translation model---------------------
# Translation model bigscience/mt0-large: English -> French
'''
# Prepare a model for training with a PEFT method such as LoRA by wrapping the base model and PEFT configuration with get_peft_model.
# For the bigscience/mt0-large model, you are only training 0.19% of the parameters!
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer  # load a pre-trained seq2seq model and its tokenizer
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType

# Load the pre-trained model and tokenizer
model_name = 'bigscience/mt0-large'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)

# Get the peft model
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # prints the peft model's trainable parameter count

# Prepare the input data
input_text = "Translate English to French: How are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate output with the PEFT model
outputs = peft_model.generate(**inputs)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)  # decode
print(outputs)
print(output_text)
'''
------------peft causal language model----------------------
# Causal LM examples: ybelkada/opt-350m-lora; gpt2
'''
from peft import AutoPeftModelForCausalLM  # loads a causal LM together with its PEFT (e.g. LoRA) adapter weights
from transformers import AutoTokenizer
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoPeftModelForCausalLM.from_pretrained('ybelkada/opt-350m-lora').to(device)
tokenizer = AutoTokenizer.from_pretrained('facebook/opt-350m')
model.eval()

inputs = tokenizer('Preheat the oven to 350 degrees and place the cookie dough', return_tensors='pt')
outputs = model.generate(input_ids=inputs['input_ids'].to(device), max_new_tokens=50)  # generate the continuation
outputs_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]  # decode the generated token ids back to text
print(outputs)
print(outputs_text)
3.5 *LoRA 05: Explanation
***Optional: too hard and too complex, so it is not reproduced here.
reference: 使用Pytorch從零開始構建LoRA_torch lora 使用 nn - CSDN博客
3.6 *LoRA 06: Huanhuan Chat
***Optional: too hard and too complex, so it is not reproduced here.
reference: https://github.com/datawhalechina/self-llm/blob/master/GLM-4/05-GLM-4-9B-chat%20Lora%20%E5%BE%AE%E8%B0%83.ipynb