Picking up where we left off: in the previous post we went through the source code and traced the full inference pipeline of the model, from the raw prompt, through chat-template wrapping, token-ID conversion, and embedding lookup; then through the model's forward pass and extraction of the last token's output tensor; and finally through post-processing, including the softmax computations, top-p sampling, and decoding back into text, with an analysis of the model architecture woven in along the way. With all of that, we now have a fairly clear picture of how the LLM works, so let's move on to the next topic: exploring the inference flow of an LLM with a LoRA adapter attached.
LoRA itself is probably familiar to most readers; many of us can even recite its basic principle and the idea behind it. For example: it lowers training cost because it reduces the number of trainable parameters, and those parameters can be reduced because weight matrices contain redundant directions, so a low-rank matrix product can stand in for the full update; or: it can be attached to selected pretrained weights without changing the original model's weights. But how exactly is it attached? Which weight layers can carry an adapter? How are the original weights matched one-to-one with their LoRA counterparts? Before writing this post I honestly had no idea; it was all theory on paper, and it fell apart the moment I had to do it for real.
To understand how LoRA is actually implemented in code, rather than just its principle and its advantages, we will again use the Minimind project as our base and explore the concrete implementation from the source.
1. LoRA
Since LoRA is the topic of this post, let me first cover the theory and share my own (admittedly shallow) understanding of it; if anything here is inaccurate or wrong, corrections are very welcome.
First, a model's pretrained weights are just matrices of numbers, and a LoRA adapter is a matrix too. Once we talk about matrices we are in linear-algebra territory. Recall the notion of a maximal linearly independent set from an undergraduate linear algebra course: the vectors in a matrix are not necessarily independent of one another, and some of them may be expressible as linear combinations of a few others. If we find a set of vectors in the matrix such that none of them can be expressed in terms of the others, that set is a maximal linearly independent set, and its size is the rank of the matrix. Conversely, that set can represent the whole matrix. This is the intuition at the core of LoRA.
Now back to LoRA itself. A LoRA adapter consists of two low-rank matrices, A and B, whose product models the update to a pretrained weight matrix; the update is added on top of the original weights, which stay frozen. A hyperparameter called rank sets the small shared dimension of A and B, playing the same role as the matrix rank described above. The reasoning is the same as before: a large (k, k) weight matrix very likely carries redundant information, so its update can be approximated by something of much lower rank, say n with n < k. To keep the layer's input and output dimensions unchanged, we use the product of an (n, k) matrix and a (k, n) matrix. The trainable parameter count for that layer then drops from k² to 2·k·n, which is a saving whenever 2·k·n < k², i.e. n < k/2. In practice the weight dimensions are in the hundreds or thousands while the rank hyperparameter is usually set to a small value such as 8 or 16, far below k/2, so the reduction in trainable parameters is substantial; and by the low-rank redundancy argument above, the approximation should not deviate too much from what updating the original weights directly would achieve.
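To make the parameter saving concrete, here is a quick back-of-the-envelope check in Python (a standalone sketch, not project code), using the hidden size of 512 and rank of 16 that we will meet again in the MiniMind code below:

# Rough trainable-parameter comparison for one square linear layer (sketch)
k, r = 512, 16                  # layer dimension and LoRA rank
full_params = k * k             # parameters if we fine-tuned the whole (k, k) matrix
lora_params = 2 * k * r         # parameters of A (r x k) plus B (k x r)
print(full_params, lora_params, lora_params / full_params)
# 262144 16384 0.0625 -> the adapter trains about 6% of the layer's parameter count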
2. Implementation
Having covered the principle behind LoRA and the simple math underneath it, let's now use the Minimind project to explore the concrete implementation through its code. Setting up the environment and downloading the pretrained weights is left to you; I won't walk through it again here.
2.1 Code walkthrough
Let's start with a quick look at the code layout. In eval_model.py, LoRA shows up inside the init_model(args) function, where two functions are called: apply_lora(), which attaches the LoRA layers, and load_lora(), which loads the LoRA weights. Let's look at each in turn.
# init_model() in eval_model.py
def init_model(args):
    tokenizer = AutoTokenizer.from_pretrained('./model/minimind_tokenizer')
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))
        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)
        if args.lora_name != 'None':
            apply_lora(model)
            load_lora(model, f'./{args.out_dir}/lora/{args.lora_name}_{args.dim}.pth')
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)
    print(f'MiniMind模型參數量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model.eval().to(args.device), tokenizer
2.1.1 apply_lora()
# model/model_lora.py
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's forward and its LoRA module
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
The purpose of this function is to attach the LoRA structure to the model object. The code is short: it takes the model object as its argument and then:
1) iterates over every module in the model (including submodules) via named_modules(), obtaining each module's name and object;
2) selects the layers that are instances of nn.Linear and whose weight matrix is square, and attaches a LoRA layer to each of them;
3) constructs the LoRA layer itself, which we cover in detail in 2.1.2;
4) uses setattr() to add a lora attribute to the selected layer;
5) saves the layer's original forward function as original_forward;
6) wires up forward_with_lora() so that the original forward and the new LoRA layer run in parallel on the same input and their outputs are summed; a standalone sketch of this patching trick follows after this list.
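To see what the "explicit binding" comment is about, here is a minimal standalone sketch of the same patching trick on a single toy nn.Linear (toy shapes, not MiniMind code). The default arguments freeze which original forward and which adapter each patched layer uses, so a later loop iteration cannot overwrite them:

import torch
import torch.nn as nn

layer = nn.Linear(8, 8, bias=False)   # stands in for a square attention projection
# A toy adapter: an (8 -> 2) linear followed by a (2 -> 8) linear, with the second one zeroed
adapter = nn.Sequential(nn.Linear(8, 2, bias=False), nn.Linear(2, 8, bias=False))
nn.init.zeros_(adapter[1].weight)

original_forward = layer.forward

def forward_with_lora(x, base=original_forward, extra=adapter):
    # base and extra are bound at definition time via default arguments
    return base(x) + extra(x)

layer.forward = forward_with_lora

x = torch.randn(1, 8)
print(torch.allclose(layer(x), x @ layer.weight.T))  # True: the zeroed adapter changes nothing yet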
2.1.2 class LoRA(nn.Module)
# model/model_lora.py
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # LoRA rank, controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # A is initialized from a Gaussian distribution
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # B is initialized to all zeros
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
This is a standard LoRA layer defined with PyTorch. First, the sublayers it contains:
1) two linear layers, A and B, are defined;
2) the rank argument sets their shared small dimension, the rank hyperparameter from section 1;
3) A and B are initialized differently: A with a Gaussian, B with zeros, so the adapter initially contributes nothing.
Next, the forward pass of the LoRA layer:
1) in forward(), the input tensor first passes through A, the result is fed into B, and the output of B is returned; a quick shape check follows below.
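As a quick sanity check on the shapes (a small sketch assuming the LoRA class above, torch imported, and the MiniMind hidden size of 512 with rank 16):

lora = LoRA(in_features=512, out_features=512, rank=16)
x = torch.randn(2, 6, 512)      # (batch, sequence length, hidden size)
h = lora.A(x)                   # intermediate activation: torch.Size([2, 6, 16])
y = lora(x)                     # adapter output:          torch.Size([2, 6, 512])
print(h.shape, y.shape)
print(torch.count_nonzero(y))   # 0: B is zero-initialized, so the adapter starts as a no-op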
2.1.3 load_lora()
# model/model_lora.py
def load_lora(model, path):
    state_dict = torch.load(path, map_location=model.device)
    for name, module in model.named_modules():
        if hasattr(module, 'lora'):
            lora_state = {k.replace(f'{name}.lora.', ''): v for k, v in state_dict.items() if f'{name}.lora.' in k}
            module.lora.load_state_dict(lora_state)
The purpose of this function is to read a trained LoRA weight file and load it into the corresponding layers of the model. The steps are simple; it takes the model object (with the LoRA structure already attached) and the path to a local LoRA weight file:
1) torch.load() reads the weight file into memory; state_dict is a dictionary mapping parameter names to tensors;
2) the code iterates over the model, getting each layer's name and module object;
3) hasattr() checks whether the submodule has a lora attribute;
4) for submodules that do, the matching tensors are picked out of state_dict by layer name (keys of the form '<layer name>.lora.…');
5) load_state_dict() copies those tensors into the lora layer.
With that we have a model object that both contains the LoRA structure and carries the trained LoRA weights; a sketch of the matching save step follows below.
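The inverse step, saving only the adapter weights after training, just has to produce keys of the form '<module name>.lora.A.weight' / '<module name>.lora.B.weight', which is exactly the pattern load_lora() strips off again. A minimal sketch (mine, not necessarily identical to the project's own save routine) could look like this:

def save_lora_sketch(model, path):
    # Collect only the adapter parameters, keyed as '<module name>.lora.<param name>'
    state_dict = {}
    for name, module in model.named_modules():
        if hasattr(module, 'lora'):
            for k, v in module.lora.state_dict().items():
                state_dict[f'{name}.lora.{k}'] = v
    torch.save(state_dict, path)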
2.2 Hands-on walkthrough
As before, create a Jupyter Notebook in the same directory as eval_model.py and go through the LoRA attachment process ourselves.
2.2.1 Imports
Code:
import argparse
import random
import time
import numpy as np
import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from model.model import MiniMindLM
from model.LMConfig import LMConfig
from model.model_lora import *

warnings.filterwarnings('ignore')
2.2.2 Define hyperparameters
Code:
class ARG():
    def __init__(self):
        self.lora_name = 'None'
        self.out_dir = 'out'
        self.temperature = 0.85
        self.top_p = 0.85
        self.device = 'cpu'
        self.dim = 512
        self.n_layers = 8
        self.max_seq_len = 8192
        self.use_moe = False
        self.history_cnt = 0
        self.stream = True
        self.load = 0
        self.model_mode = 1
2.2.3 Define the model-initialization function
Code:
def init_model(args):
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))
        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)
    print(f'MiniMind模型參數量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model
2.2.4 Initialize hyperparameters and the model
From the listing of the model's submodules we can clearly see the model structure, and also that the freshly loaded model object does not contain any lora layers yet.
Code:
args = ARG()
model = init_model(args)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
MiniMind模型參數量: 25.83M(illion)
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.1.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.1.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.1.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.1.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
.....
2.2.5 Define the LoRA layer
Code:
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # LoRA rank, controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # A is initialized from a Gaussian distribution
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # B is initialized to all zeros
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
2.2.6 Define the LoRA-attachment function
Code:
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's forward and its LoRA module
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
2.2.7 Attach the LoRA layers
After attaching the LoRA layers and iterating over the model's submodules again, we can see that LoRA layers have been added to the attention Query projection (wq) and output projection (wo) in each transformer block, because those are the nn.Linear layers whose weights are square 512×512 matrices.
Code:
apply_lora(model)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wq.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wo.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
......
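Instead of scanning the full listing, we can also cross-check the result with a couple of lines (a small sketch on top of the code above): with dim=512 and n_layers=8, only wq and wo in each attention block have square weights, so we would expect 8 × 2 = 16 adapters and roughly 16 × 2 × 512 × 16 ≈ 0.26M extra parameters on top of the 25.83M base model:

lora_modules = [name for name, m in model.named_modules() if hasattr(m, 'lora')]
lora_params = sum(p.numel()
                  for _, m in model.named_modules() if hasattr(m, 'lora')
                  for p in m.lora.parameters())
print(len(lora_modules))             # expected: 16 (wq and wo in each of the 8 blocks)
print(f'{lora_params / 1e6:.2f}M')   # expected: ~0.26M trainable adapter parameters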
2.2.8 Load LoRA weights
Since I don't have a locally trained LoRA weight file that matches this model, I'll leave this step out for now; up to this point the LoRA layers in the model still hold their freshly initialized weights. For reference, the call would look like the sketch below.
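For completeness, once a trained adapter file exists, loading it is a single call. The path below is only illustrative: it follows the f'./{args.out_dir}/lora/{args.lora_name}_{args.dim}.pth' pattern from eval_model.py, and 'my_lora' is a placeholder for whatever name the adapter was trained under:

# Hypothetical path; replace 'my_lora' with your actual lora_name
load_lora(model, './out/lora/my_lora_512.pth')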
3. Summary
At this point we have a model object with LoRA layers attached. In this post we revisited what LoRA is, the principle behind it, and the math underneath, and then went through the code: defining the LoRA layer, adding a lora attribute to the selected layers via setattr, and redefining those layers' forward pass through explicit binding. Whether on the theory side or the hands-on side, we have deepened our understanding of LoRA. If this post helped you, please give it a like; and if you want to keep following this series and learn more hands-on LLM practice, consider following me. I'll keep learning and keep updating the series. Thanks!