Picking up where we left off: in the previous post we went through the source code and traced the full inference pipeline of the model, from the raw prompt, through chat-template wrapping, token-ID conversion, and embedding lookup; then through the model's forward pass and extraction of the last token's output tensor; and finally through post-processing, including the softmax computations, top-p sampling, and decoding back into text, with an analysis of the model architecture woven in along the way. With all of that, we now have a fairly clear picture of how the LLM works, so let's move on to the next topic: exploring the inference flow of an LLM with a LoRA adapter attached.
LoRA itself is probably familiar to most readers; many of us can even recite its basic principle and the idea behind it. For example: it lowers training cost because it reduces the number of trainable parameters, and those parameters can be reduced because weight matrices contain redundant directions, so a low-rank matrix product can stand in for the full update; or: it can be attached to selected pretrained weights without changing the original model's weights. But how exactly is it attached? Which weight layers can carry an adapter? How are the original weights matched one-to-one with their LoRA counterparts? Before writing this post I honestly had no idea; it was all theory on paper, and it fell apart the moment I had to do it for real.
To understand how LoRA is actually implemented in code, rather than just its principle and its advantages, we will again use the Minimind project as our base and explore the concrete implementation from the source.
1. LoRA
Since LoRA is the topic of this post, let me first cover the theory and share my own (admittedly shallow) understanding of it; if anything here is inaccurate or wrong, corrections are very welcome.
First, a model's pretrained weights are just matrices of numbers, and a LoRA adapter is a matrix too. Once we talk about matrices we are in linear-algebra territory. Recall the notion of a maximal linearly independent set from an undergraduate linear algebra course: the vectors in a matrix are not necessarily independent of one another, and some of them may be expressible as linear combinations of a few others. If we find a set of vectors in the matrix such that none of them can be expressed in terms of the others, that set is a maximal linearly independent set, and its size is the rank of the matrix. Conversely, that set can represent the whole matrix. This is the intuition at the core of LoRA.
Now back to LoRA itself. A LoRA adapter consists of two low-rank matrices, A and B, whose product models the update to a pretrained weight matrix; the update is added on top of the original weights, which stay frozen. A hyperparameter called rank sets the small shared dimension of A and B, playing the same role as the matrix rank described above. The reasoning is the same as before: a large (k, k) weight matrix very likely carries redundant information, so its update can be approximated by something of much lower rank, say n with n < k. To keep the layer's input and output dimensions unchanged, we use the product of an (n, k) matrix and a (k, n) matrix. The trainable parameter count for that layer then drops from k² to 2·k·n, which is a saving whenever 2·k·n < k², i.e. n < k/2. In practice the weight dimensions are in the hundreds or thousands while the rank hyperparameter is usually set to a small value such as 8 or 16, far below k/2, so the reduction in trainable parameters is substantial; and by the low-rank redundancy argument above, the approximation should not deviate too much from what updating the original weights directly would achieve.
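To make the parameter saving concrete, here is a quick back-of-the-envelope check in Python (a standalone sketch, not project code), using the hidden size of 512 and rank of 16 that we will meet again in the MiniMind code below:

# Rough trainable-parameter comparison for one square linear layer (sketch)
k, r = 512, 16                  # layer dimension and LoRA rank
full_params = k * k             # parameters if we fine-tuned the whole (k, k) matrix
lora_params = 2 * k * r         # parameters of A (r x k) plus B (k x r)
print(full_params, lora_params, lora_params / full_params)
# 262144 16384 0.0625 -> the adapter trains about 6% of the layer's parameter count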
2. Implementation
Having covered the principle behind LoRA and the simple math underneath it, let's now use the Minimind project to explore the concrete implementation through its code. Setting up the environment and downloading the pretrained weights is left to you; I won't walk through it again here.
2.1 Code walkthrough
Let's start with a quick look at the code layout. In eval_model.py, LoRA shows up inside the init_model(args) function, where two functions are called: apply_lora(), which attaches the LoRA layers, and load_lora(), which loads the LoRA weights. Let's look at each in turn.
# init_model() in eval_model.py
def init_model(args):
    tokenizer = AutoTokenizer.from_pretrained('./model/minimind_tokenizer')
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))
        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)
        if args.lora_name != 'None':
            apply_lora(model)
            load_lora(model, f'./{args.out_dir}/lora/{args.lora_name}_{args.dim}.pth')
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)
    print(f'MiniMind模型參數量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model.eval().to(args.device), tokenizer
2.1.1 apply_lora()
# model/model_lora.py
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's forward and its LoRA module
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
The purpose of this function is to attach the LoRA structure to the model object. The code is short: it takes the model object as its argument and then:
1) iterates over every module in the model (including submodules) via named_modules(), obtaining each module's name and object;
2) selects the layers that are instances of nn.Linear and whose weight matrix is square, and attaches a LoRA layer to each of them;
3) constructs the LoRA layer itself, which we cover in detail in 2.1.2;
4) uses setattr() to add a lora attribute to the selected layer;
5) saves the layer's original forward function as original_forward;
6) wires up forward_with_lora() so that the original forward and the new LoRA layer run in parallel on the same input and their outputs are summed; a standalone sketch of this patching trick follows after this list.
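To see what the "explicit binding" comment is about, here is a minimal standalone sketch of the same patching trick on a single toy nn.Linear (toy shapes, not MiniMind code). The default arguments freeze which original forward and which adapter each patched layer uses, so a later loop iteration cannot overwrite them:

import torch
import torch.nn as nn

layer = nn.Linear(8, 8, bias=False)   # stands in for a square attention projection
# A toy adapter: an (8 -> 2) linear followed by a (2 -> 8) linear, with the second one zeroed
adapter = nn.Sequential(nn.Linear(8, 2, bias=False), nn.Linear(2, 8, bias=False))
nn.init.zeros_(adapter[1].weight)

original_forward = layer.forward

def forward_with_lora(x, base=original_forward, extra=adapter):
    # base and extra are bound at definition time via default arguments
    return base(x) + extra(x)

layer.forward = forward_with_lora

x = torch.randn(1, 8)
print(torch.allclose(layer(x), x @ layer.weight.T))  # True: the zeroed adapter changes nothing yet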
2.1.2 class LoRA(nn.Module)
# model/model_lora.py
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # LoRA rank, controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # A is initialized from a Gaussian distribution
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # B is initialized to all zeros
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
This is a standard LoRA layer defined with PyTorch. First, the sublayers it contains:
1) two linear layers, A and B, are defined;
2) the rank argument sets their shared small dimension, the rank hyperparameter from section 1;
3) A and B are initialized differently: A with a Gaussian, B with zeros, so the adapter initially contributes nothing.
Next, the forward pass of the LoRA layer:
1) in forward(), the input tensor first passes through A, the result is fed into B, and the output of B is returned; a quick shape check follows below.
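As a quick sanity check on the shapes (a small sketch assuming the LoRA class above, torch imported, and the MiniMind hidden size of 512 with rank 16):

lora = LoRA(in_features=512, out_features=512, rank=16)
x = torch.randn(2, 6, 512)      # (batch, sequence length, hidden size)
h = lora.A(x)                   # intermediate activation: torch.Size([2, 6, 16])
y = lora(x)                     # adapter output:          torch.Size([2, 6, 512])
print(h.shape, y.shape)
print(torch.count_nonzero(y))   # 0: B is zero-initialized, so the adapter starts as a no-op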
2.1.3 load_lora()
# model/model_lora.py
def load_lora(model, path):
    state_dict = torch.load(path, map_location=model.device)
    for name, module in model.named_modules():
        if hasattr(module, 'lora'):
            lora_state = {k.replace(f'{name}.lora.', ''): v for k, v in state_dict.items() if f'{name}.lora.' in k}
            module.lora.load_state_dict(lora_state)
The purpose of this function is to read a trained LoRA weight file and load it into the corresponding layers of the model. The steps are simple; it takes the model object (with the LoRA structure already attached) and the path to a local LoRA weight file:
1) torch.load() reads the weight file into memory; state_dict is a dictionary mapping parameter names to tensors;
2) the code iterates over the model, getting each layer's name and module object;
3) hasattr() checks whether the submodule has a lora attribute;
4) for submodules that do, the matching tensors are picked out of state_dict by layer name (keys of the form '<layer name>.lora.…');
5) load_state_dict() copies those tensors into the lora layer.
With that we have a model object that both contains the LoRA structure and carries the trained LoRA weights; a sketch of the matching save step follows below.
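The inverse step, saving only the adapter weights after training, just has to produce keys of the form '<module name>.lora.A.weight' / '<module name>.lora.B.weight', which is exactly the pattern load_lora() strips off again. A minimal sketch (mine, not necessarily identical to the project's own save routine) could look like this:

def save_lora_sketch(model, path):
    # Collect only the adapter parameters, keyed as '<module name>.lora.<param name>'
    state_dict = {}
    for name, module in model.named_modules():
        if hasattr(module, 'lora'):
            for k, v in module.lora.state_dict().items():
                state_dict[f'{name}.lora.{k}'] = v
    torch.save(state_dict, path)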
2.2 Hands-on walkthrough
As before, create a Jupyter Notebook in the same directory as eval_model.py and go through the LoRA attachment process ourselves.
2.2.1 Imports
Code:
import argparse
import random
import time
import numpy as np
import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from model.model import MiniMindLM
from model.LMConfig import LMConfig
from model.model_lora import *

warnings.filterwarnings('ignore')
2.2.2 Define hyperparameters
Code:
class ARG():
    def __init__(self):
        self.lora_name = 'None'
        self.out_dir = 'out'
        self.temperature = 0.85
        self.top_p = 0.85
        self.device = 'cpu'
        self.dim = 512
        self.n_layers = 8
        self.max_seq_len = 8192
        self.use_moe = False
        self.history_cnt = 0
        self.stream = True
        self.load = 0
        self.model_mode = 1
2.2.3 Define the model-initialization function
Code:
def init_model(args):
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))
        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)
    print(f'MiniMind模型參數量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model
2.2.4 Initialize hyperparameters and the model
From the listing of the model's submodules we can clearly see the model structure, and also that the freshly loaded model object does not contain any lora layers yet.
Code:
args = ARG()
model = init_model(args)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
MiniMind模型參數量: 25.83M(illion)
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.1.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.1.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.1.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.1.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
.....
2.2.5 Define the LoRA layer
Code:
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # LoRA rank, controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # A is initialized from a Gaussian distribution
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # B is initialized to all zeros
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
2.2.6 Define the LoRA-attachment function
Code:
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's forward and its LoRA module
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
2.2.7 Attach the LoRA layers
After attaching the LoRA layers and iterating over the model's submodules again, we can see that LoRA layers have been added to the attention Query projection (wq) and output projection (wo) in each transformer block, because those are the nn.Linear layers whose weights are square 512×512 matrices.
Code:
apply_lora(model)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wq.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wo.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
......
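Instead of scanning the full listing, we can also cross-check the result with a couple of lines (a small sketch on top of the code above): with dim=512 and n_layers=8, only wq and wo in each attention block have square weights, so we would expect 8 × 2 = 16 adapters and roughly 16 × 2 × 512 × 16 ≈ 0.26M extra parameters on top of the 25.83M base model:

lora_modules = [name for name, m in model.named_modules() if hasattr(m, 'lora')]
lora_params = sum(p.numel()
                  for _, m in model.named_modules() if hasattr(m, 'lora')
                  for p in m.lora.parameters())
print(len(lora_modules))             # expected: 16 (wq and wo in each of the 8 blocks)
print(f'{lora_params / 1e6:.2f}M')   # expected: ~0.26M trainable adapter parameters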
2.2.8 Load LoRA weights
Since I don't have a locally trained LoRA weight file that matches this model, I'll leave this step out for now; up to this point the LoRA layers in the model still hold their freshly initialized weights. For reference, the call would look like the sketch below.
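For completeness, once a trained adapter file exists, loading it is a single call. The path below is only illustrative: it follows the f'./{args.out_dir}/lora/{args.lora_name}_{args.dim}.pth' pattern from eval_model.py, and 'my_lora' is a placeholder for whatever name the adapter was trained under:

# Hypothetical path; replace 'my_lora' with your actual lora_name
load_lora(model, './out/lora/my_lora_512.pth')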
3. Summary
At this point we have a model object with LoRA layers attached. In this post we revisited what LoRA is, the principle behind it, and the math underneath, and then went through the code: defining the LoRA layer, adding a lora attribute to the selected layers via setattr, and redefining those layers' forward pass through explicit binding. Whether on the theory side or the hands-on side, we have deepened our understanding of LoRA. If this post helped you, please give it a like; and if you want to keep following this series and learn more hands-on LLM practice, consider following me. I'll keep learning and keep updating the series. Thanks!