A Detailed Walkthrough of the LoRA Source Code in the PEFT Library

Preface

  • GitHub project: Some-Paper-CN. The project collects papers the author read closely while studying long-horizon time-series forecasting, CV, NLP, and machine learning, together with Chinese translations of them, plus some best-practice example tutorials.
  • If it helps you, please light up a Star — it is a great encouragement to the author. Thank you!
  • The code in this article has been synced to the Some-Paper-CN project.

Preparation

  • Download the latest PEFT source code from GitHub (source link).
  • The key source code lives in the src/peft folder of the project. Create a new lora_demo.py file in the project directory for later debugging. Below is my example code; it uses the Qwen/Qwen2-0.5B-Instruct model, but any model you are familiar with will do.
from pprint import pprint
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoConfig

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
config = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
og_model = AutoModelForCausalLM.from_config(config)
pprint(og_model)
lora_model = get_peft_model(og_model, lora_config)
pprint(lora_model)

# pprint([key for key, _ in og_model.named_modules()])
# key = 'model.layers.0.self_attn.rotary_emb'
# pprint(og_model.get_submodule(".".join(key.split(".")[:-1])))
# pprint(key.split(".")[-1])
# pprint(og_model.get_submodule(key))
  • The commented-out lines are bits that are not easy to follow during debugging; ignore them for now — I will come back to them at the relevant points.

  • As you can see, the key function in the code above is get_peft_model(): after the original model passes through get_peft_model(), LoRA branches are added to the model structure.

  • Structure of og_model:

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)
  • Structure of lora_model:
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 896)
        (layers): ModuleList(
          (0-23): 24 x Qwen2DecoderLayer(
            (self_attn): Qwen2SdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=896, out_features=896, bias=True)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=896, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=896, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=896, out_features=128, bias=True)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=896, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=128, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=896, out_features=128, bias=True)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=896, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=128, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=896, out_features=896, bias=False)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=896, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=896, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): Qwen2RotaryEmbedding()
            )
            (mlp): Qwen2MLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=896, out_features=4864, bias=False)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=896, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=4864, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=896, out_features=4864, bias=False)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=896, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=4864, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=4864, out_features=896, bias=False)
                (lora_dropout): ModuleDict((default): Dropout(p=0.1, inplace=False))
                (lora_A): ModuleDict((default): Linear(in_features=4864, out_features=8, bias=False))
                (lora_B): ModuleDict((default): Linear(in_features=8, out_features=896, bias=False))
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): Qwen2RMSNorm()
            (post_attention_layernorm): Qwen2RMSNorm()
          )
        )
        (norm): Qwen2RMSNorm()
      )
      (lm_head): Linear(in_features=896, out_features=151936, bias=False)
    )
  )
)
  • Comparing lora_model with the og_model architecture, many new layers prefixed with lora appear, such as lora_A, lora_B, lora_embedding_A, and lora_embedding_B. So how does the function add these layers? Hold Ctrl and click get_peft_model() to look at the underlying source.
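  • Before diving into the source, one optional sanity check (my own addition, not part of the original demo): PeftModel exposes a print_trainable_parameters() helper, which confirms that only the injected LoRA weights are trainable while the base weights stay frozen.

# Optional check in lora_demo.py (the exact numbers depend on the model and config):
lora_model.print_trainable_parameters()
# prints something like: trainable params: ... || all params: ... || trainable%: ...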

Source Code Walkthrough

  • Because of code-style conventions, the PEFT source is very heavily wrapped. Be patient when working through it, read slowly, and be sure to keep the class hierarchy straight.

  • Stepping into the get_peft_model function jumps to the peft/mapping.py file. A partially annotated version:

def get_peft_model(
    model: PreTrainedModel,
    peft_config: PeftConfig,
    adapter_name: str = "default",
    mixed: bool = False,
    autocast_adapter_dtype: bool = True,
    revision: Optional[str] = None,
) -> PeftModel | PeftMixedModel:
    # Read the model config
    model_config = getattr(model, "config", {"model_type": "custom"})
    # Convert the model config to a dict
    if hasattr(model_config, "to_dict"):
        model_config = model_config.to_dict()

    # Read the fine-tuning config
    peft_config.base_model_name_or_path = model.__dict__.get("name_or_path", None)

    if revision is not None:
        if peft_config.revision is not None and peft_config.revision != revision:
            warnings.warn(
                f"peft config has already set base model revision to {peft_config.revision}, overwriting with revision {revision}"
            )
        peft_config.revision = revision

    if mixed:
        # note: PeftMixedModel does not support autocast_adapter_dtype, so don't pass it
        return PeftMixedModel(model, peft_config, adapter_name=adapter_name)

    # Decide on the task type so that the matching fine-tuning class is used
    if peft_config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys() and not peft_config.is_prompt_learning:
        return PeftModel(model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype)

    if peft_config.is_prompt_learning:
        peft_config = _prepare_prompt_learning_config(peft_config, model_config)
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](
        model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype
    )
  • It is not hard to see that the job of get_peft_model() is to determine the task type of the current model so that the matching fine-tuning class is used. In the earlier lora_demo.py file, the task_type parameter of lora_config set the task type to CAUSAL_LM (causal language model). Step into MODEL_TYPE_TO_PEFT_MODEL_MAPPING:
MODEL_TYPE_TO_PEFT_MODEL_MAPPING: dict[str, type[PeftModel]] = {
    "SEQ_CLS": PeftModelForSequenceClassification,
    "SEQ_2_SEQ_LM": PeftModelForSeq2SeqLM,
    "CAUSAL_LM": PeftModelForCausalLM,
    "TOKEN_CLS": PeftModelForTokenClassification,
    "QUESTION_ANS": PeftModelForQuestionAnswering,
    "FEATURE_EXTRACTION": PeftModelForFeatureExtraction,
}
  • CAUSAL_LM corresponds to the PeftModelForCausalLM class, so let us step into PeftModelForCausalLM.
  • This jumps to the peft/peft_model.py file, where we find that PeftModelForCausalLM inherits from PeftModel. Stepping into PeftModel, we mainly care about its initializer. A partially annotated version:
class PeftModel(PushToHubMixin, torch.nn.Module):
    def __init__(
        self,
        model: PreTrainedModel,
        peft_config: PeftConfig,
        adapter_name: str = "default",
        autocast_adapter_dtype: bool = True,
    ) -> None:
        super().__init__()
        self.modules_to_save = None
        self.active_adapter = adapter_name
        # Which fine-tuning method is being used
        self.peft_type = peft_config.peft_type
        # These args are special PEFT arguments that users can pass. They need to be removed before passing them to
        # forward.
        self.special_peft_forward_args = {"adapter_names"}

        # Is this a prompt-learning method?
        self._is_prompt_learning = peft_config.is_prompt_learning
        if self._is_prompt_learning:
            self._peft_config = {adapter_name: peft_config}
            self.base_model = model
            self.add_adapter(adapter_name, peft_config)
        else:
            self._peft_config = None
            # Look up the tuner class for this fine-tuning method
            cls = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type]
            # Instantiate the tuner class
            self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
            self.set_additional_trainable_modules(peft_config, adapter_name)

        if hasattr(self.base_model, "_cast_adapter_dtype"):
            self.base_model._cast_adapter_dtype(
                adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype
            )

        if getattr(model, "is_gradient_checkpointing", True):
            model = self._prepare_model_for_gradient_checkpointing(model)

        # the `pretraining_tp` is set for some models to simulate Tensor Parallelism during inference to avoid
        # numerical differences, https://github.com/pytorch/pytorch/issues/76232 - to avoid any unexpected
        # behavior we disable that in this line.
        if hasattr(self.base_model, "config") and hasattr(self.base_model.config, "pretraining_tp"):
            self.base_model.config.pretraining_tp = 1
  • The job of PeftModel is to instantiate the class of the chosen fine-tuning method. Note that LoRA is not a prompt-learning method, so watch the result of the if self._is_prompt_learning check: execution should take the else branch.
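  • A quick way to see which branch will be taken (my own check, not from the original demo) is to inspect the two config attributes that drive the dispatch:

from peft import LoraConfig

cfg = LoraConfig(task_type="CAUSAL_LM")
print(cfg.peft_type)           # PeftType.LORA -> the key used in PEFT_TYPE_TO_MODEL_MAPPING
print(cfg.is_prompt_learning)  # False -> the else branch above is taken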
  • Step into PEFT_TYPE_TO_MODEL_MAPPING:
PEFT_TYPE_TO_MODEL_MAPPING = {
    PeftType.LORA: LoraModel,
    PeftType.LOHA: LoHaModel,
    PeftType.LOKR: LoKrModel,
    PeftType.PROMPT_TUNING: PromptEmbedding,
    PeftType.P_TUNING: PromptEncoder,
    PeftType.PREFIX_TUNING: PrefixEncoder,
    PeftType.ADALORA: AdaLoraModel,
    PeftType.BOFT: BOFTModel,
    PeftType.ADAPTION_PROMPT: AdaptionPromptModel,
    PeftType.IA3: IA3Model,
    PeftType.OFT: OFTModel,
    PeftType.POLY: PolyModel,
    PeftType.LN_TUNING: LNTuningModel,
    PeftType.VERA: VeraModel,
}
  • The LoRA method corresponds to the LoraModel class. Stepping into LoraModel jumps to peft/tuners/lora/model.py, where LoraModel turns out to inherit from BaseTuner, so we continue into BaseTuner, which jumps to peft/tuners/tuners_utils.py.
  • Again, focus on BaseTuner's initializer. A partially annotated version:
class BaseTuner(nn.Module, ABC):
    def __init__(self, model, peft_config: Union[PeftConfig, dict[str, PeftConfig]], adapter_name: str) -> None:
        super().__init__()

        self.model = model
        self.targeted_module_names: list[str] = []

        # For advanced developers, if you want to attach multiple adapters to your
        # model, just add a `peft_config` dict attribute to your model.
        if not hasattr(self, "peft_config"):
            self.peft_config = {adapter_name: peft_config} if isinstance(peft_config, PeftConfig) else peft_config
        else:
            logger.info(
                "Already found a `peft_config` attribute in the model. This will lead to having multiple adapters"
                " in the model. Make sure to know what you are doing!"
            )
            if isinstance(peft_config, PeftConfig):
                self.peft_config[adapter_name] = peft_config
            else:
                # user is adding a dict of PeftConfigs
                self.peft_config.update(peft_config)

        self.active_adapter: str | list[str] = adapter_name
        self._pre_injection_hook(self.model, self.peft_config[adapter_name], adapter_name)
        # Inject the adapter modules
        self.inject_adapter(self.model, adapter_name)

        # Copy the peft_config in the injected model.
        self.model.peft_config = self.peft_config
  • The initializer above contains some bookkeeping and logging code; the key call is inject_adapter(), which injects the adapter modules. Step into inject_adapter().
  • inject_adapter() also lives in peft/tuners/tuners_utils.py. A partially annotated version:
    def inject_adapter(self, model: nn.Module, adapter_name: str, autocast_adapter_dtype: bool = True) -> None:
        # Read the adapter config
        peft_config = self.peft_config[adapter_name]
        self._check_new_adapter_config(peft_config)

        _check_for_modules_to_save = getattr(peft_config, "modules_to_save", None) is not None
        _has_modules_to_save = False

        model_config = getattr(model, "config", {"model_type": "custom"})
        if hasattr(model_config, "to_dict"):
            model_config = model_config.to_dict()

        peft_config = self._prepare_adapter_config(peft_config, model_config)
        self._prepare_model(peft_config, model)

        is_target_modules_in_base_model = False
        # Store the name of every module in the model
        key_list = [key for key, _ in model.named_modules()]

        # update peft_config.target_modules if required
        peft_config = _maybe_include_all_linear_layers(peft_config, model)

        for key in key_list:
            # Check for modules_to_save in case
            if _check_for_modules_to_save and any(
                key.endswith(f"{module_to_save}") for module_to_save in peft_config.modules_to_save
            ):
                # Optionally set the modules to save
                parent, target, target_name = _get_submodules(model, key)
                if not isinstance(target, ModulesToSaveWrapper):
                    new_module = ModulesToSaveWrapper(target, adapter_name)
                    setattr(parent, target_name, new_module)
                else:
                    target.update(adapter_name)
                _has_modules_to_save = True
                continue

            if not self._check_target_module_exists(peft_config, key):
                continue

            self.targeted_module_names.append(key)
            is_target_modules_in_base_model = True
            # Get the parent module, the layer name, and the module that name refers to
            parent, target, target_name = _get_submodules(model, key)
            # Create a new layer to replace the original one
            self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)

        if not is_target_modules_in_base_model:
            raise ValueError(
                f"Target modules {peft_config.target_modules} not found in the base model. "
                f"Please check the target modules and try again."
            )

        self.set_adapter(self.active_adapters)
        self._mark_only_adapters_as_trainable(model)

        if self.peft_config[adapter_name].inference_mode:
            for n, p in model.named_parameters():
                if adapter_name in n:
                    p.requires_grad = False

        if _has_modules_to_save:
            if not hasattr(model, "modules_to_save"):
                model.modules_to_save = set(peft_config.modules_to_save)
            else:
                model.modules_to_save.update(set(peft_config.modules_to_save))
  • The key behavior of inject_adapter() is that key_list stores the name of every module in the model, which you can print from the lora_demo.py file we wrote earlier:
pprint([key for key, _ in og_model.named_modules()])

The output is shown below. Since the later layers repeat the same modules, only the sub-modules of layer 0 are shown:

['',
 'model',
 'model.embed_tokens',
 'model.layers',
 'model.layers.0',
 'model.layers.0.self_attn',
 'model.layers.0.self_attn.q_proj',
 'model.layers.0.self_attn.k_proj',
 'model.layers.0.self_attn.v_proj',
 'model.layers.0.self_attn.o_proj',
 'model.layers.0.self_attn.rotary_emb',
 'model.layers.0.mlp',
 'model.layers.0.mlp.gate_proj',
 'model.layers.0.mlp.up_proj',
 'model.layers.0.mlp.down_proj',
 'model.layers.0.mlp.act_fn',
 'model.layers.0.input_layernorm',
 'model.layers.0.post_attention_layernorm',
 ...
  • It then iterates over key_list and, for each key, retrieves the parent module, the layer name, and the module that the name refers to. The important helper here is _get_submodules(); clicking _get_submodules() jumps to peft/utils/other.py, so let us see how it is implemented.
def _get_submodules(model, key):
    parent = model.get_submodule(".".join(key.split(".")[:-1]))
    target_name = key.split(".")[-1]
    target = model.get_submodule(key)
    return parent, target, target_name
  • You can also try this in the lora_demo.py file we wrote earlier. Pick any key, say model.layers.0.self_attn.rotary_emb:
key = 'model.layers.0.self_attn.rotary_emb'
pprint(og_model.get_submodule(".".join(key.split(".")[:-1])))
pprint(key.split(".")[-1])
pprint(og_model.get_submodule(key))

The output is:

    """key = model.layers.0.self_attn.rotary_embparent = Qwen2SdpaAttention((q_proj): Linear(in_features=896, out_features=896, bias=True)(k_proj): Linear(in_features=896, out_features=128, bias=True)(v_proj): Linear(in_features=896, out_features=128, bias=True)(o_proj): Linear(in_features=896, out_features=896, bias=False)(rotary_emb): Qwen2RotaryEmbedding())target_name = 'rotary_emb'target = Qwen2RotaryEmbedding()"""
  • Back in peft/tuners/tuners_utils.py, continuing through inject_adapter(), the other key function is _create_and_replace(). Clicking _create_and_replace() stays in the same file as inject_adapter(), but there it is only an abstract method, so we have to go back to peft/tuners/lora/model.py and search for _create_and_replace() inside the LoraModel class to see the concrete implementation.
    def _create_and_replace(
        self,
        lora_config,
        adapter_name,
        target,
        target_name,
        parent,
        current_key,
    ):
        if current_key is None:
            raise ValueError("Current Key shouldn't be `None`")

        # Regexp matching - find a key in the provided patterns that matches the current target name
        pattern_keys = list(chain(lora_config.rank_pattern.keys(), lora_config.alpha_pattern.keys()))
        target_name_key = next(filter(lambda key: re.match(rf".*\.{key}$", current_key), pattern_keys), current_key)
        # Get the r and alpha parameters
        r = lora_config.rank_pattern.get(target_name_key, lora_config.r)
        alpha = lora_config.alpha_pattern.get(target_name_key, lora_config.lora_alpha)

        kwargs = {
            "r": r,
            "lora_alpha": alpha,
            "lora_dropout": lora_config.lora_dropout,
            "fan_in_fan_out": lora_config.fan_in_fan_out,
            "init_lora_weights": lora_config.init_lora_weights,
            "use_rslora": lora_config.use_rslora,
            "use_dora": lora_config.use_dora,
            "loaded_in_8bit": getattr(self.model, "is_loaded_in_8bit", False),
            "loaded_in_4bit": getattr(self.model, "is_loaded_in_4bit", False),
        }

        quant_methods = ["gptq", "aqlm", "awq"]
        for quant_method in quant_methods:
            quantization_config = get_quantization_config(self.model, method=quant_method)
            if quantization_config is not None:
                kwargs[f"{quant_method}_quantization_config"] = quantization_config

        # note: AdaLoraLayer is a subclass of LoraLayer, we need to exclude it
        from peft.tuners.adalora import AdaLoraLayer

        # The model passed in here is still the original nn.Module-based model,
        # so target is neither a LoraLayer nor an AdaLoraLayer and the else branch is taken
        if isinstance(target, LoraLayer) and not isinstance(target, AdaLoraLayer):
            target.update_layer(
                adapter_name,
                r,
                lora_alpha=alpha,
                lora_dropout=lora_config.lora_dropout,
                init_lora_weights=lora_config.init_lora_weights,
                use_rslora=lora_config.use_rslora,
                use_dora=lora_config.use_dora,
            )
        else:
            # Create a new module from the key LoRA parameters
            new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
            if adapter_name not in self.active_adapters:
                # adding an additional adapter: it is not automatically trainable
                new_module.requires_grad_(False)
            self._replace_module(parent, target_name, new_module, target)
  • _create_and_replace() first obtains the key LoRA parameters r and alpha, then calls _create_new_module() to build the new module.
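  • As a side note on the rank_pattern / alpha_pattern lookup at the top of _create_and_replace(): these LoraConfig fields let you override r and lora_alpha for specific modules by name. A small sketch (the override values here are made up purely for illustration):

from peft import LoraConfig

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "down_proj"],
    r=8,                              # default rank
    lora_alpha=32,                    # default alpha
    rank_pattern={"down_proj": 16},   # down_proj gets r=16 instead of 8
    alpha_pattern={"down_proj": 64},  # and lora_alpha=64 instead of 32
)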
  • Note that the model passed in at this point is still the original torch.nn.Module-based model, so the target is neither a LoraLayer nor an AdaLoraLayer and execution takes the else branch.
  • Step into _create_new_module(), which sits in the same file as _create_and_replace():
    def _create_new_module(lora_config, adapter_name, target, **kwargs):
        dispatchers = []

        if lora_config._custom_modules:

            def dynamic_dispatch_func(target, adapter_name, lora_config, **kwargs):
                new_module = None

                if isinstance(target, BaseTunerLayer):
                    target_base_layer = target.get_base_layer()
                else:
                    target_base_layer = target

                for key, custom_cls in lora_config._custom_modules.items():
                    if isinstance(target_base_layer, key):
                        new_module = custom_cls(target, adapter_name, **kwargs)
                        break

                return new_module

            dispatchers.append(dynamic_dispatch_func)

        # avoid eager bnb import
        if is_bnb_available():
            from .bnb import dispatch_bnb_8bit

            dispatchers.append(dispatch_bnb_8bit)

        if is_bnb_4bit_available():
            from .bnb import dispatch_bnb_4bit

            dispatchers.append(dispatch_bnb_4bit)

        dispatchers.extend(
            [
                dispatch_eetq,
                dispatch_aqlm,
                dispatch_awq,
                dispatch_gptq,
                dispatch_hqq,
                dispatch_megatron,
                dispatch_default,
            ]
        )

        new_module = None
        for dispatcher in dispatchers:
            new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
            if new_module is not None:  # first match wins
                break

        if new_module is None:
            # no module could be matched
            raise ValueError(
                f"Target module {target} is not supported. Currently, only the following modules are supported: "
                "`torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`."
            )

        return new_module
  • The awq, gptq, and hqq mentioned above are different quantization/deployment formats. For fine-tuning we usually pick the non-quantized model, so in the end the dispatch_default branch is taken. Still, it is worth glancing at the other branches; take dispatch_eetq() as an example and click through to peft/tuners/lora/eetq.py:
def dispatch_eetq(
    target: torch.nn.Module,
    adapter_name: str,
    **kwargs: Any,
) -> Optional[torch.nn.Module]:
    new_module = None

    if isinstance(target, BaseTunerLayer):
        target_base_layer = target.get_base_layer()
    else:
        target_base_layer = target

    if is_eetq_available() and isinstance(target_base_layer, EetqLinear):
        new_module = EetqLoraLinear(target, adapter_name, **kwargs)
        target.weight = target_base_layer.weight

        if hasattr(target, "bias"):
            target.bias = target_base_layer.bias

    return new_module
  • The function first initializes new_module to None; because both is_eetq_available() and isinstance(target_base_layer, EetqLinear) are False here, new_module is still None at the end.
  • Back in _create_new_module() in peft/tuners/lora/model.py, click dispatch_default to jump to the dispatch_default() function in peft/tuners/lora/layer.py. A partially annotated version:
def dispatch_default(
    target: torch.nn.Module,
    adapter_name: str,
    lora_config: LoraConfig,
    **kwargs,
) -> Optional[torch.nn.Module]:
    new_module = None

    if isinstance(target, BaseTunerLayer):
        target_base_layer = target.get_base_layer()
    else:
        target_base_layer = target

    # Replace an Embedding layer
    if isinstance(target_base_layer, torch.nn.Embedding):
        embedding_kwargs = kwargs.copy()
        embedding_kwargs.pop("fan_in_fan_out", None)
        embedding_kwargs.update(lora_config.loftq_config)
        new_module = Embedding(target, adapter_name, **embedding_kwargs)
    # Replace a Conv2d layer
    elif isinstance(target_base_layer, torch.nn.Conv2d):
        kwargs.update(lora_config.loftq_config)
        new_module = Conv2d(target, adapter_name, **kwargs)
    # Replace a Linear layer
    elif isinstance(target_base_layer, torch.nn.Linear):
        if kwargs["fan_in_fan_out"]:
            warnings.warn(
                "fan_in_fan_out is set to True but the target module is `torch.nn.Linear`. "
                "Setting fan_in_fan_out to False."
            )
            kwargs["fan_in_fan_out"] = lora_config.fan_in_fan_out = False
        kwargs.update(lora_config.loftq_config)
        new_module = Linear(target, adapter_name, **kwargs)
    # Replace a Conv1D layer
    elif isinstance(target_base_layer, Conv1D):
        if not kwargs["fan_in_fan_out"]:
            warnings.warn(
                "fan_in_fan_out is set to False but the target module is `Conv1D`. "
                "Setting fan_in_fan_out to True."
            )
            kwargs["fan_in_fan_out"] = lora_config.fan_in_fan_out = True
        kwargs.update(lora_config.loftq_config)
        new_module = Linear(target, adapter_name, is_target_conv_1d_layer=True, **kwargs)

    return new_module
  • As you can see, LoRA mainly targets Embedding, Conv1D, Conv2d, and Linear layers. We will walk through the Linear case.
  • Under the isinstance(target_base_layer, torch.nn.Linear) branch, go to new_module = Linear(target, adapter_name, **kwargs) and click Linear; the Linear class is in the same file as dispatch_default():
class Linear(nn.Module, LoraLayer):
    def __init__(
        self,
        base_layer,
        adapter_name: str,
        r: int = 0,
        lora_alpha: int = 1,
        lora_dropout: float = 0.0,
        fan_in_fan_out: bool = False,
        is_target_conv_1d_layer: bool = False,
        init_lora_weights: Union[bool, str] = True,
        use_rslora: bool = False,
        use_dora: bool = False,
        **kwargs,
    ) -> None:
        super().__init__()
        LoraLayer.__init__(self, base_layer, **kwargs)
        self.fan_in_fan_out = fan_in_fan_out

        self._active_adapter = adapter_name
        self.update_layer(
            adapter_name,
            r,
            lora_alpha=lora_alpha,
            lora_dropout=lora_dropout,
            init_lora_weights=init_lora_weights,
            use_rslora=use_rslora,
            use_dora=use_dora,
        )
        self.is_target_conv_1d_layer = is_target_conv_1d_layer
  • Besides nn.Module, the Linear class also inherits from LoraLayer. Let us look at LoraLayer first: click into the LoraLayer class (same file as Linear) and focus on its initializer:
class LoraLayer(BaseTunerLayer):
    adapter_layer_names = ("lora_A", "lora_B", "lora_embedding_A", "lora_embedding_B")
    other_param_names = ("r", "lora_alpha", "scaling", "lora_dropout")

    def __init__(self, base_layer: nn.Module, **kwargs) -> None:
        self.base_layer = base_layer
        self.r = {}
        self.lora_alpha = {}
        self.scaling = {}
        self.lora_dropout = nn.ModuleDict({})
        self.lora_A = nn.ModuleDict({})
        self.lora_B = nn.ModuleDict({})
        self.lora_embedding_A = nn.ParameterDict({})
        self.lora_embedding_B = nn.ParameterDict({})
        self._disable_adapters = False
        self.merged_adapters = []
        self.use_dora: dict[str, bool] = {}
        self.lora_magnitude_vector = torch.nn.ModuleDict()  # for DoRA
        self._caches: dict[str, Any] = {}
        self.kwargs = kwargs

        base_layer = self.get_base_layer()
        # Read the input/output dimensions of the base layer
        if isinstance(base_layer, nn.Linear):
            in_features, out_features = base_layer.in_features, base_layer.out_features
        elif isinstance(base_layer, nn.Conv2d):
            in_features, out_features = base_layer.in_channels, base_layer.out_channels
        elif isinstance(base_layer, nn.Embedding):
            in_features, out_features = base_layer.num_embeddings, base_layer.embedding_dim
        elif isinstance(base_layer, Conv1D):
            in_features, out_features = (
                base_layer.weight.ds_shape if hasattr(base_layer.weight, "ds_shape") else base_layer.weight.shape
            )
        elif hasattr(base_layer, "infeatures") and hasattr(base_layer, "outfeatures"):
            in_features, out_features = base_layer.infeatures, base_layer.outfeatures
        elif hasattr(base_layer, "input_size") and hasattr(base_layer, "output_size"):
            in_features, out_features = base_layer.input_size, base_layer.output_size
        elif hasattr(base_layer, "codebooks") and base_layer.__class__.__name__ == "QuantizedLinear":
            in_features, out_features = base_layer.in_features, base_layer.out_features
        elif hasattr(base_layer, "w_bit") and base_layer.__class__.__name__ == "WQLinear_GEMM":
            in_features, out_features = base_layer.in_features, base_layer.out_features
        elif base_layer.__class__.__name__ == "EetqLinear":
            in_features, out_features = base_layer.in_features, base_layer.out_features
        elif hasattr(base_layer, "W_q") and base_layer.__class__.__name__ == "HQQLinear":
            in_features, out_features = base_layer.in_features, base_layer.out_features
        else:
            if hasattr(base_layer, "in_features") and hasattr(base_layer, "out_features"):
                in_features, out_features = base_layer.in_features, base_layer.out_features
            else:
                in_features, out_features = None, None
            warnings.warn(
                f"Unsupported layer type '{type(base_layer)}' encountered, proceed at your own risk.", UserWarning
            )

        self.in_features = in_features
        self.out_features = out_features
  • The key behavior of LoraLayer's initializer is to read the input and output dimensions of the adaptable layer (Embedding, Conv1D, Conv2d, or Linear), which are needed to build the new LoRA layers with matching shapes.
  • Back in the Linear class, the other key call in its initializer is update_layer(); click into update_layer().
  • At this point I can finally bring in the classic diagram of the LoRA method.
    [Figure: the classic LoRA schematic — the frozen pretrained weight W with a parallel trainable low-rank branch A→B]
    def update_layer(
        self, adapter_name, r, lora_alpha, lora_dropout, init_lora_weights, use_rslora, use_dora: bool = False
    ):
        # This code works for linear layers, override for other layer types
        if r <= 0:
            raise ValueError(f"`r` should be a positive integer value but the value passed is {r}")

        # Store the r and alpha parameters
        self.r[adapter_name] = r
        self.lora_alpha[adapter_name] = lora_alpha
        # Add a Dropout layer if a dropout probability was given
        if lora_dropout > 0.0:
            lora_dropout_layer = nn.Dropout(p=lora_dropout)
        else:
            lora_dropout_layer = nn.Identity()

        # Register lora_dropout_layer under lora_dropout
        self.lora_dropout.update(nn.ModuleDict({adapter_name: lora_dropout_layer}))
        # The actual trainable parameters: matrices A and B
        self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False)
        self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False)
        if use_rslora:
            self.scaling[adapter_name] = lora_alpha / math.sqrt(r)
        else:
            self.scaling[adapter_name] = lora_alpha / r

        # for inits that require access to the base weight, use gather_param_ctx so that the weight is gathered when using DeepSpeed
        if isinstance(init_lora_weights, str) and init_lora_weights.startswith("pissa"):
            with gather_params_ctx(self.get_base_layer().weight):
                self.pissa_init(adapter_name, init_lora_weights)
        elif isinstance(init_lora_weights, str) and init_lora_weights.lower() == "olora":
            with gather_params_ctx(self.get_base_layer().weight):
                self.olora_init(adapter_name)
        elif init_lora_weights == "loftq":
            with gather_params_ctx(self.get_base_layer().weight):
                self.loftq_init(adapter_name)
        elif init_lora_weights:
            self.reset_lora_parameters(adapter_name, init_lora_weights)
        # call this before dora_init
        self._move_adapter_to_device_of_base_layer(adapter_name)

        if use_dora:
            self.dora_init(adapter_name)
            self.use_dora[adapter_name] = True
        else:
            self.use_dora[adapter_name] = False

        # Mark the adapter parameters as trainable
        self.set_adapter(self.active_adapters)
  • The key behaviors of update_layer() are:
    • 1. Read the key LoRA parameters r and alpha.
    • 2. Add a Dropout layer if a dropout probability is given.
    • 3. Create the lora_A and lora_B linear layers and initialize them. lora_A can be initialized in several ways, while lora_B is always initialized to zero, so the LoRA branch outputs 0 before training starts and does not change the original model's outputs (a small numeric sketch follows this list).
    • 4. Mark the LoRA branch as trainable and freeze the other layers.
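  • To make the scaling and the zero initialization concrete, here is a minimal stand-alone sketch of what the injected branch computes (plain PyTorch, not PEFT code; the dimensions follow the q_proj example above):

import math
import torch
import torch.nn as nn

in_features, out_features, r, lora_alpha = 896, 896, 8, 32
scaling = lora_alpha / r                                 # rsLoRA would use lora_alpha / sqrt(r)

base = nn.Linear(in_features, out_features)
lora_A = nn.Linear(in_features, r, bias=False)
lora_B = nn.Linear(r, out_features, bias=False)
nn.init.kaiming_uniform_(lora_A.weight, a=math.sqrt(5))  # default initialization for A
nn.init.zeros_(lora_B.weight)                            # B starts at zero

x = torch.randn(4, in_features)
y = base(x) + lora_B(lora_A(x)) * scaling                # what lora.Linear.forward adds
assert torch.allclose(y, base(x))                        # B == 0, so the output is unchanged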
  • The initialization of the lora_A layer can be found in the reset_lora_parameters() method. A partially annotated version:
    def reset_lora_parameters(self, adapter_name, init_lora_weights):
        if init_lora_weights is False:
            return

        if adapter_name in self.lora_A.keys():
            # If init_lora_weights is True, use Kaiming initialization
            if init_lora_weights is True:
                nn.init.kaiming_uniform_(self.lora_A[adapter_name].weight, a=math.sqrt(5))
            # If "gaussian", use normal initialization
            elif init_lora_weights.lower() == "gaussian":
                nn.init.normal_(self.lora_A[adapter_name].weight, std=1 / self.r[adapter_name])
            else:
                raise ValueError(f"Unknown initialization {init_lora_weights=}")
            # The B matrix is initialized to all zeros
            nn.init.zeros_(self.lora_B[adapter_name].weight)
        if adapter_name in self.lora_embedding_A.keys():
            nn.init.zeros_(self.lora_embedding_A[adapter_name])
            nn.init.normal_(self.lora_embedding_B[adapter_name])
  • Setting the trainable parameters is handled by the set_adapter() method. A partially annotated version:
    def set_adapter(self, adapter_names: str | list[str]) -> None:
        if isinstance(adapter_names, str):
            adapter_names = [adapter_names]

        for layer_name in self.adapter_layer_names:
            module_dict = getattr(self, layer_name)
            for key, layer in module_dict.items():
                # Layers belonging to an adapter in adapter_names get gradients enabled, the rest are frozen
                if key in adapter_names:
                    layer.requires_grad_(True)
                else:
                    layer.requires_grad_(False)

        self._active_adapter = adapter_names
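  • Back in lora_demo.py it is easy to confirm the effect (my own check; the printed name is just an example of the expected pattern): after get_peft_model(), only parameters belonging to the LoRA branch still have requires_grad=True.

trainable = [name for name, p in lora_model.named_parameters() if p.requires_grad]
print(len(trainable))   # number of LoRA weight tensors
print(trainable[0])     # e.g. '...layers.0.self_attn.q_proj.lora_A.default.weight'
assert all("lora_" in name for name in trainable)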
  • Having seen the Linear class's initializer, it is also worth looking at its forward method, which describes how the LoRA branch's output is merged with the original layer's output. A partially annotated version:
    def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
        self._check_forward_args(x, *args, **kwargs)
        adapter_names = kwargs.pop("adapter_names", None)

        if self.disable_adapters:
            if self.merged:
                self.unmerge()
            result = self.base_layer(x, *args, **kwargs)
        elif adapter_names is not None:
            result = self._mixed_batch_forward(x, *args, adapter_names=adapter_names, **kwargs)
        elif self.merged:
            result = self.base_layer(x, *args, **kwargs)
        else:
            # The original layer's output
            result = self.base_layer(x, *args, **kwargs)
            torch_result_dtype = result.dtype
            for active_adapter in self.active_adapters:
                if active_adapter not in self.lora_A.keys():
                    continue
                lora_A = self.lora_A[active_adapter]
                lora_B = self.lora_B[active_adapter]
                dropout = self.lora_dropout[active_adapter]
                scaling = self.scaling[active_adapter]
                x = x.to(lora_A.weight.dtype)

                if not self.use_dora[active_adapter]:
                    # Original output + the output of the trainable LoRA branch
                    result = result + lora_B(lora_A(dropout(x))) * scaling
                else:
                    x = dropout(x)
                    result = result + self.lora_magnitude_vector[active_adapter](
                        x,
                        lora_A=lora_A,
                        lora_B=lora_B,
                        scaling=scaling,
                        base_layer=self.get_base_layer(),
                    )

            result = result.to(torch_result_dtype)

        return result
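  • Because lora_B starts at zero, a freshly wrapped layer reproduces its base layer exactly. A quick check, assuming the Linear constructor signature shown earlier (this snippet is mine, not from the article):

import torch
from peft.tuners.lora.layer import Linear as LoraLinear

base = torch.nn.Linear(896, 896)
wrapped = LoraLinear(base, adapter_name="default", r=8, lora_alpha=32, lora_dropout=0.0)

x = torch.randn(2, 896)
assert torch.allclose(wrapped(x), base(x))  # the LoRA branch contributes 0 before training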
  • That completes the walkthrough of the LoRA-related source; what follows is simply training the modules that were marked as trainable.

Afterword

  • PEFT is one of the most heavily wrapped libraries I have read. Although that makes it look tedious, working through the source gradually showed me why the code is structured this way and how I should organize my own large projects in the future — I learned a lot.
  • I also hope everyone will dig down to the underlying principles of the algorithms they use; only by understanding the mechanism do you know an algorithm's strengths and weaknesses.
