From RNN to Transformer

Contents

  1. Fundamentals: an overview of sequence models
  2. RNN: recurrent neural networks
  3. LSTM: long short-term memory networks
  4. The Transformer architecture
  5. Applications in time-series forecasting
  6. Applications in computer vision
  7. Applications in large language models
  8. Practice and optimization
  9. Frontier developments

Fundamentals: An Overview of Sequence Models {#基礎篇}

What Is Sequence Data?

Sequence data is a collection of data points arranged in a specific order, where the ordering itself carries essential information:

  • Time series: stock prices, temperature readings, sales figures
  • Text sequences: sentences, paragraphs, documents
  • Video sequences: consecutive image frames
  • Audio sequences: sound waveforms

Why Do We Need Dedicated Sequence Models?

Traditional feed-forward neural networks have the following limitations:

  1. Fixed input size: they cannot handle variable-length sequences
  2. No notion of position: they ignore the order of elements in a sequence
  3. No memory: they cannot make use of earlier information
  4. Parameter explosion: long sequences would require an enormous number of parameters

Core Challenges of Sequence Modeling

  • Long-term dependencies: capturing relationships between elements that are far apart in a sequence
  • Vanishing/exploding gradients: deep networks are hard to train
  • Computational efficiency: the cost of processing long sequences
  • Generalization: adapting to sequences of different lengths and types

RNN: Recurrent Neural Networks {#rnn}

Basic Concepts

By introducing recurrent connections, an RNN maintains an internal state (a memory) that lets it process sequence data.

Core Idea
# Basic RNN equations
h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t + b_h)
y_t = W_hy @ h_t + b_y
# where:
# h_t: hidden state at time t
# x_t: input at time t
# y_t: output at time t
# W_*: weight matrices
# b_*: bias vectors
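
To make the recurrence concrete, here is a minimal NumPy sketch of a single RNN cell and a forward pass over a toy sequence. The dimensions and random initialization are illustrative assumptions, not taken from the original text.

import numpy as np

def rnn_cell(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One step of the basic RNN recurrence shown above."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Illustrative sizes (assumed): 3-dim input, 5-dim hidden state, 2-dim output
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 5, 2
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(4, input_dim)):   # toy sequence of length 4
    h, y = rnn_cell(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)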

The Structure of an RNN

1. The basic RNN cell
  • Input layer: receives the input x_t at the current time step
  • Hidden layer: combines the current input with the previous hidden state
  • Output layer: produces the output for the current time step
2. The unrolled RNN

An RNN can be unrolled into a deep network in which every time step shares the same parameters:

x_0 → [RNN] → h_0 → y_0
               ↓
x_1 → [RNN] → h_1 → y_1
               ↓
x_2 → [RNN] → h_2 → y_2

RNN Variants

1. Many-to-One
  • Applications: sentiment analysis, text classification
  • Characteristic: the whole sequence is read in and a single output is produced
2. One-to-Many
  • Applications: image caption generation, music generation
  • Characteristic: a single input produces an output sequence
3. Many-to-Many
  • Synchronized (aligned input/output lengths): per-frame video classification
  • Unsynchronized (encoder-decoder): machine translation and other sequence-to-sequence tasks

Training an RNN

Backpropagation Through Time (BPTT)
# BPTT pseudocode; rnn_cell, compute_loss, backward_pass, update_parameters
# and initial_hidden_state are placeholders for the usual components
def bptt(sequences, targets, rnn_params):
    for seq, target in zip(sequences, targets):
        # Forward pass through the whole sequence
        hidden_states = []
        h = initial_hidden_state
        for x in seq:
            h = rnn_cell(x, h, rnn_params)
            hidden_states.append(h)
        # Compute the loss
        loss = compute_loss(hidden_states[-1], target)
        # Backward pass through time
        gradients = backward_pass(loss, hidden_states)
        # Parameter update
        update_parameters(rnn_params, gradients)

Problems with RNNs

  1. Vanishing gradients: over long sequences the gradients decay exponentially
  2. Exploding gradients: gradient values grow extremely large and destabilize training (gradient clipping, sketched below, is the standard remedy)
  3. Difficulty with long-term dependencies: long-range relationships in a sequence are hard to capture
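
As a concrete illustration of the remedy mentioned in point 2, the following PyTorch sketch shows gradient clipping inside a generic training step; the model, loss function, and the threshold of 1.0 are placeholder assumptions rather than details from the original text.

import torch

def train_step(model, optimizer, loss_fn, x, y, max_norm=1.0):
    """One optimization step with gradient-norm clipping to curb exploding gradients."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale gradients so their global L2 norm does not exceed max_norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()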

LSTM: Long Short-Term Memory Networks {#lstm}

Motivation for LSTM

LSTM was designed specifically to solve the RNN's long-term dependency problem by introducing gating mechanisms and a memory cell.

Core Components of LSTM

1. Cell state
  • A long-term memory channel for information
  • Information can be selectively retained or forgotten
2. Three gates
Forget gate
f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f)
# Decides what information to discard from the cell state
Input gate
i_t = sigmoid(W_i @ [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C @ [h_{t-1}, x_t] + b_C)
# Decides what new information to store in the cell state
Output gate
o_t = sigmoid(W_o @ [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
# Decides what information to output

The Complete LSTM Computation

def lstm_cell(x_t, h_prev, C_prev, W, b):
    # W_f, W_i, W_C, W_o and b_* denote the gate-specific weights and biases in W, b
    # 1. Forget gate: decide what to discard
    f_t = sigmoid(W_f @ concat([h_prev, x_t]) + b_f)
    # 2. Input gate: decide what new information to store
    i_t = sigmoid(W_i @ concat([h_prev, x_t]) + b_i)
    C_tilde = tanh(W_C @ concat([h_prev, x_t]) + b_C)
    # 3. Update the cell state
    C_t = f_t * C_prev + i_t * C_tilde
    # 4. Output gate: decide what to output
    o_t = sigmoid(W_o @ concat([h_prev, x_t]) + b_o)
    h_t = o_t * tanh(C_t)
    return h_t, C_t

LSTM Variants

1. GRU (Gated Recurrent Unit)
  • A simplified LSTM with only two gates
  • An update gate and a reset gate
  • Fewer parameters and faster training
def gru_cell(x_t, h_prev, W, b):
    # Reset gate
    r_t = sigmoid(W_r @ concat([h_prev, x_t]) + b_r)
    # Update gate
    z_t = sigmoid(W_z @ concat([h_prev, x_t]) + b_z)
    # Candidate hidden state
    h_tilde = tanh(W_h @ concat([r_t * h_prev, x_t]) + b_h)
    # Final hidden state
    h_t = (1 - z_t) * h_prev + z_t * h_tilde
    return h_t
2. Bidirectional LSTM (BiLSTM)
  • Processes the sequence in the forward and backward directions simultaneously
  • Captures context from both sides
  • Performs very well on NLP tasks
3. Multi-layer (stacked) LSTM
  • Stacks several LSTM layers vertically
  • Learns more abstract feature representations
  • Increases the model's expressive power (a PyTorch sketch combining these two variants follows this list)
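
The following minimal PyTorch sketch shows how a stacked, bidirectional LSTM can be declared and applied; the layer sizes, two-layer depth, and toy input shapes are illustrative assumptions, not specifics from the original text.

import torch
import torch.nn as nn

# A 2-layer bidirectional LSTM: input features of size 16, hidden state of size 32
bilstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2,
                 bidirectional=True, batch_first=True)

x = torch.randn(8, 50, 16)          # [batch, seq_len, input_size]
output, (h_n, c_n) = bilstm(x)
print(output.shape)                 # [8, 50, 64]: forward and backward states concatenated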

Strengths and Limitations of LSTM

Strengths:

  • Effectively mitigates the vanishing-gradient problem
  • Can capture long-term dependencies
  • Performs well across many sequence tasks

Limitations:

  • High computational cost
  • Hard to parallelize
  • Very long sequences remain challenging

The Transformer Architecture {#transformer}

The Transformer's Revolutionary Innovation

"Attention Is All You Need" (2017) fundamentally changed the paradigm of sequence modeling.

Core Concept: Self-Attention

1. Scaled dot-product attention
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q: query matrix [batch, seq_len, d_k]
    # K: key matrix   [batch, seq_len, d_k]
    # V: value matrix [batch, seq_len, d_v]
    d_k = K.shape[-1]
    # Attention scores
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    # Optional masking
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    # Softmax normalization
    attention_weights = F.softmax(scores, dim=-1)
    # Weighted sum of the values
    output = attention_weights @ V
    return output, attention_weights
2. Multi-head attention
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.d_model = d_model
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        # Linear projection layers
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.shape[0]
        # 1. Project and reshape into multiple heads
        Q = self.W_q(query).view(batch_size, -1, self.n_heads, self.d_k)
        K = self.W_k(key).view(batch_size, -1, self.n_heads, self.d_k)
        V = self.W_v(value).view(batch_size, -1, self.n_heads, self.d_k)
        # 2. Transpose so the heads can be processed in parallel
        Q, K, V = Q.transpose(1, 2), K.transpose(1, 2), V.transpose(1, 2)
        # 3. Attention per head
        attn_output, _ = scaled_dot_product_attention(Q, K, V, mask)
        # 4. Concatenate the heads
        attn_output = attn_output.transpose(1, 2).contiguous()
        attn_output = attn_output.view(batch_size, -1, self.d_model)
        # 5. Final linear projection
        output = self.W_o(attn_output)
        return output

Components of the Transformer

1. Encoder
class TransformerEncoder(nn.Module):
    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        super().__init__()
        # Sub-layers
        self.self_attention = MultiHeadAttention(d_model, n_heads)
        self.feed_forward = FeedForward(d_model, d_ff)
        # Layer normalization
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Dropout
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # 1. Self-attention sub-layer with residual connection
        attn_output = self.self_attention(x, x, x, mask)
        x = self.norm1(x + self.dropout(attn_output))
        # 2. Feed-forward sub-layer with residual connection
        ff_output = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_output))
        return x
2. Decoder
class TransformerDecoder(nn.Module):
    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        super().__init__()
        # Three sub-layers
        self.masked_self_attention = MultiHeadAttention(d_model, n_heads)
        self.cross_attention = MultiHeadAttention(d_model, n_heads)
        self.feed_forward = FeedForward(d_model, d_ff)
        # Layer normalization
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, encoder_output, src_mask=None, tgt_mask=None):
        # 1. Masked self-attention
        attn1 = self.masked_self_attention(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(attn1))
        # 2. Cross-attention over the encoder output
        attn2 = self.cross_attention(x, encoder_output, encoder_output, src_mask)
        x = self.norm2(x + self.dropout(attn2))
        # 3. Feed-forward network
        ff_output = self.feed_forward(x)
        x = self.norm3(x + self.dropout(ff_output))
        return x

Positional Encoding

Because the Transformer has no recurrence, positional information must be injected explicitly:

import numpy as np

def positional_encoding(seq_len, d_model):
    position = np.arange(seq_len)[:, np.newaxis]
    div_term = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))
    pos_encoding = np.zeros((seq_len, d_model))
    pos_encoding[:, 0::2] = np.sin(position * div_term)
    pos_encoding[:, 1::2] = np.cos(position * div_term)
    return pos_encoding

Advantages of the Transformer

  1. Parallel computation: all positions can be processed simultaneously
  2. Long-range dependencies: relationships between any two positions are modeled directly
  3. Interpretability: attention weights lend themselves to visualization (a small sketch follows this list)
  4. Transfer learning: pretrained models adapt to many downstream tasks
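
To illustrate point 3, the sketch below reuses the scaled_dot_product_attention function defined earlier to pull out the attention-weight matrix for inspection; the tensor shapes and the matplotlib heatmap are illustrative assumptions rather than part of the original text.

import torch
import matplotlib.pyplot as plt

seq_len, d_k = 10, 64
Q = K = V = torch.randn(1, seq_len, d_k)          # toy self-attention inputs

_, attention_weights = scaled_dot_product_attention(Q, K, V)

# attention_weights[0] is a [seq_len, seq_len] matrix: row i shows how strongly
# position i attends to every other position
plt.imshow(attention_weights[0].detach(), cmap="viridis")
plt.xlabel("key position")
plt.ylabel("query position")
plt.colorbar()
plt.show()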

Applications in Time-Series Forecasting {#時間序列}

Classical Time-Series Forecasting

1. Problem definitions
  • Univariate forecasting: predict future values of a single series
  • Multivariate forecasting: predict several related series at once
  • Multi-step forecasting: predict several future time points
2. Data preprocessing
import numpy as np

class TimeSeriesPreprocessor:
    def __init__(self, window_size, horizon):
        self.window_size = window_size
        self.horizon = horizon

    def create_sequences(self, data):
        X, y = [], []
        for i in range(len(data) - self.window_size - self.horizon + 1):
            X.append(data[i:i + self.window_size])
            y.append(data[i + self.window_size:i + self.window_size + self.horizon])
        return np.array(X), np.array(y)

    def normalize(self, data):
        self.mean = np.mean(data)
        self.std = np.std(data)
        return (data - self.mean) / self.std

    def denormalize(self, data):
        return data * self.std + self.mean

RNN/LSTM Models for Time Series

1. Single-step LSTM forecaster
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: [batch, seq_len, input_size]
        lstm_out, (h_n, c_n) = self.lstm(x)
        # Use the output of the last time step
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions
2. Seq2Seq forecaster
class Seq2SeqForecaster(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, horizon):
        super().__init__()
        self.encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(output_size, hidden_size, batch_first=True)
        self.output_layer = nn.Linear(hidden_size, output_size)
        self.horizon = horizon

    def forward(self, x):
        # Encode the historical sequence
        _, (h_n, c_n) = self.encoder(x)
        # Decode the forecast step by step
        decoder_input = torch.zeros(x.size(0), 1, self.output_layer.out_features,
                                    device=x.device)
        predictions = []
        for _ in range(self.horizon):
            output, (h_n, c_n) = self.decoder(decoder_input, (h_n, c_n))
            prediction = self.output_layer(output)
            predictions.append(prediction)
            decoder_input = prediction
        return torch.cat(predictions, dim=1)

Transformer Models for Time Series

1. Temporal Fusion Transformer (TFT)
class TemporalFusionTransformer:
    """Skeleton of Google's Transformer designed for time-series forecasting
    (component classes such as VariableSelectionNetwork are placeholders)."""
    def __init__(self, config):
        # Variable selection networks
        self.vsn = VariableSelectionNetwork(config)
        # Gated residual network
        self.grn = GatedResidualNetwork(config)
        # Interpretable multi-head attention
        self.attention = InterpretableMultiHeadAttention(config)
        # Positional encoding
        self.positional_encoding = PositionalEncoding(config)
        # Quantile loss
        self.quantile_loss = QuantileLoss(config.quantiles)

    def forward(self, x_static, x_historical, x_future):
        # 1. Variable selection
        selected_historical = self.vsn(x_historical)
        selected_future = self.vsn(x_future)
        # 2. Static feature encoding
        static_encoding = self.grn(x_static)
        # 3. LSTM encoding of the history
        historical_features = self.lstm_encoder(selected_historical)
        # 4. Self-attention
        temporal_features = self.attention(historical_features,
                                           static_context=static_encoding)
        # 5. Forecast
        predictions = self.output_layer(temporal_features)
        return predictions
2. Autoformer
class Autoformer:
    """Skeleton of the auto-correlation-based Transformer variant."""
    def __init__(self, config):
        # Series decomposition
        self.decomposition = SeriesDecomposition(config.kernel_size)
        # Auto-correlation mechanism
        self.auto_correlation = AutoCorrelation(factor=config.factor,
                                                attention_dropout=config.dropout)
        # Encoder
        self.encoder = AutoformerEncoder(config)
        # Decoder
        self.decoder = AutoformerDecoder(config)

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
        # 1. Decompose the input series
        enc_seasonal, enc_trend = self.decomposition(x_enc)
        # 2. Encoder
        enc_out = self.encoder(enc_seasonal, x_mark_enc)
        # 3. Decoder generates the forecast
        seasonal_output, trend_output = self.decoder(x_dec, x_mark_dec, enc_out, enc_trend)
        # 4. Combine the components
        predictions = seasonal_output + trend_output
        return predictions

Key Techniques for Time-Series Forecasting

1. Feature engineering
import numpy as np

def create_time_features(df, date_column):
    """Extract calendar features from a datetime column."""
    df['hour'] = df[date_column].dt.hour
    df['dayofweek'] = df[date_column].dt.dayofweek
    df['month'] = df[date_column].dt.month
    df['dayofyear'] = df[date_column].dt.dayofyear
    df['weekofyear'] = df[date_column].dt.isocalendar().week
    # Cyclical encoding
    df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
    return df
2. Handling multi-scale patterns
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    def __init__(self, in_channels, out_channels, scales=(1, 4, 8)):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList([
            nn.Conv1d(in_channels, out_channels, kernel_size=s, stride=s)
            for s in scales
        ])

    def forward(self, x):
        multi_scale_features = []
        for scale, conv in zip(self.scales, self.convs):
            features = conv(x)
            # Upsample back to the original temporal resolution
            features = F.interpolate(features, size=x.size(-1))
            multi_scale_features.append(features)
        return torch.cat(multi_scale_features, dim=1)

Applications in Computer Vision {#視覺}

Challenges of Visual Sequence Modeling

  1. Spatio-temporal modeling: spatial and temporal dimensions must be handled together
  2. Computational cost: video data is enormous
  3. Long-range dependencies: an action may span many frames

RNN/LSTM in Vision

1. Image captioning
import torch
import torch.nn as nn

class ImageCaptioningModel(nn.Module):
    def __init__(self, encoder, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        # CNN encoder (e.g. a ResNet)
        self.encoder = encoder
        self.encoder_fc = nn.Linear(encoder.output_dim, hidden_dim)
        # LSTM decoder
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.output_layer = nn.Linear(hidden_dim, vocab_size)
        # Attention over the spatial features
        self.attention = AdditiveAttention(hidden_dim)

    def forward(self, images, captions=None):
        # 1. Encode the image
        features = self.encoder(images)                      # [batch, feat_dim, H, W]
        features = features.view(features.size(0), features.size(1), -1)
        features = features.permute(0, 2, 1)                 # [batch, H*W, feat_dim]
        # 2. Initialize the LSTM from the global image feature
        h_0 = self.encoder_fc(features.mean(dim=1))
        c_0 = torch.zeros_like(h_0)
        if self.training and captions is not None:
            # Teacher forcing during training
            embedded = self.embedding(captions)
            outputs = []
            h_t, c_t = h_0.unsqueeze(0), c_0.unsqueeze(0)
            for t in range(embedded.size(1)):
                # Attend over the spatial features
                context, _ = self.attention(h_t.squeeze(0), features)
                # One LSTM step
                input_t = torch.cat([embedded[:, t], context], dim=1)
                output, (h_t, c_t) = self.lstm(input_t.unsqueeze(1), (h_t, c_t))
                # Predict the next word
                prediction = self.output_layer(output.squeeze(1))
                outputs.append(prediction)
            return torch.stack(outputs, dim=1)
        else:
            # Autoregressive generation at inference time
            return self.generate(features, h_0, c_0)
2. Video understanding
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoUnderstandingModel(nn.Module):
    def __init__(self, feature_extractor, hidden_dim, num_classes):
        super().__init__()
        # Per-frame feature extractor (a 2D or 3D CNN)
        self.feature_extractor = feature_extractor
        # Bidirectional LSTM over time
        self.bilstm = nn.LSTM(feature_extractor.output_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        # Temporal attention
        self.temporal_attention = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1))
        # Classifier
        self.classifier = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, video_frames):
        # 1. Extract per-frame features
        batch_size, num_frames = video_frames.shape[:2]
        frame_features = [self.feature_extractor(video_frames[:, t])
                          for t in range(num_frames)]
        frame_features = torch.stack(frame_features, dim=1)
        # 2. Temporal modeling
        lstm_out, _ = self.bilstm(frame_features)
        # 3. Temporal attention pooling
        attention_weights = F.softmax(self.temporal_attention(lstm_out), dim=1)
        video_representation = (lstm_out * attention_weights).sum(dim=1)
        # 4. Classification
        output = self.classifier(video_representation)
        return output

Vision Transformer (ViT)

1. The basic ViT architecture
import torch
import torch.nn as nn

class VisionTransformer(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_channels=3,
                 embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0,
                 num_classes=1000):
        super().__init__()
        # Patch embedding
        self.patch_embed = PatchEmbedding(img_size, patch_size, in_channels, embed_dim)
        num_patches = (img_size // patch_size) ** 2
        # Learned positional embedding
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # CLS token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Transformer encoder blocks
        self.transformer = nn.ModuleList([
            TransformerBlock(embed_dim, num_heads, mlp_ratio) for _ in range(depth)
        ])
        # Classification head
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # 1. Patch embedding
        x = self.patch_embed(x)                               # [B, num_patches, embed_dim]
        # 2. Prepend the CLS token
        cls_tokens = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat((cls_tokens, x), dim=1)
        # 3. Add positional embeddings
        x = x + self.pos_embed
        # 4. Transformer encoder
        for block in self.transformer:
            x = block(x)
        # 5. Take the CLS token for classification
        x = self.norm(x)
        cls_token_final = x[:, 0]
        # 6. Classify
        return self.head(cls_token_final)
2. Patch embedding implementation
class PatchEmbedding(nn.Module):
    def __init__(self, img_size, patch_size, in_channels, embed_dim):
        super().__init__()
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution implements patch extraction plus linear projection
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: [B, C, H, W]
        x = self.proj(x)        # [B, embed_dim, H/P, W/P]
        x = x.flatten(2)        # [B, embed_dim, num_patches]
        x = x.transpose(1, 2)   # [B, num_patches, embed_dim]
        return x

Improvements on the Vision Transformer

1. Swin Transformer
class SwinTransformer(nn.Module):
    """Vision Transformer with a hierarchical structure and shifted windows."""
    def __init__(self, img_size, patch_size, embed_dim, depths, num_heads):
        super().__init__()
        # Patch partitioning
        self.patch_partition = PatchPartition(patch_size)
        # Several stages of Swin blocks, with patch merging between stages
        self.stages = nn.ModuleList()
        for i, (depth, num_head) in enumerate(zip(depths, num_heads)):
            stage = nn.ModuleList([
                SwinTransformerBlock(
                    dim=embed_dim * (2 ** i),
                    num_heads=num_head,
                    window_size=7,
                    shift_size=0 if j % 2 == 0 else 3)
                for j in range(depth)
            ])
            self.stages.append(stage)
            # Patch merging (except after the last stage)
            if i < len(depths) - 1:
                self.stages.append(PatchMerging(embed_dim * (2 ** i)))

    def forward(self, x):
        x = self.patch_partition(x)
        for stage in self.stages:
            if isinstance(stage, nn.ModuleList):
                for block in stage:
                    x = block(x)
            else:
                x = stage(x)    # patch merging
        return x
2. DETR (Detection Transformer)
class DETR(nn.Module):
    """Object detection with a Transformer."""
    def __init__(self, backbone, transformer, num_classes, hidden_dim, num_queries=100):
        super().__init__()
        self.backbone = backbone
        self.conv = nn.Conv2d(backbone.num_channels, hidden_dim, 1)
        self.transformer = transformer
        # Learned object queries
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        # Prediction heads
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)   # +1 for "no object"
        self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)

    def forward(self, images):
        # 1. CNN backbone features
        features = self.backbone(images)
        # 2. Project to the transformer dimension
        h = self.conv(features)
        # 3. Positional encoding
        pos_embed = self.positional_encoding(h)
        # 4. Transformer over the flattened feature map and the object queries
        hs = self.transformer(self.flatten(h),
                              query_embed=self.query_embed.weight,
                              pos_embed=self.flatten(pos_embed))
        # 5. Predict classes and bounding boxes
        outputs_class = self.class_embed(hs)
        outputs_coord = self.bbox_embed(hs).sigmoid()
        return {'pred_logits': outputs_class, 'pred_boxes': outputs_coord}

Applications in Large Language Models {#語言模型}

The Evolution of Language Models

1. From n-grams to neural language models
  • N-gram models: purely statistical methods
  • Word embeddings: Word2Vec, GloVe
  • RNN language models: handle variable-length sequences
  • Transformer language models: parallel training and long-range dependencies

The GPT Family (Generative Pre-Training)

1. GPT architecture
import torch
import torch.nn as nn
import torch.nn.functional as F

class GPT(nn.Module):
    def __init__(self, vocab_size, n_layer, n_head, n_embd, block_size):
        super().__init__()
        # Token and position embeddings
        self.token_embedding = nn.Embedding(vocab_size, n_embd)
        self.position_embedding = nn.Embedding(block_size, n_embd)
        # Transformer blocks
        self.blocks = nn.ModuleList([
            TransformerBlock(n_embd, n_head) for _ in range(n_layer)
        ])
        # Final layer norm and output projection
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        self.block_size = block_size

    def forward(self, idx, targets=None):
        B, T = idx.shape
        # Token and position embeddings
        tok_emb = self.token_embedding(idx)
        pos_emb = self.position_embedding(torch.arange(T, device=idx.device))
        x = tok_emb + pos_emb
        # Transformer blocks
        for block in self.blocks:
            x = block(x)
        # Final projection to vocabulary logits
        x = self.ln_f(x)
        logits = self.lm_head(x)
        # Loss (if targets are given)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

    def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
        """Autoregressive generation."""
        for _ in range(max_new_tokens):
            # Crop the context to the block size
            idx_cond = idx if idx.size(1) <= self.block_size else idx[:, -self.block_size:]
            # Forward pass; keep only the last position
            logits, _ = self(idx_cond)
            logits = logits[:, -1, :] / temperature
            # Optional top-k filtering
            if top_k is not None:
                v, _ = torch.topk(logits, top_k)
                logits[logits < v[:, [-1]]] = -float('Inf')
            # Softmax and sampling
            probs = F.softmax(logits, dim=-1)
            idx_next = torch.multinomial(probs, num_samples=1)
            # Append the sampled token
            idx = torch.cat((idx, idx_next), dim=1)
        return idx
2. GPT training tricks
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

class GPTTrainer:
    def __init__(self, model, train_dataset, config):
        self.model = model
        self.train_dataset = train_dataset
        self.config = config
        # Optimizer
        self.optimizer = self.configure_optimizers()
        # Learning-rate schedule
        self.scheduler = CosineAnnealingLR(self.optimizer, T_max=config.max_iters)

    def configure_optimizers(self):
        """Configure AdamW with weight decay applied only where appropriate."""
        decay, no_decay = set(), set()
        for name, param in self.model.named_parameters():
            if 'bias' in name or 'ln' in name or 'embedding' in name:
                no_decay.add(name)
            else:
                decay.add(name)
        param_groups = [
            {'params': [p for n, p in self.model.named_parameters() if n in decay],
             'weight_decay': self.config.weight_decay},
            {'params': [p for n, p in self.model.named_parameters() if n in no_decay],
             'weight_decay': 0.0},
        ]
        return torch.optim.AdamW(param_groups,
                                 lr=self.config.learning_rate,
                                 betas=(0.9, 0.95))

The BERT Family (Bidirectional Encoders)

1. BERT pre-training
import torch
import torch.nn as nn
import torch.nn.functional as F

class BERT(nn.Module):
    def __init__(self, vocab_size, hidden_size, num_layers, num_heads, max_len):
        super().__init__()
        self.vocab_size = vocab_size
        # Embedding layers
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        self.position_embedding = nn.Embedding(max_len, hidden_size)
        self.segment_embedding = nn.Embedding(2, hidden_size)
        # Transformer encoder
        self.encoder = nn.ModuleList([
            TransformerEncoderLayer(hidden_size, num_heads) for _ in range(num_layers)
        ])
        # Pre-training heads
        self.mlm_head = nn.Linear(hidden_size, vocab_size)   # masked language modeling
        self.nsp_head = nn.Linear(hidden_size, 2)            # next-sentence prediction

    def forward(self, input_ids, segment_ids, attention_mask,
                mlm_labels=None, nsp_labels=None):
        # Embeddings
        seq_len = input_ids.size(1)
        pos_ids = torch.arange(seq_len, device=input_ids.device)
        embeddings = (self.token_embedding(input_ids) +
                      self.position_embedding(pos_ids) +
                      self.segment_embedding(segment_ids))
        # Encoding
        hidden_states = embeddings
        for encoder_layer in self.encoder:
            hidden_states = encoder_layer(hidden_states, attention_mask)
        # MLM predictions
        mlm_logits = self.mlm_head(hidden_states)
        # NSP prediction (from the CLS token)
        nsp_logits = self.nsp_head(hidden_states[:, 0])
        # Losses
        total_loss = 0
        if mlm_labels is not None:
            total_loss += F.cross_entropy(mlm_logits.view(-1, self.vocab_size),
                                          mlm_labels.view(-1), ignore_index=-100)
        if nsp_labels is not None:
            total_loss += F.cross_entropy(nsp_logits, nsp_labels)
        return {'loss': total_loss, 'mlm_logits': mlm_logits, 'nsp_logits': nsp_logits}
2. BERT fine-tuning
class BERTForSequenceClassification(nn.Module):
    def __init__(self, bert_model, num_classes, dropout=0.1):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(bert_model.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask, labels=None):
        # BERT encoding
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        # Use the CLS token representation
        pooled_output = outputs.last_hidden_state[:, 0]
        pooled_output = self.dropout(pooled_output)
        # Classification
        logits = self.classifier(pooled_output)
        # Loss
        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits, labels)
        return {'loss': loss, 'logits': logits}

T5 (Text-to-Text Transfer Transformer)

class T5Model(nn.Module):
    """A unified text-to-text framework."""
    def __init__(self, config):
        super().__init__()
        # Shared embedding table
        self.shared = nn.Embedding(config.vocab_size, config.d_model)
        # Encoder
        self.encoder = T5Stack(config, embed_tokens=self.shared)
        # Decoder
        self.decoder = T5Stack(config, embed_tokens=self.shared, is_decoder=True)
        # Language-model head
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids, decoder_input_ids, labels=None):
        # Encode
        encoder_outputs = self.encoder(input_ids)
        # Decode
        decoder_outputs = self.decoder(
            decoder_input_ids,
            encoder_hidden_states=encoder_outputs.last_hidden_state)
        # Predict
        lm_logits = self.lm_head(decoder_outputs.last_hidden_state)
        # Loss
        loss = None
        if labels is not None:
            loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                                   labels.view(-1), ignore_index=-100)
        return {'loss': loss, 'logits': lm_logits}

Key Techniques in Modern LLMs

1. Memory-efficient attention
import math
import torch
import torch.nn as nn

class FlashAttention(nn.Module):
    """A didactic block-wise attention sketch in the spirit of FlashAttention:
    exact attention computed tile by tile with an online softmax, so the full
    [N, N] score matrix is never materialized. The real FlashAttention is a
    fused CUDA kernel and considerably more involved."""
    def forward(self, q, k, v, causal=False):
        B, H, N, D = q.shape
        block = min(64, N)
        out = torch.zeros_like(q)
        for i in range(0, N, block):
            q_blk = q[:, :, i:i + block]
            # Running statistics for the online softmax of this query block
            m = q_blk.new_full((B, H, q_blk.size(2), 1), float('-inf'))   # running max
            l = torch.zeros_like(m)                                        # running normalizer
            acc = torch.zeros_like(q_blk)                                  # running numerator
            for j in range(0, N, block):
                if causal and j > i + q_blk.size(2) - 1:
                    break
                k_blk = k[:, :, j:j + block]
                v_blk = v[:, :, j:j + block]
                scores = q_blk @ k_blk.transpose(-2, -1) / math.sqrt(D)
                if causal:
                    q_idx = torch.arange(i, i + q_blk.size(2), device=q.device)[:, None]
                    k_idx = torch.arange(j, j + k_blk.size(2), device=q.device)[None, :]
                    scores = scores.masked_fill(k_idx > q_idx, float('-inf'))
                # Online softmax update: rescale previous statistics to the new max
                m_new = torch.maximum(m, scores.amax(dim=-1, keepdim=True))
                p = torch.exp(scores - m_new)
                correction = torch.exp(m - m_new)
                l = l * correction + p.sum(dim=-1, keepdim=True)
                acc = acc * correction + p @ v_blk
                m = m_new
            out[:, :, i:i + block] = acc / l
        return out
2. Parameter-efficient fine-tuning (PEFT)
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALayer(nn.Module):
    """Low-Rank Adaptation for efficient fine-tuning."""
    def __init__(self, in_features, out_features, rank=16, alpha=32):
        super().__init__()
        self.rank = rank
        self.alpha = alpha
        self.scaling = alpha / rank
        # Frozen pretrained weight
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.weight.requires_grad = False
        # LoRA parameters: the low-rank update matrices A and B
        self.lora_A = nn.Parameter(torch.randn(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # Initialization
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        # Original (frozen) forward pass
        result = F.linear(x, self.weight)
        # Add the scaled low-rank update
        result = result + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return result
3. Long-context handling
class LongContextTransformer(nn.Module):
    """Sketch of techniques for handling very long contexts
    (RotaryEmbedding is assumed to be defined elsewhere)."""
    def __init__(self, config):
        super().__init__()
        # RoPE positional encoding
        self.rotary_embedding = RotaryEmbedding(config.hidden_size)
        # Sliding-window size (needed before the sparse pattern is built)
        self.window_size = config.window_size
        # Sparse attention pattern
        self.attention_pattern = self.create_sparse_pattern(config.max_length)

    def create_sparse_pattern(self, seq_len):
        """Create a sparse attention pattern: local windows plus a few global tokens."""
        pattern = torch.zeros(seq_len, seq_len)
        # Local window around each position
        for i in range(seq_len):
            start = max(0, i - self.window_size // 2)
            end = min(seq_len, i + self.window_size // 2)
            pattern[i, start:end] = 1
        # Global tokens at a fixed stride
        stride = seq_len // 8
        pattern[::stride, :] = 1
        pattern[:, ::stride] = 1
        return pattern.bool()

Practice and Optimization {#實戰}

Best Practices for Model Training

1. Mixed-precision training
import torch
from torch.cuda.amp import autocast, GradScaler

class MixedPrecisionTrainer:
    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.scaler = GradScaler()

    def train_step(self, batch):
        self.optimizer.zero_grad()
        # Automatic mixed precision for the forward pass
        with autocast():
            outputs = self.model(**batch)
            loss = outputs['loss']
        # Scaled backward pass
        self.scaler.scale(loss).backward()
        # Gradient clipping (unscale the gradients first)
        self.scaler.unscale_(self.optimizer)
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
        # Optimizer step
        self.scaler.step(self.optimizer)
        self.scaler.update()
        return loss.item()
2. Distributed training
import torch.distributed as dist
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

class DistributedTrainer:
    def __init__(self, model, train_dataset, rank, world_size):
        # Initialize the process group
        dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)
        self.train_dataset = train_dataset
        # Wrap the model for data-parallel training
        self.model = nn.parallel.DistributedDataParallel(
            model.cuda(rank),
            device_ids=[rank],
            output_device=rank,
            find_unused_parameters=False)
        # Distributed sampler so each process sees a different shard of the data
        self.train_sampler = DistributedSampler(
            train_dataset, num_replicas=world_size, rank=rank)

    def train_epoch(self, epoch):
        self.train_sampler.set_epoch(epoch)   # reshuffle differently every epoch
        for batch in DataLoader(self.train_dataset, sampler=self.train_sampler):
            loss = self.train_step(batch)
        # Synchronize all processes
        dist.barrier()

Inference Optimization

1. Model quantization
import torch

class QuantizedModel:
    @staticmethod
    def quantize_model(model, calibration_data):
        """Post-training static INT8 quantization."""
        model.eval()
        # Prepare for quantization
        model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
        torch.quantization.prepare(model, inplace=True)
        # Calibrate on representative data
        with torch.no_grad():
            for batch in calibration_data:
                model(batch)
        # Convert to the quantized model
        torch.quantization.convert(model, inplace=True)
        return model
2. KV cache optimization
class OptimizedDecoder:
    """Sketch of KV caching during autoregressive decoding (compute_kv,
    compute_all_kv, attention_with_cache, sample and evict_cache are placeholders)."""
    def __init__(self, model, max_cache_size=1024):
        self.model = model
        self.kv_cache = {}
        self.max_cache_size = max_cache_size

    def generate_with_cache(self, input_ids, max_length):
        outputs = []
        for i in range(max_length):
            if i > 0:
                # Only compute keys/values for the newly generated token
                new_token_id = input_ids[:, -1:]
                key, value = self.compute_kv(new_token_id, i)
                # Add them to the cache
                self.kv_cache[i] = (key, value)
            else:
                # First step: compute keys/values for the whole prompt
                keys, values = self.compute_all_kv(input_ids)
                self.kv_cache = {j: (keys[:, :, j], values[:, :, j])
                                 for j in range(input_ids.size(1))}
            # Attend using the cached keys/values
            output = self.attention_with_cache(input_ids[:, -1:])
            outputs.append(output)
            # Sample the next token
            next_token = self.sample(output)
            input_ids = torch.cat([input_ids, next_token], dim=1)
            # Cache management
            if len(self.kv_cache) > self.max_cache_size:
                self.evict_cache()
        return torch.cat(outputs, dim=1)

Evaluation Metrics

1. Language-model evaluation
import math
import torch

def calculate_perplexity(model, eval_dataloader):
    """Compute perplexity over an evaluation set."""
    model.eval()
    total_loss = 0
    total_tokens = 0
    with torch.no_grad():
        for batch in eval_dataloader:
            outputs = model(**batch)
            loss = outputs['loss']
            total_loss += loss.item() * batch['labels'].numel()
            total_tokens += batch['labels'].numel()
    avg_loss = total_loss / total_tokens
    return math.exp(avg_loss)

def calculate_bleu(predictions, references):
    """Compute corpus-level BLEU scores."""
    from nltk.translate.bleu_score import corpus_bleu
    # Tokenize
    pred_tokens = [pred.split() for pred in predictions]
    ref_tokens = [[ref.split()] for ref in references]
    # BLEU with different n-gram weights
    bleu_1 = corpus_bleu(ref_tokens, pred_tokens, weights=(1, 0, 0, 0))
    bleu_2 = corpus_bleu(ref_tokens, pred_tokens, weights=(0.5, 0.5, 0, 0))
    bleu_4 = corpus_bleu(ref_tokens, pred_tokens, weights=(0.25, 0.25, 0.25, 0.25))
    return {'bleu_1': bleu_1, 'bleu_2': bleu_2, 'bleu_4': bleu_4}
2. Time-series evaluation
def time_series_metrics(predictions, targets):
    """Common forecasting error metrics."""
    # Mean absolute error
    mae = torch.mean(torch.abs(predictions - targets))
    # Mean squared error
    mse = torch.mean((predictions - targets) ** 2)
    # Root mean squared error
    rmse = torch.sqrt(mse)
    # Mean absolute percentage error
    mape = torch.mean(torch.abs((targets - predictions) / targets)) * 100
    # Symmetric MAPE
    smape = 200 * torch.mean(torch.abs(predictions - targets) /
                             (torch.abs(predictions) + torch.abs(targets)))
    return {'mae': mae.item(), 'mse': mse.item(), 'rmse': rmse.item(),
            'mape': mape.item(), 'smape': smape.item()}

Frontier Developments {#前沿}

Recent Research Directions

1. Mamba: state-space models
import torch.nn as nn

class MambaBlock(nn.Module):
    """Sequence modeling with linear complexity (structural sketch only)."""
    def __init__(self, d_model, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.d_model = d_model
        self.d_state = d_state
        self.d_conv = d_conv
        self.expand = expand
        d_inner = int(self.expand * self.d_model)
        # Input projection
        self.in_proj = nn.Linear(d_model, d_inner * 2)
        # Depthwise convolution
        self.conv1d = nn.Conv1d(d_inner, d_inner,
                                kernel_size=d_conv,
                                groups=d_inner,
                                padding=d_conv - 1)
        # SSM parameters
        self.x_proj = nn.Linear(d_inner, d_state + d_state + 1)
        self.dt_proj = nn.Linear(d_state, d_inner)
        # Output projection
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, x):
        """Selective state-space model. The actual Mamba recurrence is omitted here;
        this sketch only shows the block's parameterization."""
        return self.ssm(x)
2. RWKV: a linear-complexity, RNN-style Transformer
class RWKV(nn.Module):
    """Receptance Weighted Key Value: a recurrent formulation with linear cost."""
    def __init__(self, n_embd, n_layer):
        super().__init__()
        self.blocks = nn.ModuleList([RWKVBlock(n_embd) for _ in range(n_layer)])

    def forward(self, x, state=None):
        for block in self.blocks:
            x, state = block(x, state)
        return x, state
3. Mixture of Experts (MoE)
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """A sparsely activated mixture-of-experts layer."""
    def __init__(self, d_model, n_experts, n_experts_per_token=2):
        super().__init__()
        self.experts = nn.ModuleList([FeedForward(d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.n_experts_per_token = n_experts_per_token

    def forward(self, x):
        # Routing probabilities
        gate_logits = self.gate(x)
        # Top-k routing
        weights, selected_experts = torch.topk(gate_logits, self.n_experts_per_token)
        weights = F.softmax(weights, dim=-1)
        # Sparse computation: each expert only processes the tokens routed to it
        results = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            expert_mask = selected_experts == i                # [..., k]
            token_mask = expert_mask.any(dim=-1)
            if token_mask.any():
                expert_output = expert(x[token_mask])
                # Weight assigned to expert i for each routed token
                expert_weight = (weights * expert_mask).sum(dim=-1)[token_mask]
                results[token_mask] += expert_output * expert_weight.unsqueeze(-1)
        return results

Multimodal Models

1. A CLIP-style vision-language model
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisionLanguageModel(nn.Module):
    """A contrastively trained multimodal model."""
    def __init__(self, vision_encoder, text_encoder, projection_dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.text_encoder = text_encoder
        # Projection heads
        self.vision_projection = nn.Linear(vision_encoder.output_dim, projection_dim)
        self.text_projection = nn.Linear(text_encoder.output_dim, projection_dim)
        # Temperature parameter
        self.temperature = nn.Parameter(torch.ones(1) * 0.07)

    def forward(self, images, texts):
        # Encode each modality
        image_features = self.vision_encoder(images)
        text_features = self.text_encoder(texts)
        # Project and normalize
        image_embeds = F.normalize(self.vision_projection(image_features), dim=-1)
        text_embeds = F.normalize(self.text_projection(text_features), dim=-1)
        # Similarities scaled by the temperature
        logits_per_image = image_embeds @ text_embeds.T / self.temperature
        logits_per_text = text_embeds @ image_embeds.T / self.temperature
        return logits_per_image, logits_per_text
2. A unified multimodal Transformer
class UnifiedMultiModalTransformer(nn.Module):
    """A single architecture that handles several modalities
    (the modality-specific embedders and heads are placeholders)."""
    def __init__(self, config):
        super().__init__()
        # Modality-specific encoders
        self.text_embedder = TextEmbedder(config)
        self.image_embedder = ImageEmbedder(config)
        self.audio_embedder = AudioEmbedder(config)
        # Shared Transformer
        self.transformer = nn.ModuleList([
            TransformerBlock(config) for _ in range(config.n_layers)
        ])
        # Modality-specific decoders
        self.text_head = TextGenerationHead(config)
        self.image_head = ImageGenerationHead(config)

    def forward(self, inputs, modality_mask):
        # Embed every modality that is present
        embeddings = []
        if 'text' in inputs:
            embeddings.append(self.text_embedder(inputs['text']))
        if 'image' in inputs:
            embeddings.append(self.image_embedder(inputs['image']))
        if 'audio' in inputs:
            embeddings.append(self.audio_embedder(inputs['audio']))
        # Concatenate all modalities into one token sequence
        x = torch.cat(embeddings, dim=1)
        # Shared Transformer
        for block in self.transformer:
            x = block(x, modality_mask)
        return x

Practical Tools and Frameworks

1. Hugging Face Transformers
from transformers import AutoModel, AutoTokenizer

# Load a pretrained model
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Use it
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)
2. A custom training loop
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.optim import AdamW
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from transformers import get_linear_schedule_with_warmup

class CustomTrainer:
    def __init__(self, model, train_dataloader, eval_dataloader, config):
        self.model = model
        self.train_dataloader = train_dataloader
        self.eval_dataloader = eval_dataloader
        self.config = config
        # Optimizer and learning-rate schedule
        self.optimizer = AdamW(model.parameters(), lr=config.learning_rate)
        self.scheduler = get_linear_schedule_with_warmup(
            self.optimizer,
            num_warmup_steps=config.warmup_steps,
            num_training_steps=config.total_steps)
        # Mixed precision
        self.scaler = GradScaler() if config.fp16 else None
        # Logging
        self.writer = SummaryWriter(config.log_dir)

    def train(self):
        global_step = 0
        for epoch in range(self.config.num_epochs):
            self.model.train()
            for batch in tqdm(self.train_dataloader, desc=f"Epoch {epoch}"):
                loss = self.training_step(batch)
                # Logging
                if global_step % self.config.log_interval == 0:
                    self.writer.add_scalar('train/loss', loss, global_step)
                # Evaluation
                if global_step % self.config.eval_interval == 0:
                    eval_metrics = self.evaluate()
                    for key, value in eval_metrics.items():
                        self.writer.add_scalar(f'eval/{key}', value, global_step)
                # Checkpointing
                if global_step % self.config.save_interval == 0:
                    self.save_checkpoint(global_step)
                global_step += 1

    def training_step(self, batch):
        self.optimizer.zero_grad()
        if self.scaler:
            with autocast():
                outputs = self.model(**batch)
                loss = outputs['loss']
            self.scaler.scale(loss).backward()
            self.scaler.unscale_(self.optimizer)
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
            self.scaler.step(self.optimizer)
            self.scaler.update()
        else:
            outputs = self.model(**batch)
            loss = outputs['loss']
            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
            self.optimizer.step()
        self.scheduler.step()
        return loss.item()

Highlights of This Guide

1. A step-by-step structure

  • Starts from the basic concepts of sequence models
  • Gradually goes deeper into the core principles of each architecture
  • Finally extends to frontier techniques and hands-on applications

2. Theory combined with practice

  • Every concept comes with the relevant formulas and an explanation of the underlying principle
  • A large number of Python/PyTorch code examples are provided
  • Best practices from real projects are included

3. Full coverage of three application areas

  • Time-series forecasting: from classical methods to recent models such as Autoformer
  • Computer vision: from CNN+RNN pipelines to the Vision Transformer
  • Large language models: from GPT and BERT to modern LLM techniques

4. Strong practical focus

  • Hands-on tips for training, optimizing, and deploying models
  • Evaluation metrics and debugging approaches
  • An introduction to mainstream frameworks and tools

5. Up-to-date content

  • Covers recent architectures such as Mamba and RWKV
  • Discusses hot directions such as MoE and multimodality
  • Includes optimization techniques such as Flash Attention and LoRA

Summary

This guide has traced deep-learning sequence models from RNN to Transformer, covering:

  1. Fundamentals: the core challenges of sequence modeling and how they are addressed
  2. Model architectures: detailed implementations of RNN, LSTM, and the Transformer
  3. Application areas: time-series forecasting, computer vision, and natural language processing
  4. Practical skills: training optimization, inference acceleration, and evaluation methods
  5. Frontier developments: the latest research directions and technology trends

Learning Suggestions

Beginner path:

  1. Master basic RNN concepts and backpropagation
  2. Understand the LSTM gating mechanism
  3. Study the Transformer architecture in depth
  4. Practice on simple sequence tasks

Advanced path:

  1. Study the many variants of the attention mechanism
  2. Explore large-scale pre-training techniques
  3. Learn distributed training and optimization
  4. Follow the latest papers and implementations

Suggested hands-on projects:

  1. Implement a simple language model
  2. Build a time-series forecasting system
  3. Develop an image-captioning application
  4. Fine-tune a pretrained model on a real problem

Recommended resources:

  • Papers: Attention Is All You Need, BERT, the GPT series
  • Courses: Stanford CS224N, Fast.ai
  • Frameworks: PyTorch, TensorFlow, JAX
  • Communities: Hugging Face, Papers with Code
