🧠 Deep Learning Core: From Fundamentals to the Frontier

🚀 Exploring the core deep learning technology stack, from neural-network basics to modern Transformer architectures


📋 Table of Contents

  • 🔬 Neural Network Fundamentals: From the Perceptron to Multi-Layer Networks
  • 🖼️ Convolutional Neural Networks (CNNs): The Workhorse of Image Recognition
  • 🔄 Recurrent Neural Networks (RNN/LSTM/GRU): Processing Sequential Data
  • ⚡ Attention Mechanisms and the Transformer Architecture
  • 🎯 Summary and Outlook

🔬 Neural Network Fundamentals: From the Perceptron to Multi-Layer Networks

🧮 The Perceptron: Where Neural Networks Began

The perceptron is the simplest neural network model, proposed by Frank Rosenblatt in 1957. It models the basic behavior of a biological neuron.
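In symbols, the perceptron thresholds a weighted sum of its inputs, and learning nudges the weights toward each misclassified example:

```latex
\hat{y} = \begin{cases} 1 & \text{if } \mathbf{w}\cdot\mathbf{x} + b \ge 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x}, \quad b \leftarrow b + \eta\,(y - \hat{y})
```

where η is the learning rate. Note the update is zero whenever the prediction is already correct.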

```python
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        # Initialize weights and bias
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        for _ in range(self.n_iterations):
            for idx, x_i in enumerate(X):
                # Compute the linear output
                linear_output = np.dot(x_i, self.weights) + self.bias
                # Activation (step function)
                y_predicted = self.activation_function(linear_output)
                # Update weights and bias
                update = self.learning_rate * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X):
        linear_output = np.dot(X, self.weights) + self.bias
        return self.activation_function(linear_output)

    def activation_function(self, x):
        return np.where(x >= 0, 1, 0)

# Example: learning an AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND-gate truth table

perceptron = Perceptron(learning_rate=0.1, n_iterations=10)
perceptron.fit(X, y)

print("AND-gate predictions:")
for i in range(len(X)):
    prediction = perceptron.predict(X[i].reshape(1, -1))
    print(f"Input: {X[i]}, predicted: {prediction[0]}, actual: {y[i]}")
```
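A quick way to see why hidden layers matter: the same update rule fits AND but can never fit XOR, because no single line separates XOR's classes. A minimal pure-Python sketch (no NumPy, so the update rule is easy to follow):

```python
def train_perceptron(data, lr=0.1, epochs=50):
    """Train a single-layer perceptron with the classic update rule."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            update = lr * (target - pred)
            w[0] += update * x1
            w[1] += update * x2
            b += update
    return w, b

def accuracy(data, w, b):
    hits = sum((1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0) == t
               for (x1, x2), t in data)
    return hits / len(data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(AND)
print("AND accuracy:", accuracy(AND, w, b))  # linearly separable: converges

w, b = train_perceptron(XOR)
print("XOR accuracy:", accuracy(XOR, w, b))  # not separable: never reaches 1.0
```

This failure on XOR is exactly what motivates the multi-layer networks in the next subsection.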

🏗️ Multi-Layer Perceptron (MLP)

By adding hidden layers, the multi-layer perceptron overcomes the single-layer perceptron's inability to model nonlinear problems.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

class MLP(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.2):
        super(MLP, self).__init__()
        layers = []
        prev_size = input_size
        # Build the hidden layers
        for hidden_size in hidden_sizes:
            layers.extend([
                nn.Linear(prev_size, hidden_size),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_size),
                nn.Dropout(dropout_rate)
            ])
            prev_size = hidden_size
        # Output layer
        layers.append(nn.Linear(prev_size, output_size))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Generate example data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.LongTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_test_tensor = torch.LongTensor(y_test)

# Build the model
model = MLP(input_size=20, hidden_sizes=[64, 32, 16], output_size=2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 20 == 0:
        model.eval()
        with torch.no_grad():
            test_outputs = model(X_test_tensor)
            _, predicted = torch.max(test_outputs.data, 1)
            accuracy = (predicted == y_test_tensor).sum().item() / len(y_test_tensor)
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Test Accuracy: {accuracy:.4f}')
```

🎯 Activation Functions in Depth

Activation functions introduce nonlinearity into a network and are a key ingredient of deep learning.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def swish(x):
    return x * sigmoid(x)

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Plot the activation functions
x = np.linspace(-5, 5, 1000)
plt.figure(figsize=(15, 10))
activations = {
    'Sigmoid': sigmoid,
    'Tanh': tanh,
    'ReLU': relu,
    'Leaky ReLU': leaky_relu,
    'Swish': swish,
    'GELU': gelu
}
for i, (name, func) in enumerate(activations.items(), 1):
    plt.subplot(2, 3, i)
    plt.plot(x, func(x), linewidth=2)
    plt.title(f'{name} Activation Function')
    plt.grid(True, alpha=0.3)
    plt.xlabel('Input')
    plt.ylabel('Output')
plt.tight_layout()
plt.show()

# Derivatives of the activation functions (used in backpropagation)
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1, alpha)
```

🖼️ Convolutional Neural Networks (CNNs): The Workhorse of Image Recognition

🔍 How Convolutional Layers Work

Convolutional networks extract local image features via the convolution operation, gaining translation invariance and parameter sharing.
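The spatial size of a convolution's output follows directly from the input size, kernel size, padding, and stride. A small helper (the name `conv_out_size` is illustrative) makes the arithmetic concrete:

```python
def conv_out_size(n, kernel, padding=0, stride=1):
    """Output length along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# A 3x3 kernel with padding 1 and stride 1 preserves the size ("same" padding)
print(conv_out_size(32, kernel=3, padding=1, stride=1))  # 32
# A 2x2 max pool with stride 2 halves it
print(conv_out_size(32, kernel=2, padding=0, stride=2))  # 16
# ResNet's 7x7 stem with padding 3 and stride 2 halves 224 to 112
print(conv_out_size(224, kernel=7, padding=3, stride=2))  # 112
```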

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super(ConvBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # Feature extractor
        self.features = nn.Sequential(
            ConvBlock(3, 32), ConvBlock(32, 32),
            nn.MaxPool2d(2, 2), nn.Dropout2d(0.25),
            ConvBlock(32, 64), ConvBlock(64, 64),
            nn.MaxPool2d(2, 2), nn.Dropout2d(0.25),
            ConvBlock(64, 128), ConvBlock(128, 128),
            nn.MaxPool2d(2, 2), nn.Dropout2d(0.25)
        )
        # Classifier head
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(128, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Data preprocessing
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# Load the CIFAR-10 dataset
train_dataset = CIFAR10(root='./data', train=True, download=True, transform=transform_train)
test_dataset = CIFAR10(root='./data', train=False, download=True, transform=transform_test)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False, num_workers=2)

# Training function
def train_model(model, train_loader, test_loader, num_epochs=10):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}, Batch: {batch_idx}, Loss: {loss.item():.4f}')
        scheduler.step()

        # Evaluation phase
        model.eval()
        test_loss = 0
        test_correct = 0
        test_total = 0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += criterion(output, target).item()
                _, predicted = output.max(1)
                test_total += target.size(0)
                test_correct += predicted.eq(target).sum().item()

        train_acc = 100. * correct / total
        test_acc = 100. * test_correct / test_total
        print(f'Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Test Acc: {test_acc:.2f}%')

# Build the model (uncomment the last line to train)
model = SimpleCNN(num_classes=10)
print("Starting CNN training...")
# train_model(model, train_loader, test_loader, num_epochs=5)
```

🏛️ Classic CNN Architectures

LeNet-5: the pioneering CNN

```python
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
```
ResNet: residual networks

```python
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # Residual (skip) connection
        out = self.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels
        for _ in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

# Build ResNet-18
def resnet18(num_classes=1000):
    return ResNet(ResidualBlock, [2, 2, 2, 2], num_classes)
```

🔄 Recurrent Neural Networks (RNN/LSTM/GRU): Processing Sequential Data

🔗 Vanilla RNNs

Recurrent neural networks are built for sequential data: a hidden state carried across time steps gives them memory.
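Concretely, at each time step a vanilla RNN updates its hidden state from the current input and the previous state:

```latex
h_t = \tanh(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h), \qquad y_t = W_{hy}\,h_t + b_y
```

The same weight matrices are reused at every step, which is what lets the network handle sequences of arbitrary length.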

```python
import torch
import torch.nn as nn
import numpy as np

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        # RNN forward pass
        out, _ = self.rnn(x, h0)
        # Use only the last time step's output
        out = self.fc(out[:, -1, :])
        return out

# Generate sine-wave data for time-series prediction
def generate_sine_wave(seq_length, num_samples):
    X, y = [], []
    for _ in range(num_samples):
        start = np.random.uniform(0, 100)
        x = np.linspace(start, start + seq_length, seq_length)
        sine_wave = np.sin(x)
        X.append(sine_wave[:-1])  # Input sequence
        y.append(sine_wave[-1])   # Prediction target
    return np.array(X), np.array(y)

# Generate training data
seq_length = 20
num_samples = 1000
X_train, y_train = generate_sine_wave(seq_length, num_samples)

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train).unsqueeze(-1)  # Add a feature dimension
y_train = torch.FloatTensor(y_train)

# Build and train the model
model = SimpleRNN(input_size=1, hidden_size=50, output_size=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs.squeeze(), y_train)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.6f}')
```

🧠 LSTM: Long Short-Term Memory Networks

LSTMs use a gating mechanism to mitigate the vanishing-gradient problem of vanilla RNNs.
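The gating mechanism can be written out explicitly. With σ the sigmoid and ⊙ element-wise product, the forget, input, and output gates and the cell update are:

```latex
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) \\
\tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Because the cell state `c_t` is updated additively rather than through repeated matrix multiplication, gradients can flow across many time steps.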

```python
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, dropout=0.2):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden and cell states
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        # LSTM forward pass
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # Apply dropout
        out = self.dropout(out)
        # Use the last time step's output
        out = self.fc(out[:, -1, :])
        return out

# Text-classification example
class TextClassificationLSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim,
                 n_layers=2, dropout=0.3):
        super(TextClassificationLSTM, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Word embeddings
        embedded = self.embedding(x)
        # LSTM processing
        lstm_out, (hidden, cell) = self.lstm(embedded)
        # Use the final hidden state
        output = self.dropout(hidden[-1])
        output = self.fc(output)
        return output

# Bidirectional LSTM
class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(BiLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size * 2, output_size)  # *2 because bidirectional

    def forward(self, x):
        # Initialize hidden states (*2 for the two directions)
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        # Forward and backward outputs at the last time step, concatenated
        out = self.fc(out[:, -1, :])
        return out
```

⚡ GRU: Gated Recurrent Units

The GRU is a simplified variant of the LSTM: fewer parameters, comparable performance.
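The parameter saving is easy to quantify: an LSTM layer has four gate blocks where a GRU has three. A back-of-the-envelope count, mirroring the usual layout of input weights, recurrent weights, and two bias vectors per gate block (the helper name is illustrative):

```python
def rnn_layer_params(input_size, hidden_size, num_gates):
    """Parameters of one recurrent layer: per gate block, an input weight
    matrix (h x i), a recurrent weight matrix (h x h), and two bias vectors."""
    per_gate = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return num_gates * per_gate

i, h = 128, 256
lstm = rnn_layer_params(i, h, num_gates=4)  # forget, input, cell, output
gru = rnn_layer_params(i, h, num_gates=3)   # reset, update, candidate
print(f"LSTM: {lstm:,}  GRU: {gru:,}  ratio: {gru / lstm:.2f}")  # GRU is 25% smaller
```

Regardless of the layer sizes, the ratio is always 3/4, which is exactly the "fewer parameters" claim above.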

```python
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, dropout=0.2):
        super(GRUModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        # GRU forward pass
        out, _ = self.gru(x, h0)
        # Apply dropout and the fully connected layer
        out = self.dropout(out[:, -1, :])
        out = self.fc(out)
        return out

# Sequence-to-sequence (Seq2Seq) model
class Seq2SeqGRU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(Seq2SeqGRU, self).__init__()
        # Encoder
        self.encoder = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        # Decoder
        self.decoder = nn.GRU(output_size, hidden_size, num_layers, batch_first=True)
        # Output projection
        self.output_projection = nn.Linear(hidden_size, output_size)

    def forward(self, src, tgt=None, max_length=50):
        batch_size = src.size(0)
        # Encode
        _, hidden = self.encoder(src)
        if self.training and tgt is not None:
            # Teacher forcing during training
            decoder_output, _ = self.decoder(tgt, hidden)
            output = self.output_projection(decoder_output)
        else:
            # Step-by-step generation at inference time
            outputs = []
            decoder_input = torch.zeros(batch_size, 1, self.output_projection.out_features)
            for _ in range(max_length):
                decoder_output, hidden = self.decoder(decoder_input, hidden)
                output = self.output_projection(decoder_output)
                outputs.append(output)
                decoder_input = output
            output = torch.cat(outputs, dim=1)
        return output
```

⚡ Attention Mechanisms and the Transformer Architecture

🎯 How Attention Works

Attention lets a model focus on the most relevant parts of a sequence as it processes it.
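In equation form, scaled dot-product attention is:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Dividing by √d_k keeps the dot products from growing with the key dimension, which would otherwise push the softmax into a saturated, near-one-hot regime with tiny gradients.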

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import math

class ScaledDotProductAttention(nn.Module):
    def __init__(self, d_model, dropout=0.1):
        super(ScaledDotProductAttention, self).__init__()
        self.d_model = d_model
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, mask=None):
        # Attention scores
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_model)
        # Apply the mask, if provided
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        # Attention weights
        attention_weights = F.softmax(scores, dim=-1)
        attention_weights = self.dropout(attention_weights)
        # Weighted sum of the values
        output = torch.matmul(attention_weights, value)
        return output, attention_weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads, dropout=0.1):
        super(MultiHeadAttention, self).__init__()
        assert d_model % num_heads == 0
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        self.attention = ScaledDotProductAttention(self.d_k, dropout)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        # Linear projections, reshaped into multiple heads
        Q = self.w_q(query).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        K = self.w_k(key).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        V = self.w_v(value).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        # Apply attention
        attn_output, attn_weights = self.attention(Q, K, V, mask)
        # Concatenate the heads
        attn_output = attn_output.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
        # Final linear projection
        output = self.w_o(attn_output)
        return output, attn_weights
```

🏗️ The Transformer Architecture

```python
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_length=5000):
        super(PositionalEncoding, self).__init__()
        pe = torch.zeros(max_length, d_model)
        position = torch.arange(0, max_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() *
                             (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # (1, max_length, d_model), batch-first
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (batch_size, seq_len, d_model)
        return x + self.pe[:, :x.size(1), :]

class TransformerBlock(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super(TransformerBlock, self).__init__()
        self.attention = MultiHeadAttention(d_model, num_heads, dropout)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Multi-head self-attention + residual connection
        attn_output, _ = self.attention(x, x, x, mask)
        x = self.norm1(x + self.dropout(attn_output))
        # Feed-forward network + residual connection
        ff_output = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_output))
        return x

class TransformerEncoder(nn.Module):
    def __init__(self, vocab_size, d_model, num_heads, num_layers, d_ff,
                 max_length=5000, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = PositionalEncoding(d_model, max_length)
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(d_model, num_heads, d_ff, dropout)
            for _ in range(num_layers)
        ])
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Token embeddings + positional encoding
        x = self.embedding(x) * math.sqrt(self.d_model)
        x = self.pos_encoding(x)
        x = self.dropout(x)
        # Pass through the Transformer blocks
        for transformer in self.transformer_blocks:
            x = transformer(x, mask)
        return x

# A complete Transformer model for classification
class TransformerClassifier(nn.Module):
    def __init__(self, vocab_size, d_model, num_heads, num_layers, d_ff,
                 num_classes, max_length=512, dropout=0.1):
        super(TransformerClassifier, self).__init__()
        self.encoder = TransformerEncoder(vocab_size, d_model, num_heads,
                                          num_layers, d_ff, max_length, dropout)
        self.classifier = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_model // 2, num_classes)
        )

    def forward(self, x, mask=None):
        # Encode
        encoded = self.encoder(x, mask)
        # Global average pooling over the sequence
        pooled = encoded.mean(dim=1)
        # Classify
        output = self.classifier(pooled)
        return output

# Example model
model = TransformerClassifier(
    vocab_size=10000,
    d_model=512,
    num_heads=8,
    num_layers=6,
    d_ff=2048,
    num_classes=2,
    max_length=512,
    dropout=0.1
)
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")
```

🎨 Vision Transformer (ViT)

將Transformer應用于計算機視覺任務。

```python
class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super(PatchEmbedding, self).__init__()
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2
        self.projection = nn.Conv2d(in_channels, embed_dim,
                                    kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (batch_size, channels, height, width)
        x = self.projection(x)  # (batch_size, embed_dim, num_patches_h, num_patches_w)
        x = x.flatten(2)        # (batch_size, embed_dim, num_patches)
        x = x.transpose(1, 2)   # (batch_size, num_patches, embed_dim)
        return x

class VisionTransformer(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_channels=3, num_classes=1000,
                 embed_dim=768, num_heads=12, num_layers=12, mlp_ratio=4, dropout=0.1):
        super(VisionTransformer, self).__init__()
        self.patch_embedding = PatchEmbedding(img_size, patch_size, in_channels, embed_dim)
        num_patches = self.patch_embedding.num_patches
        # Class token and positional embedding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # Transformer encoder
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(embed_dim, num_heads, int(embed_dim * mlp_ratio), dropout)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        batch_size = x.shape[0]
        # Split the image into patches and embed them
        x = self.patch_embedding(x)
        # Prepend the class token
        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
        x = torch.cat([cls_tokens, x], dim=1)
        # Add positional embeddings
        x = x + self.pos_embedding
        x = self.dropout(x)
        # Pass through the Transformer blocks
        for transformer in self.transformer_blocks:
            x = transformer(x)
        # Normalize and classify from the class token
        x = self.norm(x)
        cls_token_final = x[:, 0]
        output = self.head(cls_token_final)
        return output

# Build a ViT model
vit_model = VisionTransformer(
    img_size=224,
    patch_size=16,
    in_channels=3,
    num_classes=1000,
    embed_dim=768,
    num_heads=12,
    num_layers=12
)
print(f"Number of ViT parameters: {sum(p.numel() for p in vit_model.parameters()):,}")
```

🎯 Summary and Outlook

📊 Comparing Deep Learning Techniques

| Technique | Strengths | Weaknesses | Typical Use Cases |
| --- | --- | --- | --- |
| CNN | Translation invariance, parameter sharing, local feature extraction | Sensitive to rotation and scaling | Image recognition, computer vision |
| RNN/LSTM/GRU | Handles sequential data, has memory | Vanishing gradients, hard to parallelize | NLP, time series |
| Transformer | Parallelizable, long-range dependencies, attention | High computational cost, data-hungry | Machine translation, text generation, multimodal |

🚀 Future Directions

1. Model efficiency
  • Model compression: knowledge distillation, pruning, quantization
  • Lightweight architectures: MobileNet, EfficientNet, DistilBERT
  • Neural architecture search: AutoML, NAS
2. Multimodal fusion
  • Vision-language models: CLIP, DALL-E, GPT-4V
  • Cross-modal understanding: image captioning, visual question answering
  • Unified architectures: general-purpose multimodal Transformers
3. Self-supervised learning
  • Contrastive learning: SimCLR, MoCo, SwAV
  • Masked language modeling: BERT, RoBERTa, DeBERTa
  • Generative pre-training: the GPT family, T5
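To make the compression theme concrete, here is a minimal sketch of post-training int8 affine quantization, the scale/zero-point scheme most toolchains use; all names are illustrative, and real toolchains add per-channel scales and calibration:

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned ints via an affine scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate recovery of the original floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, s, zp = quantize(weights)
restored = dequantize(q, s, zp)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q)       # integers in [0, 255], stored in 1 byte instead of 4
print(errors)  # each within about half a quantization step (scale / 2)
```

The storage drops 4x (int8 vs float32) at the cost of a bounded rounding error per weight, which is why quantization pairs well with the other compression techniques listed above.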

💡 Practical Advice

🎯 Choosing the right architecture

```python
# A decision tree for picking a model by task
def choose_model(task_type, data_type, data_size):
    if data_type == "image":
        if task_type == "classification":
            if data_size == "small":
                return "ResNet-18 or EfficientNet-B0"
            else:
                return "ResNet-50/101 or EfficientNet-B3/B5"
        elif task_type == "detection":
            return "YOLO or the R-CNN family"
        elif task_type == "segmentation":
            return "U-Net or DeepLab"
    elif data_type == "text":
        if task_type == "classification":
            if data_size == "small":
                return "LSTM or a simple CNN"
            else:
                return "BERT or RoBERTa"
        elif task_type == "generation":
            return "GPT or T5"
        elif task_type == "translation":
            return "Transformer or mBART"
    elif data_type == "sequence":
        if task_type == "forecasting":
            return "LSTM or Transformer"
        elif task_type == "anomaly_detection":
            return "Autoencoder or LSTM-VAE"
    return "Please provide more information"

# Example usage
print(choose_model("classification", "image", "large"))
print(choose_model("generation", "text", "large"))
```
🔧 Training tips

```python
# Deep learning training best practices
class TrainingBestPractices:
    @staticmethod
    def setup_training():
        tips = {
            "Data preprocessing": [
                "Normalize/standardize the data",
                "Data augmentation (image rotation, back-translation for text, etc.)",
                "Handle class imbalance",
                "Validate data quality"
            ],
            "Model design": [
                "Start from pretrained models",
                "Add regularization (Dropout, BatchNorm)",
                "Size network depth and width sensibly",
                "Use residual connections"
            ],
            "Training strategy": [
                "Learning-rate scheduling (cosine annealing, step decay)",
                "Gradient clipping to prevent exploding gradients",
                "Early stopping to prevent overfitting",
                "Mixed-precision training for speed"
            ],
            "Optimizer choice": [
                "Adam: a solid default",
                "AdamW: recommended for Transformers",
                "SGD + momentum: the classic choice for CNNs",
                "RAdam: a more robust Adam variant"
            ]
        }
        return tips

# Print the tips
for category, tips in TrainingBestPractices.setup_training().items():
    print(f"\n**{category}:**")
    for tip in tips:
        print(f"  - {tip}")
```

🌟 Closing Thoughts

Deep learning is evolving rapidly. From basic neural networks to complex Transformer architectures, each technique pushes the boundary of AI. Mastering them takes both a grasp of the theory and plenty of hands-on practice.


The future of deep learning is wide open; let's keep exploring and innovating in this exciting field! 🚀
