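Based on the official course schedule for the Fall 2023 semester, here is the lecture outline and content summary for the full CS 285 course, mapped to weeks and topics so you can quickly locate the key points of each lecture, the related homework, and the video resources 🎯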
Official course website
YouTube videos
Bilibili videos (with Chinese subtitles)
📅 CS 285 Fall 2023 Full-Course Lecture Outline
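Week | Lecture & Topic | Summary |
---|---|---|
Week 1 | Lecture 1: Introduction & Course Overview | Course introduction, basic RL background, analysis of industry and research trends |
Week 2 | Lecture 2: Supervised Learning of Behaviors (Imitation Learning) | Behavior cloning, DAgger, offline and online imitation learning; Homework 1 (HW1) |
Week 2 | Lecture 3: PyTorch Tutorial | PyTorch basics, streamlined training pipeline |
Week 3 | Lecture 4: Introduction to Reinforcement Learning | MDPs, policies, value function basics, Monte Carlo sampling |
Week 4 | Lecture 5: Policy Gradients | REINFORCE algorithm, likelihood-ratio gradient and its derivation, variance reduction |
Week 4 | Lecture 6: Actor–Critic Algorithms | Critic-based actor-critic, GAE, worked code examples |
Week 5 | Lecture 7: Value Function Methods | TD(λ), bootstrapping, policy evaluation methods |
Week 5 | Lecture 8: Deep RL with Q-Functions | DQN, experience replay, target networks, training stabilization |
Week 6 | Lecture 9: Advanced Policy Gradients | Core TRPO/PPO algorithms, KL constraints, advantage estimation and implementation details |
Week 6 | Lecture 10: Optimal Control & Planning | Control-theoretic navigation/planning methods (MPC), control of linear systems |
Week 7 | Lecture 11: Model-Based Reinforcement Learning | Model learning and simulation, predictive model architectures, sample efficiency |
Week 7 | Lecture 12: Model-Based Policy Learning | Policy learning with a model (including DDP, iLQR, etc.) |
Week 8 | Lecture 13: Exploration I | Basic forms of exploration strategies: ε-greedy, UCB, entropy bonus |
Week 8 | Lecture 14: Exploration II | Count-based, curiosity-driven, random network distillation |
Week 9 | Lecture 15: Offline Reinforcement Learning I | Introduction to offline RL, challenges of batch training, BMIST, etc. |
Week 9 | Lecture 16: Offline Reinforcement Learning II | OOD generalization, constrained optimization, safety guarantees |
Week 10 | Lecture 17: Reinforcement Learning Theory Basics | Convergence analysis, sample complexity, geometry of policy optimization |
Week 10 | Lecture 18: Variational Inference & Generative Models | VI basics, link to control-as-inference |
Week 11 | Lecture 19: Connection between Inference and Control | Inverse reinforcement learning, maximum-entropy control, relationship to POMDPs |
Week 11 | Lecture 20: Inverse Reinforcement Learning | Core IRL algorithms: MaxEnt IRL, GAIL, etc. |
Week 12 | Guest Lectures | Topical talks by academic and industry experts (e.g., RLHF, DPO, statistical RL) |
Week 13 | Lecture 21: RL with Sequence Models & Language Models | Sequence RL, seq2seq RL, a first look at LLM fine-tuning |
Week 13 | Lecture 22: Meta-Learning and Transfer Learning | Meta-RL, cross-task generalization, prompt tuning, DPO & RLHF |
Week 14 | Lecture 23: Challenges & Open Problems | Frontier challenges in RL: long-term dependencies, safety, fairness, utility functions, etc. |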
Homework mapping:
Homework GitHub repository
- HW1 → Lectures 2 / 3
- HW2 → Lectures 5 / 6
- HW3 → Lectures 7–12
- HW4 → Lectures 11–18
- HW5 → Lectures 13–20
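
Because the homework ranges overlap, it can help to look up which assignments touch a given lecture. The following is a minimal Python sketch of that lookup, using only the ranges listed above. The names `HOMEWORK_LECTURES` and `homeworks_for_lecture` are hypothetical helpers made up for illustration, and reading "2 / 3" and "5 / 6" as inclusive ranges is an assumption.

```python
# Inclusive lecture ranges per homework, copied from the mapping above.
# Assumption: "2 / 3" and "5 / 6" are read as the ranges 2-3 and 5-6.
HOMEWORK_LECTURES = {
    "HW1": (2, 3),
    "HW2": (5, 6),
    "HW3": (7, 12),
    "HW4": (11, 18),
    "HW5": (13, 20),
}


def homeworks_for_lecture(lecture: int) -> list[str]:
    """Return every homework whose lecture range contains `lecture`."""
    return [
        hw
        for hw, (first, last) in HOMEWORK_LECTURES.items()
        if first <= lecture <= last
    ]


if __name__ == "__main__":
    for n in (2, 11, 13, 21):
        print(f"Lecture {n}: {homeworks_for_lecture(n)}")
```

For example, Lecture 11 falls inside both the HW3 and HW4 ranges, while Lecture 21 is not covered by any homework in the list.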