CS285 Study Notes (1): Course Overview

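Based on the official course schedule for the Fall 2023 semester, this is the lecture outline and content summary for the full CS 285 course. Each entry is matched to its week and topic, so you can quickly locate the key points of each lecture, the related homework, and the video resources 🎯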

Official course website

YouTube videos

Bilibili videos (with Chinese subtitles)

📅 CS 285 Fall 2023 Full Lecture Outline

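Week	Lecture & Topic	Summary
Week 1	Lecture 1: Introduction & Course Overview	Course introduction, basic RL background, analysis of industry and research trends
Week 2	Lecture 2: Supervised Learning of Behaviors (Imitation Learning)	Behavior cloning, DAgger, offline and online imitation learning; Assignment 1 (HW1)
	Lecture 3: PyTorch Tutorial	Basic PyTorch usage, a streamlined training pipeline
Week 3	Lecture 4: Introduction to Reinforcement Learning	MDPs, policies, value function basics, Monte Carlo sampling
Week 4	Lecture 5: Policy Gradients	The REINFORCE algorithm, the likelihood ratio and its core derivation, variance reduction
	Lecture 6: Actor-Critic Algorithms	Critic-based actor-critic, GAE, walkthrough of example code
Week 5	Lecture 7: Value Function Methods	TD(λ), bootstrapping, policy evaluation methods
	Lecture 8: Deep RL with Q-Functions	DQN, experience replay, target networks, training stabilization
Week 6	Lecture 9: Advanced Policy Gradients	Core TRPO/PPO algorithms, KL constraints, advantage estimation and implementation details
	Lecture 10: Optimal Control & Planning	Control-theoretic navigation/planning methods (MPC), control of linear systems
Week 7	Lecture 11: Model-Based Reinforcement Learning	Model learning and simulation, predictive model architectures and sample efficiency
	Lecture 12: Model-Based Policy Learning	Policy learning with a model (including DDP, iLQR, etc.)
Week 8	Lecture 13: Exploration I	Basic forms of exploration strategies: ε-greedy, UCB, entropy bonus
	Lecture 14: Exploration II	Count-based exploration, curiosity-driven exploration, random network distillation
Week 9	Lecture 15: Offline Reinforcement Learning I	Introduction to offline RL, challenges of batch training, BMIST, etc.
	Lecture 16: Offline Reinforcement Learning II	OOD generalization, constrained optimization, safety guarantees
Week 10	Lecture 17: Reinforcement Learning Theory Basics	Convergence analysis, sample complexity, geometry of policy optimization
	Lecture 18: Variational Inference & Generative Models	VI basics, the link to control as inference
Week 11	Lecture 19: Connection between Inference and Control	Inverse reinforcement learning, maximum-entropy control, relation to POMDPs
	Lecture 20: Inverse Reinforcement Learning	Core IRL algorithms: MaxEnt IRL, GAIL, etc.
Week 12	Guest Lectures	Special-topic talks from academic and industry experts (e.g., RLHF, DPO, Statistical RL)
Week 13	Lecture 21: RL with Sequence Models & Language Models	Sequence RL, seq2seq RL, a first look at LLM fine-tuning
	Lecture 22: Meta-Learning and Transfer Learning	Meta-RL, cross-task generalization, prompt tuning, DPO & RLHF (guest)
Week 14	Lecture 23: Challenges & Open Problems	Frontier challenges in RL: long-term dependencies, safety, fairness, utility functions, etc.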

Homework mapping:

Homework GitHub repository

HW1 → Lecture 2 / 3
HW2 → Lecture 5 / 6
HW3 → Lecture 7–12
HW4 → Lecture 11–18
HW5 → Lecture 13–20
