LLMs and ICL: Translation and commentary on "Bayesian scaling laws for in-context learning"

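Overview: The central question of this paper is how to understand and model the in-context learning (ICL) ability of large language models (LLMs). Taking a Bayesian-learning view, it proposes a new family of Bayesian scaling laws that explain and predict ICL performance.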

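>> Background and pain points: ICL is a powerful capability of LLMs that lets them perform complex tasks without additional training, but prior work lacks a clear explanatory and predictive model of how ICL performance depends on the number of in-context examples (the ICL curve).

● The shape of the ICL curve cannot be predicted accurately. This makes it hard to judge whether many-shot ICL is worthwhile, to anticipate alignment failures (for example, many-shot jailbreaking), and to decide how much fine-tuning is needed to suppress undesirable LLM behaviour.

● Several competing hypotheses exist for the learning mechanism behind ICL (Bayesian learning, gradient descent, and others), with no unified theoretical framework.

● Post-training methods such as fine-tuning have limited effect on LLM safety: ICL can easily make suppressed behaviours re-emerge, which calls for a deeper understanding.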

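>> Proposed solution: The paper proposes a family of Bayesian scaling laws to model the ICL curve. The laws rest on one assumption: ICL approximates a Bayesian learner. Using Bayes' theorem, they relate the model's predictive performance to the number of in-context examples, and they expose interpretable parameters for the task prior, the learning efficiency, and per-example probabilities.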

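>> Core steps:

● Setting up the Bayesian model: ICL is modelled as a Bayesian model consisting of a symbol set, a task set, a prior distribution over tasks, and a likelihood function.

● Applying Bayes' theorem: the task posterior is updated with Bayes' theorem; as the number of in-context examples grows, the posterior converges to the most likely task.

● Deriving the ICL curve: a closed-form Bayesian scaling law is derived that relates the number of in-context examples to the expected probability of the next example.

● Simplifying the model and adding an efficiency coefficient: to reduce the number of parameters and to account for example length and informativeness, the original law is simplified and an ICL efficiency coefficient K is introduced.

● Parameter tying: to reduce the number of unobservable parameters, two tying strategies are proposed, one sampling-based and one scoring-based, which lower model complexity.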
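
The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the paper's fitted law: it assumes a small finite task set, assumes every in-context example has the same probability under a given task, and assumes the efficiency coefficient K simply scales the number of examples in the posterior update. The names `posterior`, `icl_curve`, `prior`, `likelihoods`, and `efficiency` are hypothetical.

```python
import math


def posterior(prior, likelihoods, n, efficiency=1.0):
    """Task posterior after n in-context examples (idealised Bayesian learner).

    prior[m]       : prior probability of task m (assumed to sum to 1)
    likelihoods[m] : probability task m assigns to one example (assumed constant)
    efficiency     : hypothetical stand-in for the ICL efficiency coefficient K
    """
    # Bayes' theorem in log space: log prior + (K * n) * log likelihood.
    log_weights = [
        math.log(p) + efficiency * n * math.log(l)
        for p, l in zip(prior, likelihoods)
    ]
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def icl_curve(prior, likelihoods, max_shots, efficiency=1.0):
    """Expected probability of the next example after 0..max_shots examples."""
    curve = []
    for n in range(max_shots + 1):
        post = posterior(prior, likelihoods, n, efficiency)
        # Posterior-weighted average of the per-task example probability.
        curve.append(sum(q * l for q, l in zip(post, likelihoods)))
    return curve


if __name__ == "__main__":
    # Task 1 is the demonstrated task: low prior, high per-example probability.
    for n, p in enumerate(icl_curve([0.9, 0.1], [0.2, 0.8], 10)):
        print(n, round(p, 4))
```

With these made-up numbers the curve starts near 0.26 (dominated by the high-prior task) and climbs toward 0.8 as the posterior concentrates on the demonstrated task. Lowering `efficiency` stretches the same curve over more shots.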

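>> Advantages:

● Higher accuracy: in the experiments, the Bayesian scaling laws achieve lower error than the power laws of Anil et al. (2024) on both interpolation and extrapolation of the ICL curve.

● Interpretability: the parameters of the law are interpretable, so the task prior, learning efficiency, and per-example probabilities can be analysed directly, giving insight into the internal workings of LLMs.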

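>> Conclusions and takeaways:

● The Bayesian scaling laws describe and predict the ICL behaviour of LLMs well, both for small models trained on simple synthetic data and for real-world LLMs and datasets.

● Post-training methods (such as supervised fine-tuning and preference-based reinforcement learning) mainly change the task prior and have much less effect on the model's knowledge of each task; the amount of post-training needed to suppress ICL of disfavoured tasks grows with model scale.

● ICL ability strengthens as model scale increases, and learning efficiency is higher as well.

● Instruction tuning lowers the prior probability of harmful behaviour but does not prevent many-shot jailbreaking, which suggests that instruction tuning alone may not be enough to make LLMs safe.

● Although the results support the view that LLMs perform Bayesian inference, they do not amount to a rigorous proof. Real-world LLMs may follow Bayesian behaviour only approximately.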
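
A short worked example may help show why lowering only the prior does not stop many-shot jailbreaking. It is a minimal sketch under strong assumptions that are not taken from the paper: just two tasks ("safe" and "harmful"), and every demonstration having a fixed probability under each task. The function `shots_to_flip` and all numbers are hypothetical. Under these assumptions the posterior odds of the harmful task after n demonstrations are (prior odds) × (p_harmful / p_safe)^n, so the number of shots needed grows only logarithmically as the prior shrinks.

```python
import math


def shots_to_flip(prior_harmful, p_harmful, p_safe):
    """Smallest n at which the suppressed task's posterior exceeds 0.5.

    prior_harmful : prior probability of the suppressed (harmful) task
    p_harmful     : probability of one demonstration under the harmful task
    p_safe        : probability of one demonstration under the safe task
    """
    if not 0.0 < prior_harmful < 1.0:
        raise ValueError("prior_harmful must be strictly between 0 and 1")
    if p_harmful <= p_safe:
        raise ValueError("demonstrations must favour the harmful task")
    prior_odds = prior_harmful / (1.0 - prior_harmful)
    if prior_odds > 1.0:
        return 0  # already the more probable task before any demonstration
    # Solve prior_odds * (p_harmful / p_safe) ** n > 1 for the smallest integer n.
    n = math.log(1.0 / prior_odds) / math.log(p_harmful / p_safe)
    return math.floor(n) + 1


if __name__ == "__main__":
    for prior in (1e-1, 1e-3, 1e-6):
        print(prior, shots_to_flip(prior, p_harmful=0.8, p_safe=0.2))
```

In this toy setting, cutting the prior from 0.1 to 0.000001 raises the number of shots required only from 2 to 10, because the per-example likelihood ratio is left unchanged.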

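In summary, the paper offers a new perspective for understanding and modelling the in-context learning ability of LLMs and proposes Bayesian scaling laws that are both more accurate and more interpretable. The laws are a useful tool for studying and improving LLM safety and alignment.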

Translation and commentary on "Bayesian scaling laws for in-context learning"

Link	Paper: https://arxiv.org/abs/2410.16531
Date	21 October 2024 (latest revision: 2 November 2024)
Authors	Stanford University

Abstract

In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates. Prior work has established strong correlations between the number of in-context examples provided and the accuracy of the model's predictions. In this paper, we seek to explain this correlation by showing that ICL approximates a Bayesian learner. This perspective gives rise to a family of novel Bayesian scaling laws for ICL. In experiments with GPT-2 models of different sizes, our scaling laws exceed or match existing scaling laws in accuracy while also offering interpretable terms for task priors, learning efficiency, and per-example probabilities. To illustrate the analytic power that such interpretable scaling laws provide, we report on controlled synthetic dataset experiments designed to inform real-world studies of safety alignment. In our experimental protocol, we use SFT to suppress an unwanted existing model capability and then use ICL to try to bring that capability back (many-shot jailbreaking). We then experiment on real-world instruction-tuned LLMs using capabilities benchmarks as well as a new many-shot jailbreaking dataset. In all cases, Bayesian scaling laws accurately predict the conditions under which ICL will cause the suppressed behavior to reemerge, which sheds light on the ineffectiveness of post-training at increasing LLM safety.

1. Introduction

Large language models (LLMs) can infer how to perform a task given only demonstrations and without additional training updates. This capability is known as in-context learning (ICL; Brown et al., 2020; Dong et al., 2022). Under ICL, task performance generally increases with the number of demonstrations, though the precise relationship between these two quantities is unclear. We call this relationship the ICL curve and seek to model it. Being able to predict the shape of the ICL curve would help us decide whether to do many-shot ICL (Agarwal et al., 2024) after testing only few-shot performance, predict potential alignment failures under many-shot jailbreaking (Anil et al., 2024), and decide how much fine-tuning we need in order to suppress ICL of undesirable behaviours.

The learning algorithm underlying ICL has been characterised as Bayesian by Xie et al. (2022) and many later works (section 2). Drawing on this line of research, we use Bayes’ theorem to derive a family of Bayesian scaling laws for ICL (section 3) which model the ICL curve of an ideal Bayesian learner.

To evaluate the performance of our Bayesian laws, we model the ICL curve for gpt2 models trained on simple synthetic data following Xie et al. (2022) as well as real-world LLMs tested on standard benchmarks (section 4.1). Compared to the power laws proposed by Anil et al. (2024), our Bayesian laws achieve lower error rates on both interpolation and extrapolation of the ICL curve, while also providing interpretable parameters for the prior over tasks, the efficiency of ICL, and per-example probabilities under different tasks. In our second set of experiments (section 4.2), we present a case study using our Bayesian laws to model how post-training affects ICL of favoured and disfavoured behaviours. On toy models, we find that smaller amounts of post-training strongly change the prior over tasks but not the model’s knowledge of each task, and the amount of post-training needed to suppress ICL of disfavoured tasks increases with scale.

Finally, we present experiments on real-world LLMs ranging from 1B to 405B parameters (section 5). Our laws accurately predict the ICL behaviour of several models on both capabilities and safety benchmarks and a new many-shot jailbreaking dataset we introduce. We then compare Llama 3.1 8B Base and Instruct using one of our Bayesian scaling laws (section 5.2) and find that alignment merely reduces the prior probability of harmful behaviour but not its learnability under ICL. Our work thus introduces a tool for interpreting the task knowledge of LLMs using purely behavioural observations, which we hope is valuable for improving LLM alignment.

7. Conclusion

In this paper, we combined two questions to make progress at understanding ICL: (1) what scaling law best describes ICL, and (2) is ICL Bayesian? We showed that Bayesian assumptions naturally lead to a scaling law for ICL, and that Bayesian scaling laws are a great fit for both ICL behaviour by small LMs trained on controlled synthetic data, as well as LLMs trained on natural language. Using a Bayesian formulation gave us interpretable parameters for the prior, learning efficiency, and task-conditional probabilities, which can help us understand how model behaviour changes under alignment. We use these to show how ICL ability varies at different model scales, understand how finetuning harms knowledge of disfavoured distributions, and compare base and instruction-tuned LLMs. We are confident that further progress on understanding ICL is possible through the empirical science of scaling laws.
