Tongyi-Zhiwen Open-Sources QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning


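🎉 News

May 26, 2025: 🔥 We release 🤗QwenLong-L1-32B, the first long-context large reasoning model (LRM) trained with reinforcement learning for long-context reasoning. On seven long-context document QA benchmarks, QwenLong-L1-32B outperforms flagship LRMs such as OpenAI-o3-mini and Qwen3-235B-A22B and performs on par with Claude-3.7-Sonnet-Thinking, placing it among the leading state-of-the-art LRMs.

May 26, 2025: 🔥 We also release 🤗DocQA-RL-1.6K, a specialized RL training dataset of 1,600 document QA problems spanning mathematical, logical, and multi-hop reasoning.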

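📚 Introduction

In this work, we propose QwenLong-L1, a novel reinforcement learning (RL) framework designed to help LRMs move from short-context proficiency to robust long-context generalization. In our preliminary experiments, we show that the RL training dynamics of short-context and long-context reasoning differ.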


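Our framework enhances short-context LRMs through progressive context scaling during RL training. It comprises three core components: a warm-up supervised fine-tuning (SFT) phase that initializes a robust policy; a curriculum-guided RL phase that enables stable adaptation from short to long contexts; and a difficulty-aware retrospective sampling mechanism that adjusts training complexity across stages to incentivize policy exploration. We build on recent RL algorithms, including GRPO and DAPO, and use hybrid reward functions that combine rule-based and model-based binary outcome rewards to balance precision and recall. By making strategic use of group-relative advantages during policy optimization, the framework guides LRMs to learn the effective reasoning patterns needed for robust long-context grounding and strong reasoning capabilities.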
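To make the reward and advantage terms above concrete, here is a minimal Python sketch. It is not the project's implementation. It assumes the hybrid reward accepts an answer when either the rule-based check or the model-based judge accepts it, and it uses the GRPO-style group-relative advantage, in which each sampled response's reward is normalized by the mean and standard deviation of its group. The function names, the `eps` constant, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def hybrid_reward(rule_correct: bool, judge_correct: bool) -> float:
    """Binary outcome reward.

    Assumption: the rule-based and model-based verdicts are combined by
    taking the maximum, i.e. the reward is 1.0 if either verifier accepts.
    """
    return float(rule_correct or judge_correct)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of responses to the same prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps (an assumed constant) avoids division by zero
    when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses: (rule-based verdict, judge verdict)
verdicts = [(True, False), (False, False), (False, False), (False, True)]
rewards = [hybrid_reward(rule, judge) for rule, judge in verdicts]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average get a positive advantage and the rest a negative one. When every response in a group receives the same reward, all advantages are zero and the group contributes no learning signal.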


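🎯 Model Release

We release 🤗 QwenLong-L1-32B, the first long-context LRM trained with reinforcement learning for long-context reasoning. On seven long-context document QA benchmarks, QwenLong-L1-32B outperforms flagship LRMs such as OpenAI-o3-mini and Qwen3-235B-A22B and performs on par with Claude-3.7-Sonnet-Thinking, placing it among the leading state-of-the-art LRMs.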

The evaluation results are shown below.

[Figure: evaluation results]

🛠️ Requirements

# Create the conda environment
conda create -n qwenlongl1 python==3.10
conda activate qwenlongl1

# Install requirements
pip3 install -r requirements.txt

# Install verl
cd verl
pip3 install -e .

# Install vLLM
pip3 install vllm==0.7.3

# Install flash-attn
pip3 install flash-attn --no-build-isolation

🚀 Quick Start

Here is how to run the model with 🤗 Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""
context = "<YOUR_CONTEXT_HERE>"
question = "<YOUR_QUESTION_HERE>"
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=10000,
    temperature=0.7,
    top_p=0.95
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
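Because the prompt template asks the model to end with "Therefore, the answer is (insert answer here)", the final answer can be pulled out of the decoded text with a simple string search. The helper below is a hypothetical sketch for downstream use, not part of the released code; the name `extract_answer` and the cleanup rules (trimming whitespace, trailing periods, and surrounding quotes) are assumptions.

```python
ANSWER_MARKER = "Therefore, the answer is"


def extract_answer(content: str):
    """Return the text after the last answer marker, or None if it is absent.

    Hypothetical helper: the cleanup below is an assumption about what a
    downstream checker would want, not the project's own parsing rule.
    """
    idx = content.rfind(ANSWER_MARKER)
    if idx == -1:
        return None
    answer = content[idx + len(ANSWER_MARKER):].strip()
    # Drop trailing periods and surrounding double quotes, if any.
    answer = answer.rstrip(".").strip().strip('"')
    return answer or None


example = "The report lists revenue of 40 and costs of 12. Therefore, the answer is 28."
final_answer = extract_answer(example)
```

Searching from the right means that if the model repeats the phrase while reasoning, only the last occurrence is used.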

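🗂️ Dataset

To build a challenging RL dataset for verifiable long-context reasoning, we developed 🤗DocQA-RL-1.6K, which contains 1,600 document QA problems across three reasoning domains:

(1) Mathematical reasoning: we use 600 problems from the DocMath dataset, which require numerical reasoning over long, specialized documents such as financial reports. For DocMath, we sample 75% of the items from each subset of its validation split for training and 25% for evaluation;

(2) Logical reasoning: we use DeepSeek-R1 to synthesize 600 multiple-choice questions that require logical analysis of real-world documents we curated from the legal, financial, insurance, and production domains;

(3) Multi-hop reasoning: we select 200 examples from MultiHopRAG and 200 examples from Musique, focusing on cross-document reasoning.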
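The 75%/25% split described in item (1) can be sketched in Python as follows. This only illustrates a per-subset split: the shuffling, the fixed seed, and the function name are assumptions, and the source does not say how items were actually sampled.

```python
import random


def split_subset(items, train_fraction=0.75, seed=0):
    """Split one subset into training and evaluation parts.

    Assumptions: items are shuffled with a fixed seed before cutting,
    and the training size is rounded down.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical subset of 200 example ids
train_ids, eval_ids = split_subset(range(200))

# Composition of DocQA-RL-1.6K as stated above
composition = {"DocMath": 600, "Logical (synthetic)": 600, "MultiHopRAG": 200, "Musique": 200}
total_problems = sum(composition.values())
```

Applying the split to each subset separately, rather than to the pooled validation set, keeps the proportion of each subset the same in the training and evaluation parts.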

Please download the following datasets and put them in ./datasets/ for training and evaluation.

RL training data: 🤗DocQA-RL-1.6K

Evaluation data: 🤗docmath, 🤗frames, 🤗longbench

💻 Training

We provide basic demo code for single-stage RL training with DAPO.

First, start a local verifier.

export CUDA_VISIBLE_DEVICES=0

vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \
    --host 0.0.0.0 \
    --port 23547
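Once the server is running, `vllm serve` exposes an OpenAI-compatible HTTP API, so the verifier can be queried at `/v1/chat/completions`. The sketch below shows one way a judge request could be built and its reply parsed using only the Python standard library. The host `localhost`, the judge prompt wording, the YES/NO reply convention, and both function names are hypothetical; the project's actual verifier prompt is not given in this article.

```python
import json
from urllib import request

# Assumption: the verifier started above is reachable on localhost.
VERIFIER_URL = "http://localhost:23547/v1/chat/completions"
VERIFIER_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"


def build_judge_request(question, gold_answer, predicted_answer):
    """Build a POST request asking the verifier whether two answers match.

    The prompt text is a hypothetical placeholder, not the project's prompt.
    """
    judge_prompt = (
        "Question: " + question + "\n"
        "Reference answer: " + gold_answer + "\n"
        "Predicted answer: " + predicted_answer + "\n"
        "Do the two answers agree? Reply YES or NO."
    )
    payload = {
        "model": VERIFIER_MODEL,
        "messages": [{"role": "user", "content": judge_prompt}],
        "temperature": 0.0,
        "max_tokens": 8,
    }
    return request.Request(
        VERIFIER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_judge_reply(response_body):
    """Map a chat-completions response body to a binary verdict."""
    reply = json.loads(response_body)["choices"][0]["message"]["content"]
    return reply.strip().upper().startswith("YES")
```

To send the request, pass it to `urllib.request.urlopen` and give the response body to `parse_judge_reply`. The boolean result can then serve as the model-based half of a binary outcome reward.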

Then, start RL training on 4 nodes.

export PROJ_DIR="<YOUR_PROJ_DIR_HERE>"
export MASTER_IP="<YOUR_MASTER_IP_HERE>" # ray master ip
export NNODES=4 # total GPU nodes
export NODE_RANK=${RANK} # rank of current node
export PORT=6382
export WANDB_API_KEY="<YOUR_WANDB_API_KEY_HERE>"
export WANDB_PROJECT="QwenLong-L1"
export LLM_JUDGE=Y # 'Y': LLM JUDGE, 'N': RULE BASED
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
# verifier
export VERIFIER_PATH="Qwen/Qwen2.5-1.5B-Instruct"
export VERIFIER_HOST="<YOUR_VERIFIER_HOST_HERE>"
export VERIFIER_PORT="23547"

# Worker nodes: keep trying to join the Ray cluster until the head node is reachable
ray_start_retry() {
    while true; do
        ray start --address="${MASTER_IP}:${PORT}"
        if [ $? -eq 0 ]; then
            break
        fi
        echo "Failed to connect to master, retrying in 5 seconds..."
        sleep 5
    done
}

# Block until `ray status` succeeds
check_ray_status() {
    until ray status >/dev/null 2>&1; do
        echo "Waiting for Ray cluster to be ready..."
        sleep 5
    done
}

if [ "$RANK" == "0" ]; then
    echo "Starting HEAD node..."
    ray start --head --port=${PORT}

    check_ray_status
    echo "Ray head node started successfully"
else
    echo "Starting WORKER node..."
    ray_start_retry

    check_ray_status
    echo "Successfully joined Ray cluster"
fi

# Only the head node launches the training script; workers stay alive for Ray
if [ "$RANK" == "0" ]; then
    bash ${PROJ_DIR}/scripts/rl_4nodes_dapo.sh 2>&1 | tee ${PROJ_DIR}/logs/rl_log_$(date +%Y%m%d_%H%M%S).txt &
else
    sleep 30d
fi

wait

Hands-On Demo

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

# load the model in 4-bit to reduce GPU memory usage
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""

# fill in the context and the question
context = """
Renewable energy sources are crucial for addressing climate change and reducing dependence on fossil fuels. Solar power is one of the most abundant and widely available renewable energy sources. It converts sunlight directly into electricity using photovoltaic (PV) cells or indirectly through concentrated solar power (CSP) systems.

Wind energy is another rapidly growing renewable source. Wind turbines capture the kinetic energy of moving air and convert it into electrical energy. Onshore wind farms are more common, but offshore wind farms are becoming increasingly popular due to stronger and more consistent wind resources.

Hydroelectric power is generated by harnessing the energy of flowing water in rivers or dams. It is one of the oldest and most established renewable energy technologies, providing a reliable and flexible source of electricity.

Biomass energy uses organic materials such as wood, agricultural waste, and dedicated energy crops to produce heat, electricity, or biofuels. It is considered renewable because the carbon dioxide released during combustion is offset by the carbon dioxide absorbed during the growth of the biomass feedstock.

Geothermal energy taps into the Earth's internal heat to generate electricity or provide direct heating and cooling. It is a reliable and consistent energy source, particularly in regions with high geothermal activity.

The transition to renewable energy is driven by several factors, including environmental concerns, energy security, and technological advancements. However, challenges remain, such as the intermittency of solar and wind power, high initial costs, and the need for energy storage solutions.

Despite these challenges, the global renewable energy market is expanding rapidly. Many countries have set ambitious renewable energy targets, and investments in renewable energy technologies continue to grow.
The International Renewable Energy Agency (IRENA) projects that renewable energy could account for up to 86% of global electricity generation by 2050 if current trends continue.
"""

question = """
What are the main types of renewable energy sources mentioned in the text, and what are their respective advantages?
"""

# build the prompt
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=10000,
    temperature=0.7,
    top_p=0.95
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

Output:

thinking content: Okay, let me try to figure this out. The question is asking for the main types of renewable energy sources mentioned in the text and their respective advantages. First, I need to go through the text again to identify each renewable energy source and note down their advantages.

Starting with the first paragraph, it mentions solar power. The text says solar power uses photovoltaic cells or CSP systems. The advantage here is that it's abundant and widely available. Then, wind energy is next. The text talks about onshore and offshore wind farms. The advantage for wind is that offshore farms have stronger and more consistent winds, which I think makes them more reliable.

Hydroelectric power is mentioned next. It's described as one of the oldest and most established, providing reliable and flexible electricity. So the advantage here is reliability and flexibility. Biomass energy uses organic materials and is considered renewable because the CO2 released is offset by the growth of the feedstock. The advantage here is that it's renewable in terms of carbon balance.

Geothermal energy is next, using the Earth's internal heat. The advantage is that it's reliable and consistent, especially in areas with high geothermal activity. Wait, the question is about the main types and their advantages. Let me list them out:

1. Solar power: Abundant and widely available.
2. Wind energy: Stronger and more consistent winds offshore.
3. Hydroelectric power: Reliable and flexible.
4. Biomass energy: Carbon neutrality (offset by growth).
5. Geothermal energy: Reliable and consistent.

I think that's all the main types mentioned. The text also mentions challenges like intermittency for solar and wind, but the question is about advantages, so I should focus on the positive aspects each has. I need to make sure I didn't miss any. Let me check the text again.

Yes, the text lists solar, wind, hydroelectric, biomass, and geothermal. Each has their specific advantages as I noted. So the answer should list each type with their respective advantages.
</think>
content: Therefore, the answer is Solar power (abundant and widely available), wind energy (stronger and consistent offshore winds), hydroelectric power (reliable and flexible), biomass energy (carbon neutrality through growth of feedstock), and geothermal energy (reliable and consistent internal heat).
