🎉 動態
2025年5月26日: 🔥 我們正式發布🤗QwenLong-L1-32B——首個采用強化學習訓練、專攻長文本推理的LRM模型。在七項長文本文檔問答基準測試中,QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗艦LRM,達到與Claude-3.7-Sonnet-Thinking持平的水準,展現了當前最先進長文本推理模型的領先實力。
2025年5月26日: 🔥 我們同步開源🤗DocQA-RL-1.6K專項強化學習數據集,包含1,600道涵蓋數學演算、邏輯推理和多跳推理等領域的文檔問答題目。
📚 簡介
在本研究中,我們提出了QwenLong-L1,這是一種新穎的強化學習(RL)框架,旨在促進LRM從短上下文熟練度向穩健的長上下文泛化過渡。在我們的初步實驗中,我們展示了短上下文和長上下文推理RL訓練動態之間的差異。
我們的框架通過強化學習訓練中的漸進式上下文擴展,增強了短上下文語言推理模型(LRM)的性能。該框架包含三個核心組件:用于初始化穩健策略的預熱監督微調(SFT)階段;通過課程引導的強化學習階段實現從短上下文到長上下文的穩定適應;以及難度感知的回溯采樣機制,通過動態調整各階段訓練復雜度來激勵策略探索。我們整合了包括GRPO和DAPO在內的最新強化學習算法,結合基于規則和基于模型的二元結果獎勵混合函數,以平衡精確率與召回率。在策略優化過程中,通過戰略性利用群體相對優勢,引導LRM學習對實現穩健長上下文錨定和卓越推理能力至關重要的有效推理模式。
🎯 模型發布
我們發布了🤗 QwenLong-L1-32B,這是首個通過強化學習訓練、專為長文本推理設計的長上下文語言推理模型。在七項長文本文檔問答基準測試中,QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗艦語言推理模型,達到與Claude-3.7-Sonnet-Thinking相當的水準,展現出當前最先進語言推理模型中的領先性能。
以下是評估結果。
🛠? 要求
# Create the conda environment
conda create -n qwenlongl1 python==3.10
conda activate qwenlongl1# Install requirements
pip3 install -r requirements.txt# Install verl
cd verl
pip3 install -e .# Install vLLM
pip3 install vllm==0.7.3 # Install flash-attn
pip3 install flash-attn --no-build-isolation
🚀 快速入門
以下是如何使用 🤗 Transformers 運行該模型:
from transformers import AutoModelForCausalLM, AutoTokenizermodel_name = "Tongyi-Zhiwen/QwenLong-L1-32B"# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,torch_dtype="auto",device_map="auto"
)# prepare the model input
template = """Please read the following text and answer the question below.<text>
$DOC$
</text>$Q$Format your response as follows: "Therefore, the answer is (insert answer here)"."""
context = "<YOUR_CONTEXT_HERE>"
question = "<YOUR_QUESTION_HERE>"
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)# conduct text completion
generated_ids = model.generate(**model_inputs,max_new_tokens=10000,temperature=0.7,top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # parsing thinking content
try:# rindex finding 151649 (</think>)index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:index = 0thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")print("thinking content:", thinking_content)
print("content:", content)
🗂? 數據集
為了構建一個具有挑戰性的可驗證長文本推理強化學習數據集,我們開發了🤗DocQA-RL-1.6K,該數據集包含跨三個推理領域的1600個文檔問答問題:
(1) 數學推理:我們使用DocMath數據集中的600道問題,這些問題要求對財務報告等長專業文檔進行數值推理。對于DocMath數據集,我們從其驗證集中每個子集抽取75%條目用于訓練,25%用于評估;
(2) 邏輯推理:我們采用DeepSeek-R1合成了600道多選題,這些問題需要對我們精選的法律、金融、保險和生產領域真實文檔進行邏輯分析;
(3) 多跳推理:我們從MultiHopRAG選取200個樣本,從Musique選取200個樣本,重點關注跨文檔推理。
請下載以下數據集并放入./datasets/
目錄用于訓練和評估。
強化學習訓練數據:🤗DocQA-RL-1.6K
評估數據:🤗docmath、🤗frames、🤗longbench
💻 訓練
我們為單階段強化學習訓練提供了基于DAPO的基礎演示代碼。
首先,我們應該啟動一個本地驗證器。
export CUDA_VISIBLE_DEVICES=0vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \--host 0.0.0.0 \--port 23547
然后,我們開始使用4個節點進行強化學習訓練。
export PROJ_DIR="<YOUR_PROJ_DIR_HERE>"
export MASTER_IP="<YOUR_MASTER_IP_HERE>" # ray master ip
export NNODES=4 # total GPU nodes
export NODE_RANK=${RANK} # rank of current node
export PORT=6382
export WANDB_API_KEY="<YOUR_WANDB_API_KEY_HERE>"
export WANDB_PROJECT="QwenLong-L1"
export LLM_JUDGE=Y # 'Y': LLM JUDGE, 'N': RULE BASED
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
# verifier
export VERIFIER_PATH="Qwen/Qwen2.5-1.5B-Instruct"
export VERIFIER_HOST="<YOUR_VERIFIER_HOST_HERE>"
export VERIFIER_PORT="23547"ray_start_retry() {while true; doray start --address="${MASTER_IP}:${PORT}"if [ $? -eq 0 ]; thenbreakfiecho "Failed to connect to master, retrying in 5 seconds..."sleep 5done
}check_ray_status() {until ray status >/dev/null 2>&1; doecho "Waiting for Ray cluster to be ready..."sleep 5done
}if [ "$RANK" == "0" ]; thenecho "Starting HEAD node..."ray start --head --port=${PORT}check_ray_statusecho "Ray head node started successfully"elseecho "Starting WORKER node..."ray_start_retrycheck_ray_statusecho "Successfully joined Ray cluster"
fiif [ "$RANK" == "0" ]; thenbash ${PROJ_DIR}/scripts/rl_4nodes_dapo.sh 2>&1 | tee ${PROJ_DIR}/logs/rl_log_$(date +%Y%m%d_%H%M%S).txt &
elsesleep 30d
fiwait
實踐演示
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfigmodel_name = "Tongyi-Zhiwen/QwenLong-L1-32B"quantization_config = BitsAndBytesConfig(load_in_4bit=True)# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,torch_dtype="auto",device_map="auto",quantization_config=quantization_config,
)# prepare the model input
template = """Please read the following text and answer the question below.<text>
$DOC$
</text>$Q$Format your response as follows: "Therefore, the answer is (insert answer here)"."""# 填充上下文和問題
context = """
Renewable energy sources are crucial for addressing climate change and reducing dependence on fossil fuels. Solar power is one of the most abundant and widely available renewable energy sources. It converts sunlight directly into electricity using photovoltaic (PV) cells or indirectly through concentrated solar power (CSP) systems.Wind energy is another rapidly growing renewable source. Wind turbines capture the kinetic energy of moving air and convert it into electrical energy. Onshore wind farms are more common, but offshore wind farms are becoming increasingly popular due to stronger and more consistent wind resources.Hydroelectric power is generated by harnessing the energy of flowing water in rivers or dams. It is one of the oldest and most established renewable energy technologies, providing a reliable and flexible source of electricity.Biomass energy uses organic materials such as wood, agricultural waste, and dedicated energy crops to produce heat, electricity, or biofuels. It is considered renewable because the carbon dioxide released during combustion is offset by the carbon dioxide absorbed during the growth of the biomass feedstock.Geothermal energy taps into the Earth's internal heat to generate electricity or provide direct heating and cooling. It is a reliable and consistent energy source, particularly in regions with high geothermal activity.The transition to renewable energy is driven by several factors, including environmental concerns, energy security, and technological advancements. However, challenges remain, such as the intermittency of solar and wind power, high initial costs, and the need for energy storage solutions.Despite these challenges, the global renewable energy market is expanding rapidly. Many countries have set ambitious renewable energy targets, and investments in renewable energy technologies continue to grow. The International Renewable Energy Agency (IRENA) projects that renewable energy could account for up to 86% of global electricity generation by 2050 if current trends continue.
"""question = """
What are the main types of renewable energy sources mentioned in the text, and what are their respective advantages?
"""# 構建提示
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)# conduct text completion
generated_ids = model.generate(**model_inputs,max_new_tokens=10000,temperature=0.7,top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # parsing thinking content
try:# rindex finding 151649 (</think>)index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:index = 0thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")print("thinking content:", thinking_content)
print("content:", content)
輸出:
thinking content: Okay, let me try to figure this out. The question is asking for the main types of renewable energy sources mentioned in the text and their respective advantages. First, I need to go through the text again to identify each renewable energy source and note down their advantages.Starting with the first paragraph, it mentions solar power. The text says solar power uses photovoltaic cells or CSP systems. The advantage here is that it's abundant and widely available. Then, wind energy is next. The text talks about onshore and offshore wind farms. The advantage for wind is that offshore farms have stronger and more consistent winds, which I think makes them more reliable.Hydroelectric power is mentioned next. It's described as one of the oldest and most established, providing reliable and flexible electricity. So the advantage here is reliability and flexibility. Biomass energy uses organic materials and is considered renewable because the CO2 released is offset by the growth of the feedstock. The advantage here is that it's renewable in terms of carbon balance.Geothermal energy is next, using the Earth's internal heat. The advantage is that it's reliable and consistent, especially in areas with high geothermal activity. Wait, the question is about the main types and their advantages. Let me list them out:1. Solar power: Abundant and widely available.
2. Wind energy: Stronger and more consistent winds offshore.
3. Hydroelectric power: Reliable and flexible.
4. Biomass energy: Carbon neutrality (offset by growth).
5. Geothermal energy: Reliable and consistent.I think that's all the main types mentioned. The text also mentions challenges like intermittency for solar and wind, but the question is about advantages, so I should focus on the positive aspects each has. I need to make sure I didn't miss any. Let me check the text again.Yes, the text lists solar, wind, hydroelectric, biomass, and geothermal. Each has their specific advantages as I noted. So the answer should list each type with their respective advantages.
</think>
content: Therefore, the answer is Solar power (abundant and widely available), wind energy (stronger and consistent offshore winds), hydroelectric power (reliable and flexible), biomass energy (carbon neutrality through growth of feedstock), and geothermal energy (reliable and consistent internal heat).