A very strong reasoning-enhanced LLM goes open source: local deployment

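Hi everyone, I'm Lao Zhang from "AI Learning".

A few days ago I covered mistralai/Devstral-Small-2505, the open-source coding-agent model from Mistral, a pioneer of MoE models.

Today let's look at Mistral's newest open-source reasoning model: Magistral.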

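Magistral overview

Mistral has released its first reasoning model, Magistral, together with its own scalable reinforcement learning (RL) pipeline. The team took a ground-up approach, building entirely on its own models and infrastructure rather than relying on existing implementations or RL traces from other models.

Magistral strengthens coding and development use cases: compared with non-reasoning models, it significantly improves project planning, backend architecture, frontend design, and data engineering by working through sequenced, multi-step actions that involve external tools or APIs.

Mistral's stack explores the limits of training an LLM with pure RL, introduces a way to force the model to reason in a specific language, and shows that RL on text data alone preserves most of the initial model's capabilities. The approach also maintains or improves multimodal understanding, instruction following, and function calling.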

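Three variants of Mistral Small 24B are involved:
1. RL only: Mistral Small 24B trained with reinforcement learning alone, with no fine-tuning on reasoning traces beforehand.
2. Reasoning-trace fine-tuning: Mistral Small 24B fine-tuned on reasoning traces generated by Magistral Medium.
3. Final Magistral Small: Mistral Small 24B fine-tuned on Magistral Medium traces and then further optimized with RL.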

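The design goal is to reason as rigorously as a person would, while combining expertise across professional domains, a transparent reasoning process that can be traced and verified, and deep multilingual flexibility.

Magistral features

Unlike general-purpose models, Magistral is fine-tuned for multi-step logic, which improves interpretability and gives a traceable thought process in the user's own language.
Magistral builds on Mistral Small 3.1 (2503) with added reasoning capabilities.
Magistral comes in two versions: Magistral Small (24B parameters, open source) and Magistral Medium (enterprise).
Magistral Small incorporates cold-start data from Magistral Medium.
Magistral Small has 24B parameters and can be deployed locally; once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM.
Magistral has a 128k context window, but performance may degrade past 40k, so the official recommendation is to set the maximum model length to 40k.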

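Magistral benchmark results

Magistral Medium handily beats DeepSeek-V3 (note that the 24B parameter count belongs to Magistral Small, not Medium), and in some areas (GPQA Diamond) it can go toe-to-toe with DeepSeek-R1. That is presumably the older R1, though; against R1-0528 it is still a tier behind.

Mistral is also being a bit sly: the numbers used for leaderboard comparisons come from the enterprise model (Medium), and the data for the open-source version is less complete.

Note: GPQA Diamond is a subset of the GPQA dataset. GPQA contains 448 high-quality multiple-choice questions written by domain experts in biology, physics, and chemistry. Diamond is the highest-quality subset, with 198 questions, selected so that both experts answered correctly while at least two-thirds of the non-experts answered incorrectly, which makes these questions very hard.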

Model	AIME24 pass@1	AIME25 pass@1	GPQA Diamond	Livecodebench (v5)
Magistral Medium	73.59%	64.95%	70.83%	59.36%
Magistral Small	70.68%	62.76%	68.18%	55.84%

Medium leads Small by roughly 2 to 3.5 percentage points on each benchmark.
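The gap between the two models can be checked directly from the table above. This is just a small arithmetic sketch; the dictionary layout and variable names are my own.

```python
# Scores copied from the table: (Magistral Medium, Magistral Small), in percent.
scores = {
    "AIME24 pass@1": (73.59, 70.68),
    "AIME25 pass@1": (64.95, 62.76),
    "GPQA Diamond": (70.83, 68.18),
    "Livecodebench (v5)": (59.36, 55.84),
}

# Percentage-point lead of Medium over Small on each benchmark.
gaps = {name: round(medium - small, 2) for name, (medium, small) in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: Medium leads by {gap:.2f} points")

print(f"smallest gap: {min(gaps.values()):.2f}, largest gap: {max(gaps.values()):.2f}")
```

The smallest lead is on AIME25 and the largest on Livecodebench (v5).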

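Also: judging from the paper, Magistral is relatively weaker in Chinese, which is no surprise for a French company. For writing code it should be fine, though; on Livecodebench (v5) it is well ahead of V3.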

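Deploying Magistral Small

As of writing, ModelScope has not yet published the model files. If your connection to Hugging Face is poor, you can wait for them here: https://www.modelscope.cn/models/mistralai/

If your connection is fine, go to Hugging Face: https://huggingface.co/mistralai/Magistral-Small-2506

The model files total about 50GB, so my guess is that launching the unquantized model takes at least four RTX 4090s (24GB each).

Launch the model: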

# Requires the latest vLLM.
# Note: --torch-backend appears to be a uv option; if plain pip rejects it, use `uv pip install` or drop the flag.
pip install -U vllm --extra-index-url https://wheels.vllm.ai/0.9.1rc1 --torch-backend=auto
# --tensor-parallel-size should equal the number of GPUs the model is sharded across (2 in this example).
vllm serve mistralai/Magistral-Small-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2

Quantized builds cut the GPU requirement by at least half.

For example, the quantized model on Ollama is only 14GB.
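As a rough sanity check on the sizes quoted above, weight memory is approximately parameter count times bits per weight. The sketch below assumes exactly 24B parameters, 16-bit weights for the unquantized files, and about 4.5 bits per weight for a 4-bit quantization. The 4.5 figure, the 90% usable-memory fraction, and the helper names are illustrative assumptions, and the KV cache and activations are ignored.

```python
import math


def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def min_gpus(size_gb: float, gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose usable memory covers the weights (no KV cache)."""
    return math.ceil(size_gb / (gpu_mem_gb * usable_fraction))


N_PARAMS = 24e9  # Magistral Small, per the article

bf16_gb = weight_size_gb(N_PARAMS, 16)  # 48.0 GB, close to the ~50GB of files
q4_gb = weight_size_gb(N_PARAMS, 4.5)   # 13.5 GB, assumed ~4.5 bits per weight

print(f"16-bit weights: {bf16_gb:.1f} GB -> at least {min_gpus(bf16_gb, 24)} x 24GB GPUs")
print(f"~4.5-bit weights: {q4_gb:.1f} GB -> at least {min_gpus(q4_gb, 24)} x 24GB GPU")
```

Under these assumptions the unquantized weights alone need three 24GB cards. Tensor parallelism generally wants a GPU count that divides the model's attention heads evenly, so four is a reasonable practical floor. The quantized estimate of 13.5GB is in line with the 14GB Ollama download.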

Quantized builds of Magistral:

llama.cpp: https://huggingface.co/mistralai/Magistral-Small-2506_gguf
lmstudio (llama.cpp, MLX): https://lmstudio.ai/models/mistralai/magistral-small
ollama (llama.cpp): https://ollama.com/library/magistral
unsloth (llama.cpp): https://huggingface.co/unsloth/Magistral-Small-2506-GGUF

Using Magistral

The officially recommended sampling parameters for the model:

top_p: 0.95
temperature: 0.7
max_tokens: 40960
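Here is a minimal sketch of passing these parameters to a locally served model. It assumes the vLLM server from the deployment step is exposing its OpenAI-compatible API on the default port 8000. The base URL, the helper names `build_payload` and `ask`, and the sample question are my own assumptions, not part of the official instructions.

```python
import json
import urllib.request

# Assumption: vLLM's OpenAI-compatible server is running locally on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "mistralai/Magistral-Small-2506"


def build_payload(question: str, system_prompt: str) -> dict:
    """Build a chat-completions request using the recommended sampling settings."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "top_p": 0.95,
        "temperature": 0.7,
        "max_tokens": 40960,
    }


def ask(question: str, system_prompt: str) -> str:
    """POST the request to the server and return the assistant's text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(question, system_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Inspect the payload without needing a running server.
payload = build_payload("What is 17 * 23?", "You are a careful reasoner.")
print(json.dumps(payload, indent=2))
```

If the server's maximum model length is also capped at 40k, the prompt plus `max_tokens` has to fit inside that limit, so lower `max_tokens` if the server rejects the request.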

In the paper I also came across the most concise system prompt I have ever seen:

A user will ask you to solve a task. You should first draft your thinking process (inner
monologue) until you have derived the final answer. Afterwards, write a self-contained
summary of your thoughts (i.e. your summary should be succinct but contain all the critical
steps you needed to reach the conclusion). You should use Markdown and Latex to format
your response. Write both your thoughts and summary in the same language as the task
posed by the user.
Your thinking process must follow the template below:
<think>
Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual
and as long as you want until you are confident to generate a correct answer.
</think>
Here, provide a concise summary that reflects your reasoning and presents a clear final
answer to the user.
Problem:
{problem}

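Concise as it is, it still covers every structural element of a system prompt:

Two-stage thinking:
- Stage one: the model writes out a detailed thinking process (inner monologue) inside the <think> tags
- Stage two: outside the tags, it gives a concise but complete summary and the final answer
Visible thinking:
- The design lets users see the model's "thinking process", which adds transparency
- It resembles Chain-of-Thought prompting, but is more structured
Formatting requirements:
- Markdown and LaTeX are required, which suits math and science problems
- Structured output is emphasized, making answers clearer and easier to read
Language adaptation:
- The model must answer in the same language as the user's question, which improves the user experience
Problem placeholder: {problem} is a placeholder that gets replaced by the actual question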
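The two mechanical pieces of this prompt, filling the {problem} placeholder and separating the thinking from the final summary, can be sketched in a few lines. The helper names, the abbreviated template, and the sample response are hypothetical. Only the `<think>` tags and the `{problem}` placeholder come from the prompt itself.

```python
import re

# Abbreviated stand-in for the full system prompt quoted above.
PROMPT_TEMPLATE = (
    "A user will ask you to solve a task. ...\n"
    "<think>\n...\n</think>\n"
    "Problem:\n{problem}"
)

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def fill_problem(template: str, problem: str) -> str:
    """Substitute the placeholder; str.replace avoids clashes with LaTeX braces."""
    return template.replace("{problem}", problem)


def split_response(text: str) -> tuple[str, str]:
    """Return (thoughts, summary); thoughts is empty if no <think> block exists."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    thoughts = match.group(1).strip()
    summary = text[match.end():].strip()
    return thoughts, summary


prompt = fill_problem(PROMPT_TEMPLATE, r"Simplify \frac{6}{8}.")

# Hypothetical model output in the two-stage format.
sample = "<think>\n6/8: divide both by 2.\n</think>\nThe answer is $\\frac{3}{4}$."
thoughts, summary = split_response(sample)
print("thoughts:", thoughts)
print("summary:", summary)
```

`str.replace` is used instead of `str.format` because problems written in LaTeX often contain literal braces, which `format` would try to interpret as fields.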

Finally, here is the officially recommended chat template:

<s>[SYSTEM_PROMPT]system_prompt

A user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown to format your response. Write both your thoughts and summary in the same language as the task posed by the user. NEVER use \boxed{} in your response.

Your thinking process must follow the template below:
<think>
Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate a correct answer.
</think>

Here, provide a concise summary that reflects your reasoning and presents a clear final answer to the user. Don't mention that this is a summary.

Problem:

[/SYSTEM_PROMPT][INST]user_message[/INST]<think>
reasoning_traces
</think>
assistant_response</s>[INST]user_message[/INST]

Other resources

Try it: https://chat.mistral.ai/chat
Paper: https://mistral.ai/static/research/magistral.pdf
API: http://console.mistral.ai/

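These posts take real effort to put together. If you found this one useful, please consider following me and giving it a like, a share, and a "Wow". A 🌟 would be even better. Thanks for reading, and see you in the next one!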
