Announcing Magistral, the first reasoning model by Mistral AI, excelling in domain-specific, transparent, and multilingual reasoning.

The best human thinking isn't linear: it weaves through logic, insight, uncertainty, and discovery. Reasoning language models let us augment or delegate complex thinking and deep understanding to AI, improving our ability to work through problems that require precise, step-by-step deliberation and analysis.

But this area is still in its infancy. Early thinking models have well-known limitations: a lack of specialized depth for domain-specific problems, limited transparency, and inconsistent reasoning in the desired language.

Today we are excited to announce our latest contribution to AI research with Magistral, our first reasoning model. Released simultaneously in open-source and enterprise versions, Magistral is designed to reason the way humans are familiar with, while bringing expertise across professional domains, transparent reasoning that can be followed and verified, and deep multilingual flexibility.
Highlights
Magistral Small 1.1
Building upon Mistral Small 3.1 (2503) with added reasoning capabilities, obtained through SFT on Magistral Medium traces followed by RL on top, it is a small, efficient reasoning model with 24B parameters.

Magistral Small can be deployed locally; once quantized, it fits within a single RTX 4090 or a MacBook with 32GB RAM.

Magistral Small 1.1 should give you approximately the same performance as Magistral Small 1.0, as shown in the benchmark results.
This update includes the following characteristics:
- Better tone and model behaviour. You should experience better LaTeX and Markdown formatting, and shorter answers on easy general prompts.
- The model is significantly less likely to enter infinite generation loops.
- New [THINK] and [/THINK] special tokens encapsulate the reasoning content. This makes it easier to parse the reasoning trace and prevents confusion when the '[THINK]' string appears in a prompt; see the parsing sketch right after this list.
- The reasoning prompt is now integrated into the system prompt template.
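As a quick illustration of how these markers can be consumed downstream, here is a minimal, hypothetical parsing sketch (the `split_reasoning` helper is ours for illustration; in practice mistral-common and the vLLM reasoning parser handle this for you):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer) on the special tokens."""
    begin, end = "[THINK]", "[/THINK]"
    start, stop = text.find(begin), text.find(end)
    if start == -1 or stop == -1:
        # No thinking chunk found: treat the whole text as the answer.
        return "", text
    return text[start + len(begin) : stop].strip(), text[stop + len(end) :].strip()

reasoning, answer = split_reasoning("[THINK]2 + 2 = 4[/THINK]The answer is 4.")
print(reasoning)  # 2 + 2 = 4
print(answer)     # The answer is 4.
```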
Key Features
- Reasoning: Capable of long chains of reasoning traces before providing an answer.
- Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Apache 2.0 License: An open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: A 128k context window, but performance might degrade past 40k; we therefore recommend capping the maximum model length at 40k, as in the one-liner right after this list.
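As a minimal sketch of that recommendation (the full set of recommended serve flags appears in the Usage section below), the cap can be applied at serve time via vLLM's `--max-model-len` flag:

```sh
vllm serve mistralai/Magistral-Small-2507 --max-model-len 40960
```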
Benchmark Results
| Model | AIME24 pass@1 | AIME25 pass@1 | GPQA Diamond | LiveCodeBench (v5) |
|---|---|---|---|---|
| Magistral Medium 1.1 | 72.03% | 60.99% | 71.46% | 59.35% |
| Magistral Medium 1.0 | 73.59% | 64.95% | 70.83% | 59.36% |
| Magistral Small 1.1 | 70.52% | 62.03% | 65.78% | 59.17% |
| Magistral Small 1.0 | 70.68% | 62.76% | 68.18% | 55.84% |
Sampling Parameters

Please make sure to use:
- `top_p`: 0.95
- `temperature`: 0.7
- `max_tokens`: 40960
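Applied to an OpenAI-compatible request, these settings look as follows; this is a minimal sketch reusing the local endpoint and model from the vLLM example further below:

```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="mistralai/Magistral-Small-2507",
    messages=[{"role": "user", "content": "How many Rs are in 'strawberry'?"}],
    temperature=0.7,   # recommended sampling temperature
    top_p=0.95,        # recommended nucleus-sampling cutoff
    max_tokens=40960,  # leave room for long reasoning traces
)
print(response.choices[0].message.content)
```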
Basic Chat Template

We highly recommend including the following system prompt for the best results; you can edit and customize it to your specific use case:

```
First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.

Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.[/THINK]Here, provide a self-contained response.
```
[THINK] and [/THINK] are special tokens that must remain as-is.
Please make sure to use mistral-common as the source of truth; below we list libraries that support mistral-common.

Depending on your use case and requirements, you can choose to keep the reasoning traces across multi-turn conversations, or to keep only the final assistant response, as in the sketch below.
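Here is a minimal, hypothetical sketch of the second option (the `strip_reasoning` helper is ours for illustration and not part of mistral-common): it drops the thinking chunk from an assistant turn before appending it to the history.

```python
def strip_reasoning(assistant_text: str) -> str:
    """Keep only the final response of an assistant turn."""
    end = assistant_text.find("[/THINK]")
    if end == -1:
        return assistant_text  # no thinking chunk to remove
    return assistant_text[end + len("[/THINK]") :].lstrip()

history = [
    {"role": "user", "content": "What is 12 * 11?"},
    {"role": "assistant", "content": strip_reasoning("[THINK]12 * 11 = 132[/THINK]132.")},
]
```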
Usage

The model can be used with the following frameworks:

Inference
- vllm (recommended): see below
- transformers: see below

In addition, the community has prepared quantized versions of the model that can be used with the following frameworks (in alphabetical order):
- llama.cpp: https://huggingface.co/mistralai/Magistral-Small-2507-GGUF
- lmstudio (llama.cpp, MLX): GGUF, MLX-bf16, MLX-8bit, MLX-6bit, MLX-4bit
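As a hypothetical local setup with llama.cpp (the exact GGUF filename depends on the quantization you download from the repository above), a quantized checkpoint can be served with the recommended sampling settings:

```sh
llama-server -m Magistral-Small-2507-Q4_K_M.gguf --temp 0.7 --top-p 0.95 -c 40960
```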
Training

Fine-tuning is supported via the following tools (in alphabetical order):
- axolotl: https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral
- unsloth: https://docs.unsloth.ai/basics/magistral
vLLM (recommended)

We recommend using the vLLM library to implement production-ready inference pipelines.

Installation

Make sure to install the latest vLLM code:
```sh
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
Doing so should automatically install mistral_common >= 1.8.2.

To check:
```sh
python -c "import mistral_common; print(mistral_common.__version__)"
```
You can also make use of a ready-to-go Docker image or one from Docker Hub.

Serve the model as follows:
```sh
vllm serve mistralai/Magistral-Small-2507 --reasoning-parser mistral --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```
Ping the model as follows:
```python
from typing import Any

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.7
TOP_P = 0.95
MAX_TOK = 40_960

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
    # Download SYSTEM_PROMPT.txt from the model repo and split it into
    # text / thinking / text chunks around the [THINK] markers.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    index_begin_think = system_prompt.find("[THINK]")
    index_end_think = system_prompt.find("[/THINK]")

    return {
        "role": "system",
        "content": [
            {"type": "text", "text": system_prompt[:index_begin_think]},
            {
                "type": "thinking",
                "thinking": system_prompt[index_begin_think + len("[THINK]") : index_end_think],
                "closed": True,
            },
            {
                "type": "text",
                "text": system_prompt[index_end_think + len("[/THINK]") :],
            },
        ],
    }


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

query = "Write 4 sentences, each with at least 8 words. Now make absolutely sure that every sentence has exactly one word less than the previous sentence."
# or try out other queries
# query = "Exactly how many days ago did the French Revolution start? Today is June 4th, 2025."
# query = "Think about 5 random numbers. Verify if you can combine them with addition, multiplication, subtraction or division to 133"
# query = "If it takes 30 minutes to dry 12 T-shirts in the sun, how long does it take to dry 33 T-shirts?"

messages = [
    SYSTEM_PROMPT,
    {"role": "user", "content": query},
]

stream = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
    temperature=TEMP,
    top_p=TOP_P,
    max_tokens=MAX_TOK,
)

print("client: Start streaming chat completions...:\n")
printed_reasoning_content = False
answer = []

for chunk in stream:
    reasoning_content = None
    content = None
    # Check whether the chunk carries reasoning_content or content
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
    elif hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content

    if reasoning_content is not None:
        if not printed_reasoning_content:
            printed_reasoning_content = True
            print("Start reasoning:\n", end="", flush=True)
        print(reasoning_content, end="", flush=True)
    elif content is not None:
        # Extract and print the content
        if not reasoning_content and printed_reasoning_content:
            answer.extend(content)
        print(content, end="", flush=True)

if answer:
    print("\n\n=============\nAnswer\n=============\n")
    print("".join(answer))
else:
    print("\n\n=============\nNo Answer\n=============\n")
    print("No answer was generated by the model, probably because the maximum number of tokens was reached.")

# client: Start streaming chat completions...:
#
# Start reasoning:
# First, I need to write ...
# ...
#
#
# =============
# Answer
# =============
#
# Here are four sentences where each has at least 8 words, and each subsequent sentence has exactly one word less than the previous one:
# 1. The quick brown fox jumps over the lazy dog and rests.
# 2. The lazy dog rests under the big shady tree peacefully.
# 3. The big shady tree provides ample shade during summer.
# 4. The tree's leaves are very lush and green.
```
Transformers

Make sure you install the latest Transformers code:
```sh
pip install git+https://github.com/huggingface/transformers
```
Also make sure to install mistral_common >= 1.8.2:

```sh
pip install --upgrade mistral-common
```
To check:

```sh
python -c "import mistral_common; print(mistral_common.__version__)"
```
Now you can use Magistral with Transformers:
```python
from typing import Any

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer

TEMP = 0.7
TOP_P = 0.95
MAX_TOK = 40_960


def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
    # Download SYSTEM_PROMPT.txt from the model repo and split it into
    # text / thinking / text chunks around the [THINK] markers.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    index_begin_think = system_prompt.find("[THINK]")
    index_end_think = system_prompt.find("[/THINK]")

    return {
        "role": "system",
        "content": [
            {"type": "text", "text": system_prompt[:index_begin_think]},
            {
                "type": "thinking",
                "thinking": system_prompt[index_begin_think + len("[THINK]") : index_end_think],
                "closed": True,
            },
            {
                "type": "text",
                "text": system_prompt[index_end_think + len("[/THINK]") :],
            },
        ],
    }


model_id = "mistralai/Magistral-Small-2507"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

query = "Think about 5 random numbers. Verify if you can combine them with addition, multiplication, subtraction or division to 133."
# or try out other queries
# query = "Exactly how many days ago did the French Revolution start? Today is June 4th, 2025."
# query = "Write 4 sentences, each with at least 8 words. Now make absolutely sure that every sentence has exactly one word less than the previous sentence."
# query = "If it takes 30 minutes to dry 12 T-shirts in the sun, how long does it take to dry 33 T-shirts?"

tokenizer = AutoTokenizer.from_pretrained(model_id, tokenizer_type="mistral", use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

input_ids = tokenizer.apply_chat_template(
    [
        SYSTEM_PROMPT,
        {"role": "user", "content": query},
    ],
)

output = model.generate(
    input_ids=torch.tensor([input_ids], device=model.device),
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    temperature=TEMP,
    top_p=TOP_P,
    do_sample=True,
    max_new_tokens=MAX_TOK,
)[0]

decoded_output = tokenizer.decode(output[len(input_ids) :])
print(decoded_output)
# [THINK]Alright, I need to think of 5 random numbers first. Let's say I pick the numbers 5, 10, 2, 7, and 3.
#
# Now, I need to see if I can combine these numbers using addition, multiplication, subtraction, or division to get 133.
# ...
# ...
# ...
# But if we're to find any five numbers that can be combined to make 133, then yes, such sets exist, like the one demonstrated above.[/THINK]Yes, it is possible to combine some sets of five random numbers to make 133 using basic arithmetic operations. For example, the numbers 13, 10, 1, 2, and 3 can be combined as follows to make 133:
#
# \[ (13 \times 10) + (3 \times (2 - 1)) = 130 + 3 = 133 \]
#
# However, not all sets of five random numbers can be combined in this way to make 133. For instance, with the numbers 5, 10, 2, 7, and 3, it is not possible to combine them using the allowed operations to get exactly 133.
#
# Therefore, the ability to combine five random numbers to make 133 depends on the specific numbers chosen.
#
# $133 = (13 \times 10) + (3 \times (2 - 1))$</s>
```