1. Model Introduction
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, it achieves exceptional performance across frontier knowledge, reasoning, and coding tasks, while being meticulously optimized for agentic capabilities.
Key Features
- Large-Scale Training: Pre-trained a 1T-parameter MoE model on 15.5T tokens with full training stability throughout.
- MuonClip Optimizer: Applied the Muon optimizer at an unprecedented scale, with novel optimization techniques developed to resolve instabilities while scaling up.
- Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
Model Variants
- Kimi-K2-Base: The foundation model, a strong starting point for researchers and developers who want full control over fine-tuning and custom solutions.
- Kimi-K2-Instruct: The post-trained model, best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model that does not require long thinking.
2. Model Summary

| Architecture | Mixture-of-Experts (MoE) |
| --- | --- |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 128K |
| Attention Mechanism | MLA (Multi-head Latent Attention) |
| Activation Function | SwiGLU |
3. Evaluation Results
Instruct Model Evaluation Results
Benchmark | Metric | Kimi K2 Instruct | DeepSeek-V3-0324 | Qwen3-235B-A22B (non-thinking) | Claude Sonnet 4 (w/o extended thinking) | Claude Opus 4 (w/o extended thinking) | GPT-4.1 | Gemini 2.5 Flash Preview (05-20) |
---|---|---|---|---|---|---|---|---|
Coding Tasks | ||||||||
LiveCodeBench v6 (Aug 24 - May 25) | Pass@1 | 53.7 | 46.9 | 37.0 | 48.5 | 47.4 | 44.7 | 44.7 |
OJBench | Pass@1 | 27.1 | 24.0 | 11.3 | 15.3 | 19.6 | 19.5 | 19.5 |
MultiPL-E | Pass@1 | 85.7 | 83.1 | 78.2 | 88.6 | 89.6 | 86.7 | 85.6 |
SWE-bench Verified (Agentless Coding) | Single Patch | 51.8 | 36.6 | 39.4 | 50.2 | 53.0 | 40.8 | 32.6 |
SWE-bench Verified (Agentic Coding) | Single Attempt (Acc) | 65.8 | 38.8 | 34.4 | 72.7* | 72.5* | 54.6 | — |
 | Multiple Attempts (Acc) | 71.6 | — | — | 80.2 | 79.4* | — | — |
SWE-bench Multilingual (Agentic Coding) | Single Attempt (Acc) | 47.3 | 25.8 | 20.9 | 51.0 | — | 31.5 | — |
TerminalBench | Inhouse Framework (Acc) | 30.0 | — | — | 35.5 | 43.2 | 8.3 | — |
 | Acc | 25.0 | 16.3 | 6.6 | — | — | 30.3 | 16.8 |
Aider-Polyglot | Acc | 60.0 | 55.1 | 61.8 | 56.4 | 70.7 | 52.4 | 44.0 |
Tool Use Tasks | ||||||||
Tau2 retail | Avg@4 | 70.6 | 69.1 | 57.0 | 75.0 | 81.8 | 74.8 | 64.3 |
Tau2 airline | Avg@4 | 56.5 | 39.0 | 26.5 | 55.5 | 60.0 | 54.5 | 42.5 |
Tau2 telecom | Avg@4 | 65.8 | 32.5 | 22.1 | 45.2 | 57.0 | 38.6 | 16.9 |
AceBench | Acc | 76.5 | 72.7 | 70.5 | 76.2 | 75.6 | 80.1 | 74.5 |
Math & STEM Tasks | ||||||||
AIME 2024 | Avg@64 | 69.6 | 59.4* | 40.1* | 43.4 | 48.2 | 46.5 | 61.3 |
AIME 2025 | Avg@64 | 49.5 | 46.7 | 24.7* | 33.1* | 33.9* | 37.0 | 46.6 |
MATH-500 | Acc | 97.4 | 94.0* | 91.2* | 94.0 | 94.4 | 92.4 | 95.4 |
HMMT 2025 | Avg@32 | 38.8 | 27.5 | 11.9 | 15.9 | 15.9 | 19.4 | 34.7 |
CNMO 2024 | Avg@16 | 74.3 | 74.7 | 48.6 | 60.4 | 57.6 | 56.6 | 75.0 |
PolyMath-en | Avg@4 | 65.1 | 59.5 | 51.9 | 52.8 | 49.8 | 54.0 | 49.9 |
ZebraLogic | Acc | 89.0 | 84.0 | 37.7* | 73.7 | 59.3 | 58.5 | 57.9 |
AutoLogi | Acc | 89.5 | 88.9 | 83.3 | 89.8 | 86.1 | 88.2 | 84.1 |
GPQA-Diamond | Avg@8 | 75.1 | 68.4* | 62.9* | 70.0* | 74.9* | 66.3 | 68.2 |
SuperGPQA | Acc | 57.2 | 53.7 | 50.2 | 55.7 | 56.5 | 50.8 | 49.6 |
Humanity's Last Exam (Text Only) | - | 4.7 | 5.2 | 5.7 | 5.8 | 7.1 | 3.7 | 5.6 |
General Tasks | ||||||||
MMLU | EM | 89.5 | 89.4 | 87.0 | 91.5 | 92.9 | 90.4 | 90.1 |
MMLU-Redux | EM | 92.7 | 90.5 | 89.2 | 93.6 | 94.2 | 92.4 | 90.6 |
MMLU-Pro | EM | 81.1 | 81.2* | 77.3 | 83.7 | 86.6 | 81.8 | 79.4 |
IFEval | Prompt Strict | 89.8 | 81.1 | 83.2* | 87.6 | 87.4 | 88.0 | 84.3 |
Multi-Challenge | Acc | 54.1 | 31.4 | 34.0 | 46.8 | 49.0 | 36.4 | 39.5 |
SimpleQA | Correct | 31.0 | 27.7 | 13.2 | 15.9 | 22.8 | 42.3 | 23.3 |
Livebench | Pass@1 | 76.4 | 72.4 | 67.6 | 74.8 | 74.6 | 69.8 | 67.8 |
- Data points marked with * are taken directly from the model's technical report or blog.
- All metrics, except for SWE-bench Verified (Agentless), are evaluated with an 8k output token length; SWE-bench Verified (Agentless) is limited to a 16k output token length.
- Kimi K2 achieves a 65.8% single-attempt patch pass rate (without test-time compute) on SWE-bench Verified using bash/editor tools. Under the same conditions, its single-attempt pass rate on SWE-bench Multilingual is 47.3%. We additionally report the SWE-bench Verified result with parallel test-time compute (71.6%), obtained by sampling multiple sequences and selecting the single best via an internal scoring model.
- To ensure evaluation stability, we adopt avg@k on AIME, HMMT, CNMO, PolyMath-en, GPQA-Diamond, EvalPlus, and Tau2.
- Some data points have been omitted due to prohibitively high evaluation cost.
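The parallel test-time compute procedure mentioned in the notes above (sample several candidate patches, then keep the one ranked highest by a scoring model) can be sketched as follows. The candidates and the `score_patch` ranker here are hypothetical stand-ins, not Moonshot AI's internal scoring model:

```python
# Toy stand-in for a scoring/reward model: a real scorer would be a learned model.
# Here we simply prefer shorter patches, to make the selection step concrete.
def score_patch(patch: str) -> float:
    return -len(patch)

def best_of_n(candidates: list[str], score) -> str:
    # Sample N candidate patches elsewhere, then pick the highest-scoring one.
    return max(candidates, key=score)

patches = ["fix A with a long diff....", "fix B", "fix C plus extras"]
print(best_of_n(patches, score_patch))  # prints: fix B
```

Only the selection step is shown; generating the N candidate sequences and the scoring model itself are outside the scope of this sketch.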
Base Model Evaluation Results
Benchmark | Metric | Shot | Kimi K2 Base | Deepseek-V3-Base | Qwen2.5-72B | Llama 4 Maverick |
---|---|---|---|---|---|---|
General Tasks | ||||||
MMLU | EM | 5-shot | 87.8 | 87.1 | 86.1 | 84.9 |
MMLU-pro | EM | 5-shot | 69.2 | 60.6 | 62.8 | 63.5 |
MMLU-redux-2.0 | EM | 5-shot | 90.2 | 89.5 | 87.8 | 88.2 |
SimpleQA | Correct | 5-shot | 35.3 | 26.5 | 10.3 | 23.7 |
TriviaQA | EM | 5-shot | 85.1 | 84.1 | 76.0 | 79.3 |
GPQA-Diamond | Avg@8 | 5-shot | 48.1 | 50.5 | 40.8 | 49.4 |
SuperGPQA | EM | 5-shot | 44.7 | 39.2 | 34.2 | 38.8 |
Code Tasks | ||||||
LiveCodeBench v6 | Pass@1 | 1-shot | 26.3 | 22.9 | 21.1 | 25.1 |
EvalPlus | Pass@1 | - | 80.3 | 65.6 | 66.0 | 65.5 |
Mathematics Tasks | ||||||
MATH | EM | 4-shot | 70.2 | 60.1 | 61.0 | 63.0 |
GSM8k | EM | 8-shot | 92.1 | 91.7 | 90.4 | 86.3 |
Chinese Tasks | ||||||
C-Eval | EM | 5-shot | 92.5 | 90.0 | 90.9 | 80.9 |
CSimpleQA | Correct | 5-shot | 77.6 | 72.1 | 50.5 | 53.5 |
- All models are evaluated under the same evaluation protocol.
4. Deployment

> [!Note]
> You can access Kimi K2's API at https://platform.moonshot.ai; we provide OpenAI/Anthropic-compatible APIs. For the Anthropic-compatible API, the temperature is mapped as `real_temperature = request_temperature * 0.6`, for better compatibility with existing applications.
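Concretely, this means an Anthropic-style request temperature of 1.0 is served at an effective temperature of 0.6. A minimal sketch of the mapping (the helper name is ours, not part of the API):

```python
def map_anthropic_temperature(request_temperature: float) -> float:
    # Anthropic-compatible endpoint: real_temperature = request_temperature * 0.6
    return request_temperature * 0.6

print(map_anthropic_temperature(1.0))  # 0.6
```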
Our model checkpoints are stored in block-fp8 format; you can find them on Hugging Face.
Currently, Kimi-K2 is recommended to run on the following inference engines:
- vLLM
- SGLang
- KTransformers
- TensorRT-LLM
Deployment examples for vLLM and SGLang can be found in the Model Deployment Guide.
5. Model Usage
Chat Completion
Once the local inference service is up, you can interact with it through the chat endpoint:
```python
from openai import OpenAI

def simple_chat(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": [{"type": "text", "text": "Please give a brief self-introduction."}]},
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        temperature=0.6,
        max_tokens=256,
    )
    print(response.choices[0].message.content)
```
> [!Note]
> The recommended temperature for Kimi-K2-Instruct is `temperature = 0.6`.
> If no special instructions are needed, the system prompt above is a good default.
Tool Calling
Kimi-K2-Instruct has strong tool-calling capabilities.
To enable them, pass the list of available tools in each request; the model will then autonomously decide when and how to invoke them.
The following example demonstrates calling a weather tool end-to-end:
```python
import json

from openai import OpenAI

# Your tool implementation
def get_weather(city: str) -> dict:
    return {"weather": "Sunny"}

# Tool schema definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve current weather information. Call this when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city"
                }
            }
        }
    }
}]

# Map tool names to their implementations
tool_map = {
    "get_weather": get_weather
}

def tool_call_with_client(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "What's the weather like in Beijing today? Use the tool to check."}
    ]
    finish_reason = None
    while finish_reason is None or finish_reason == "tool_calls":
        completion = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=0.6,
            tools=tools,          # tool list defined above
            tool_choice="auto"
        )
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        # If the model wants to call tools, run them and feed the results back.
        if finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)
                tool_function = tool_map[tool_call_name]
                tool_result = tool_function(**tool_call_arguments)
                print("tool_result:", tool_result)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result)
                })
    print("-" * 100)
    print(choice.message.content)
```
The `tool_call_with_client` function implements the complete pipeline from user query to tool execution.
This pipeline requires the inference engine to support Kimi-K2's native tool-parsing logic.
For streaming output and manual tool parsing, see the Tool Calling Guide.
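When parsing tool calls manually, it is worth validating the model-produced arguments against the tool schema before dispatching. A minimal sketch using only the `required`/`properties` fields of a schema like the `get_weather` one above (the `validate_args` helper is ours, not part of any Kimi SDK):

```python
import json

# JSON-schema fragment matching the get_weather tool's parameters.
schema = {
    "type": "object",
    "required": ["city"],
    "properties": {"city": {"type": "string", "description": "Name of the city"}},
}

# Map JSON-schema type names to Python types for a basic isinstance check.
_TYPES = {"object": dict, "string": str, "number": (int, float)}

def validate_args(raw_arguments: str, schema: dict) -> dict:
    # Decode the model-produced JSON arguments, then check required keys and types.
    args = json.loads(raw_arguments)
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in args and not isinstance(args[key], _TYPES[spec["type"]]):
            raise ValueError(f"argument {key} has wrong type")
    return args

print(validate_args('{"city": "Beijing"}', schema))  # {'city': 'Beijing'}
```

Rejecting malformed arguments before calling the tool function keeps a bad generation from crashing the tool itself, and lets you return a clean error message to the model instead.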
6. License
Both the code repository and the model weights are released under the Modified MIT License.