OpenAI發布了gpt-oss
120B和20B版本。這兩個模型均采用Apache 2.0許可證。
特別說明的是,gpt-oss-20b
專為低延遲及本地化/專業化場景設計(210億總參數,36億活躍參數)。
由于模型采用原生MXFP4量化訓練,使得20B版本即便在Google Colab等資源受限環境中也能輕松運行。
作者:Pedro與VB。
由于transformers對mxfp4的支持處于前沿技術階段,我們需要使用最新版本的PyTorch和CUDA才能安裝mxfp4
的triton內核。
同時需要從源碼安裝transformers,并卸載torchvision
和torchaudio
以避免依賴沖突。
!pip install -q --upgrade torch accelerate kernels
!pip install -q git+https://github.com/huggingface/transformers triton==3.4 git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
!pip uninstall -q torchvision torchaudio -y
!pip list | grep -E "transformers|triton|torch|accelerate|kernels"
accelerate 1.10.1
kernels 0.9.0
sentence-transformers 5.1.0
torch 2.8.0+cu126
torchao 0.10.0
torchdata 0.11.0
torchsummary 1.5.1
torchtune 0.6.1
transformers 4.57.0.dev0
triton 3.4.0
triton_kernels 1.0.0
在Google Colab中從Hugging Face加載模型
我們從這里加載模型:openai/gpt-oss-20b
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, Mxfp4Configmodel_id = "openai/gpt-oss-20b"tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)
print(config)quantization_config=Mxfp4Config.from_dict(config.quantization_config)
print(quantization_config)model = AutoModelForCausalLM.from_pretrained(model_id,quantization_config=quantization_config,torch_dtype="auto",device_map="cuda",
)
import torch
def print_model_params(model: torch.nn.Module, extra_info="", f=None):# print the number of parameters in the modelmodel_million_params = sum(p.numel() for p in model.parameters()) / 1e6print(model, file=f)print(f"{extra_info} {model_million_params} M parameters", file=f)
print_model_params(model,model_id)
``
`GptOssForCausalLM((model): GptOssModel((embed_tokens): Embedding(201088, 2880, padding_idx=199999)(layers): ModuleList((0-23): 24 x GptOssDecoderLayer((self_attn): GptOssAttention((q_proj): Linear(in_features=2880, out_features=4096, bias=True)(k_proj): Linear(in_features=2880, out_features=512, bias=True)(v_proj): Linear(in_features=2880, out_features=512, bias=True)(o_proj): Linear(in_features=4096, out_features=2880, bias=True))(mlp): GptOssMLP((router): GptOssTopKRouter()(experts): Mxfp4GptOssExperts())(input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)(post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)))(norm): GptOssRMSNorm((2880,), eps=1e-05)(rotary_emb): GptOssRotaryEmbedding())(lm_head): Linear(in_features=2880, out_features=201088, bias=False))openai/gpt-oss-20b 1804.459584 M parameters## 設置消息/聊天您可以提供一個可選的系統提示或直接輸入內容。```python
messages = [{"role": "system", "content": "Always respond in riddles"},{"role": "user", "content": "What is the weather like in Madrid?"},
]inputs = tokenizer.apply_chat_template(messages,add_generation_prompt=True,return_tensors="pt",return_dict=True,
).to(model.device)generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
<|channel|>analysis<|message|>We have a system that says "Always respond in riddles". The user asks: "What is the weather like in Madrid?" We need to answer in a riddle style. Probably we need to incorporate a riddle that hints at the weather. Possibly referencing that Madrid has warm days, cooler nights, sometimes unpredictable. The question is to describe the weather, but we must answer in riddles.We need to craft a riddle that describes the weather. We could say something like: "In a place where the summer sun is fierce, yet winter brings a chill, what doth the sky declare? The answer: It's a tapestry of sun, clouds, and gentle rain." But it's in a riddle form, so perhaps like: "A palace of sunshine, the winter brings a chill across its marble steps." But we have to respond in riddles. We can provide a riddle that when solved the answer is "varied: sunny, dry, occasional showers." We can hint at temperatures.Given the requirement, we must answer with a riddle. Probably no other text. So let's produce a riddle. I'll produce maybe an extended riddle. It's okay.We should keep it in Spanish? The user is not specifying a language. The question is in English. The answer in English but in riddle form.Answer: "In the heart of Spain’s southern sun, a desert’s heart lies in the city, yet it drinks rain at intervals, ..." Something. Let's craft.Possible riddle:"Morning bright as fire, midday relentless, night cool as a sigh; the gods of clouds wander, sometimes weeping, sometimes staying away; which realm in Spain does this dance with sun?" The answer: Madrid.But need to describe weather in Madrid.Maybe:"I’m a city with heat that can scorch your thoughts, yet with winter my nights feel like glass. I’m known for blue skies, yet sometimes I cry to thin drops. The clouds come and go like dancers in a waltz. What am I?" The answer: Madrid's weather.Better:"Where the summer sun is a silvered sword, the winter wind gives breathless chill. The clouds are silvered ghosts that sometimes fall; a city that wears both sun and mist. Tell me, what city is this?" Answer: Madrid.However, maybe simpler: "I am a city where summer suns burn, winter nights chill, and clouds sometimes pour.
指定推理力度
messages = [{"role": "system", "content": "Always respond in riddles"},{"role": "user", "content": "Explain why the meaning of life is 42", "reasoning_effort": "high"},
]inputs = tokenizer.apply_chat_template(messages,add_generation_prompt=True,return_tensors="pt",return_dict=True,
).to(model.device)generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
<|channel|>analysis<|message|>The user asks: "Explain why the meaning of life is 42". The instruction: "Always respond in riddles". So we must respond in riddles. The user likely expects an explanation like referencing Hitchhiker's guide: 42 is a random number. But we have to respond with riddles. That may be tricky: we might produce riddle-like explanation about the number 42. We need to produce something that "explains" why 42 is the meaning of life but in riddle form.We need to craft a riddle that describes the reasoning: The answer to everything is 42, because it's a puzzle, because it's a random number, also can be expressed as 6 times 7, 2-digit, and it contains hidden meaning like the sum of ASCII codes maybe.We need to incorporate the explanation: it's an arbitrary number but resonates because of its prime factorization, its location on the periodic table, its relation to Earth, or in the book, the number 42 was given as answer by Deep Thought after 7.5 million years of computation. But we are to answer as riddle.Probably: "What is a number that appears in a science-fiction book, which is the answer to life, the Universe, and everything, and why?".But we are told to respond in riddles, not straightforward explanation.Thus we will write a riddle explaining that 42 is chosen because it's 6 * 7, it's the number of something, it is the number of letters: 'life' etc, or maybe referencing that 42 is the 'Answer' to an existential question but the question itself remains unknown. It's like we ask: "What is the number that is the answer to the big question but which big question?".We might produce a riddle:"Take six from seven to make something; multiply for the meaning; the result is 'the answer'."But is that a riddle? We can present something like:"Two humble companions cross, one half and one full; together they become the number that guides your quest, but the question remains unseen. Who are they and what is their sum?" But that is a riddle. Maybe we need to explain why 42 is the meaning of life.Alternatively, we might produce a longer riddle that explains, e.g., we can incorporate that 42 is the only number that ... but we must incorporate explanation.We can