# [No GGUF Version] Running gpt-oss 20B on a T4 in Colab


OpenAI has released gpt-oss in 120B and 20B versions. Both models are licensed under Apache 2.0.

Notably, gpt-oss-20b is designed for low-latency and local/specialized use cases (21B total parameters, 3.6B active parameters).

Because the model was trained with native MXFP4 quantization, the 20B version runs comfortably even in resource-constrained environments such as Google Colab.
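Before anything else, it is worth confirming that the Colab runtime actually has a GPU attached. A quick sanity check (not part of the original post; PyTorch comes preinstalled on Colab):

```python
import torch

# Report the attached accelerator; on the free tier this should be a T4 (~15 GiB).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 2**30:.1f} GiB")
else:
    print("No CUDA device found -- set Runtime > Change runtime type > GPU.")
```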

Authors: Pedro and VB.

Because transformers support for mxfp4 is still bleeding edge, we need recent versions of PyTorch and CUDA in order to install the mxfp4 triton kernels.

We also need to install transformers from source, and uninstall torchvision and torchaudio to avoid dependency conflicts.

```bash
!pip install -q --upgrade torch accelerate kernels
!pip install -q git+https://github.com/huggingface/transformers triton==3.4 git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
!pip uninstall -q torchvision torchaudio -y
!pip list | grep -E "transformers|triton|torch|accelerate|kernels"
```
```
accelerate                            1.10.1
kernels                               0.9.0
sentence-transformers                 5.1.0
torch                                 2.8.0+cu126
torchao                               0.10.0
torchdata                             0.11.0
torchsummary                          1.5.1
torchtune                             0.6.1
transformers                          4.57.0.dev0
triton                                3.4.0
triton_kernels                        1.0.0
```

## Loading the Model from Hugging Face in Google Colab

We load the model from openai/gpt-oss-20b.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, Mxfp4Config

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

config = AutoConfig.from_pretrained(model_id)
print(config)

# Build the MXFP4 quantization config from the checkpoint's own metadata.
quantization_config = Mxfp4Config.from_dict(config.quantization_config)
print(quantization_config)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype="auto",
    device_map="cuda",
)

import torch

def print_model_params(model: torch.nn.Module, extra_info="", f=None):
    # Print the model architecture and its parameter count in millions.
    model_million_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(model, file=f)
    print(f"{extra_info} {model_million_params} M parameters", file=f)

print_model_params(model, model_id)
```
```
GptOssForCausalLM(
  (model): GptOssModel(
    (embed_tokens): Embedding(201088, 2880, padding_idx=199999)
    (layers): ModuleList(
      (0-23): 24 x GptOssDecoderLayer(
        (self_attn): GptOssAttention(
          (q_proj): Linear(in_features=2880, out_features=4096, bias=True)
          (k_proj): Linear(in_features=2880, out_features=512, bias=True)
          (v_proj): Linear(in_features=2880, out_features=512, bias=True)
          (o_proj): Linear(in_features=4096, out_features=2880, bias=True)
        )
        (mlp): GptOssMLP(
          (router): GptOssTopKRouter()
          (experts): Mxfp4GptOssExperts()
        )
        (input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
        (post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
      )
    )
    (norm): GptOssRMSNorm((2880,), eps=1e-05)
    (rotary_emb): GptOssRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2880, out_features=201088, bias=False)
)
openai/gpt-oss-20b 1804.459584 M parameters
```
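The printed count (~1.8B) is far below the model's 21B total, likely because the MXFP4 expert weights are stored in packed form and are not counted one-for-one by `p.numel()`. As a complementary sanity check on what actually landed on the GPU, you can query the model's memory footprint (a small addition, not part of the original notebook):

```python
# get_memory_footprint() is a transformers PreTrainedModel helper that
# sums the byte sizes of parameters and buffers.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```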
## Setting Up Messages / Chat

You can provide an optional system prompt, or just enter your content directly.

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "What is the weather like in Madrid?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>We have a system that says "Always respond in riddles". The user asks: "What is the weather like in Madrid?" We need to answer in a riddle style. Probably we need to incorporate a riddle that hints at the weather. Possibly referencing that Madrid has warm days, cooler nights, sometimes unpredictable. The question is to describe the weather, but we must answer in riddles.

We need to craft a riddle that describes the weather. We could say something like: "In a place where the summer sun is fierce, yet winter brings a chill, what doth the sky declare? The answer: It's a tapestry of sun, clouds, and gentle rain." But it's in a riddle form, so perhaps like: "A palace of sunshine, the winter brings a chill across its marble steps." But we have to respond in riddles. We can provide a riddle that when solved the answer is "varied: sunny, dry, occasional showers." We can hint at temperatures.

Given the requirement, we must answer with a riddle. Probably no other text. So let's produce a riddle. I'll produce maybe an extended riddle. It's okay.

We should keep it in Spanish? The user is not specifying a language. The question is in English. The answer in English but in riddle form.

Answer: "In the heart of Spain’s southern sun, a desert’s heart lies in the city, yet it drinks rain at intervals, ..." Something. Let's craft.

Possible riddle:
"Morning bright as fire, midday relentless, night cool as a sigh; the gods of clouds wander, sometimes weeping, sometimes staying away; which realm in Spain does this dance with sun?" The answer: Madrid.

But need to describe weather in Madrid.

Maybe:
"I’m a city with heat that can scorch your thoughts, yet with winter my nights feel like glass. I’m known for blue skies, yet sometimes I cry to thin drops. The clouds come and go like dancers in a waltz. What am I?" The answer: Madrid's weather.

Better:
"Where the summer sun is a silvered sword, the winter wind gives breathless chill. The clouds are silvered ghosts that sometimes fall; a city that wears both sun and mist. Tell me, what city is this?" Answer: Madrid.

However, maybe simpler: "I am a city where summer suns burn, winter nights chill, and clouds sometimes pour.
```
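Note that the decode starts in the harmony analysis channel, and with only 500 new tokens the model never reached its final answer here. If you raise `max_new_tokens` and want just the user-facing reply, a simple string split works as a sketch (the marker strings are assumptions based on the channel format visible in the output above):

```python
def extract_final_answer(decoded: str) -> str:
    # Hypothetical helper: gpt-oss emits an analysis channel before the
    # final answer. Split on the final-channel marker if it is present.
    marker = "<|channel|>final<|message|>"
    if marker in decoded:
        # Drop anything after the end-of-turn marker, if one was generated.
        return decoded.split(marker, 1)[1].split("<|return|>", 1)[0]
    return decoded  # final channel never reached (e.g. generation truncated)
```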

## Specifying Reasoning Effort

gpt-oss supports adjustable reasoning effort (low, medium, or high); here we request high effort for the user message:

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42", "reasoning_effort": "high"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>The user asks: "Explain why the meaning of life is 42". The instruction: "Always respond in riddles". So we must respond in riddles. The user likely expects an explanation like referencing Hitchhiker's guide: 42 is a random number. But we have to respond with riddles. That may be tricky: we might produce riddle-like explanation about the number 42. We need to produce something that "explains" why 42 is the meaning of life but in riddle form.

We need to craft a riddle that describes the reasoning: The answer to everything is 42, because it's a puzzle, because it's a random number, also can be expressed as 6 times 7, 2-digit, and it contains hidden meaning like the sum of ASCII codes maybe.

We need to incorporate the explanation: it's an arbitrary number but resonates because of its prime factorization, its location on the periodic table, its relation to Earth, or in the book, the number 42 was given as answer by Deep Thought after 7.5 million years of computation. But we are to answer as riddle.

Probably: "What is a number that appears in a science-fiction book, which is the answer to life, the Universe, and everything, and why?".

But we are told to respond in riddles, not straightforward explanation.

Thus we will write a riddle explaining that 42 is chosen because it's 6 * 7, it's the number of something, it is the number of letters: 'life' etc, or maybe referencing that 42 is the 'Answer' to an existential question but the question itself remains unknown. It's like we ask: "What is the number that is the answer to the big question but which big question?".

We might produce a riddle:
"Take six from seven to make something; multiply for the meaning; the result is 'the answer'."

But is that a riddle? We can present something like:
"Two humble companions cross, one half and one full; together they become the number that guides your quest, but the question remains unseen. Who are they and what is their sum?" But that is a riddle. Maybe we need to explain why 42 is the meaning of life.

Alternatively, we might produce a longer riddle that explains, e.g., we can incorporate that 42 is the only number that ... but we must incorporate explanation.

We can
```
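Generating 500 tokens on a T4 takes a while, so streaming the output as it is produced makes the wait more bearable. transformers ships a `TextStreamer` that plugs straight into `generate` (a convenience sketch, not part of the original post):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip_prompt avoids
# re-printing the chat template that was fed in.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, max_new_tokens=500, streamer=streamer)
```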

