CogView4 Text-to-Image Generation

flyfish

An image generation program built on CogView4Pipeline. It reads text prompts from a JSON file, generates images for each prompt, and saves them to a specified folder.
JSON file format

[{"prompt": "your first prompt"},{"prompt": "your second prompt"}
]
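
As a minimal sketch of producing a file in this format, a list of prompt strings can be turned into the expected list of {"prompt": ...} objects. The helper name prompts_to_json and the sample prompts are illustrative assumptions, not part of the original program.

```python
import json


def prompts_to_json(prompts):
    """Serialize prompt strings as [{"prompt": ...}, ...] (hypothetical helper)."""
    records = [{"prompt": p} for p in prompts]
    # ensure_ascii=False keeps non-ASCII prompts readable in the file.
    return json.dumps(records, ensure_ascii=False, indent=2)


text = prompts_to_json(["a windmill at dusk", "a knight on horseback"])
print(text)
```

Writing the returned text to a file yields input that the program's --json_file option can read.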

Install the diffusers library from source

pip install git+https://github.com/huggingface/diffusers.git

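Resolution: width and height must each be between 512 px and 2048 px and divisible by 32, and the total pixel count must not exceed 2^21.
Precision: BF16 / FP32 (FP16 is not supported; it overflows and produces pure black images).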
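
The resolution limits above can be checked before starting a long generation run. The following is a sketch under the stated limits only; check_resolution is a hypothetical helper and is not part of diffusers or the original program.

```python
MIN_SIDE = 512        # smallest allowed side, in pixels
MAX_SIDE = 2048       # largest allowed side, in pixels
MULTIPLE = 32         # each side must be divisible by this
MAX_PIXELS = 2 ** 21  # upper bound on width * height


def check_resolution(width, height):
    """Return a list of problems; an empty list means the size is acceptable."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            problems.append(f"{name}={value} is outside {MIN_SIDE}-{MAX_SIDE}")
        if value % MULTIPLE != 0:
            problems.append(f"{name}={value} is not divisible by {MULTIPLE}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}x{height} exceeds {MAX_PIXELS} pixels")
    return problems


print(check_resolution(1024, 1024))
print(check_resolution(2048, 2048))
```

Note that 2048 x 2048 passes the per-side limits but fails the pixel budget, while 2048 x 1024 sits exactly on it.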

import json
import torch
import os
from datetime import datetime
import random
import string
import argparse
from diffusers import CogView4Pipeline


# Singleton configuration class
class ConfigSingleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            parser = argparse.ArgumentParser(description='Image generation configuration')
            parser.add_argument('--model_path', default='/media/models/ZhipuAI/CogView4-6B/',
                                help='Path to the model')
            parser.add_argument('--height', type=int, default=1024,
                                help='Height of the generated images')
            parser.add_argument('--width', type=int, default=1024,
                                help='Width of the generated images')
            parser.add_argument('--guidance_scale', type=float, default=3.5,
                                help='Guidance scale for image generation')
            parser.add_argument('--num_inference_steps', type=int, default=50,
                                help='Number of inference steps')
            parser.add_argument('--num_images_per_prompt', type=int, default=10,
                                help='Number of images to generate per prompt')
            parser.add_argument('--json_file', default='DonQuixotedelaMancha.json',
                                help='Path to the JSON file containing prompts')
            parser.add_argument('--save_folder', default='generated_images',
                                help='Folder to save the generated images')
            args = parser.parse_args()
            cls._instance.config = {
                "save_folder": args.save_folder,
                "filename_timestamp_format": "%Y%m%d_%H%M%S",
                "random_char_count": 6,
                "model_path": args.model_path,
                "height": args.height,
                "width": args.width,
                "guidance_scale": args.guidance_scale,
                "num_inference_steps": args.num_inference_steps,
                "num_images_per_prompt": args.num_images_per_prompt,
                "json_file": args.json_file,
            }
            # Create the output folder if it does not exist yet
            os.makedirs(cls._instance.config["save_folder"], exist_ok=True)
        return cls._instance

    def get_config(self):
        return self.config


# Data loader class
class DataLoader:
    def __init__(self, config):
        self.config = config

    def load_data(self):
        try:
            # Read as UTF-8 so non-ASCII prompts load the same way on every platform
            with open(self.config["json_file"], 'r', encoding='utf-8') as f:
                return json.load(f)
        except FileNotFoundError:
            print(f"Error: The JSON file '{self.config['json_file']}' was not found.")
            return []
        except json.JSONDecodeError:
            print(f"Error: Failed to decode the JSON file '{self.config['json_file']}'.")
            return []


# Model pipeline initializer class
class PipelineInitializer:
    def __init__(self, config):
        self.config = config

    def initialize_pipeline(self):
        pipe = CogView4Pipeline.from_pretrained(self.config["model_path"],
                                                torch_dtype=torch.bfloat16)
        # Move the model to the GPU
        pipe = pipe.to("cuda")
        # VAE slicing and tiling reduce peak GPU memory during decoding
        pipe.vae.enable_slicing()
        pipe.vae.enable_tiling()
        return pipe


# Filename generator class
class FilenameGenerator:
    def __init__(self, config):
        self.config = config

    def generate_unique_filename(self):
        timestamp = datetime.now().strftime(self.config["filename_timestamp_format"])
        random_chars = ''.join(random.choices(string.ascii_letters + string.digits,
                                              k=self.config["random_char_count"]))
        return f"{timestamp}_{random_chars}.png"


# Image generation and saving class
class ImageGeneratorAndSaver:
    def __init__(self, pipe, config):
        self.pipe = pipe
        self.config = config
        self.filename_generator = FilenameGenerator(config)

    def generate_and_save_images(self, data):
        # One CUDA generator, created once and shared by all prompts
        generator = torch.Generator("cuda")
        for item in data:
            prompt = item["prompt"]
            try:
                result = self.pipe(prompt=prompt,
                                   guidance_scale=self.config["guidance_scale"],
                                   num_images_per_prompt=self.config["num_images_per_prompt"],
                                   num_inference_steps=self.config["num_inference_steps"],
                                   width=self.config["width"],
                                   height=self.config["height"],
                                   generator=generator)
                for image in result.images:
                    filename = self.filename_generator.generate_unique_filename()
                    file_path = os.path.join(self.config["save_folder"], filename)
                    image.save(file_path)
                    print(f"Saved {file_path} for prompt: {prompt}")
            except Exception as e:
                print(f"Error generating image for prompt '{prompt}': {e}")


# Main class that coordinates the components
class ImageGenerationManager:
    def __init__(self):
        self.config = ConfigSingleton().get_config()
        self.data_loader = DataLoader(self.config)
        self.pipeline_initializer = PipelineInitializer(self.config)

    def run(self):
        data = self.data_loader.load_data()
        if not data:
            return
        pipe = self.pipeline_initializer.initialize_pipeline()
        image_generator = ImageGeneratorAndSaver(pipe, self.config)
        image_generator.generate_and_save_images(data)


if __name__ == "__main__":
    manager = ImageGenerationManager()
    manager.run()

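Overall flow

The program first initializes the configuration parameters, then loads the prompts, then initializes the model pipeline, and finally generates images for each prompt and saves them to the specified folder. The ImageGenerationManager class coordinates the whole process.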

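1. Configuration initialization

Singleton configuration class ConfigSingleton:
- Parses command-line arguments with the argparse module. The arguments cover the model path, image height and width, guidance scale, number of inference steps, number of images per prompt, the path of the JSON file containing the prompts, and the folder in which to save the generated images.
- Stores the parsed arguments in the config dictionary and creates the output folder if it does not already exist.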
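
Because ConfigSingleton calls parse_args() with no arguments, it always reads the real command line. The sketch below is a variant, not the author's code: build_config is a hypothetical function that accepts an explicit argument list, which makes the same options easy to try out in isolation. The option names and defaults are copied from the program above; the output folder is not created here.

```python
import argparse


def build_config(argv=None):
    """Parse argv (or the real command line when argv is None) into a dict."""
    parser = argparse.ArgumentParser(description="Image generation configuration")
    parser.add_argument("--model_path", default="/media/models/ZhipuAI/CogView4-6B/")
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--guidance_scale", type=float, default=3.5)
    parser.add_argument("--num_inference_steps", type=int, default=50)
    parser.add_argument("--num_images_per_prompt", type=int, default=10)
    parser.add_argument("--json_file", default="DonQuixotedelaMancha.json")
    parser.add_argument("--save_folder", default="generated_images")
    # vars() turns the argparse Namespace into a plain dictionary.
    return vars(parser.parse_args(argv))


cfg = build_config(["--height", "768", "--num_images_per_prompt", "2"])
print(cfg["height"], cfg["width"], cfg["num_images_per_prompt"])
```

Options that are not passed keep their defaults, so only height and the image count change in this call.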

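2. Data loading

Data loader class DataLoader:
- Takes the configuration as input.
- Tries to open and parse the JSON file. If the file does not exist or cannot be parsed, it prints an error message and returns an empty list.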
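
The loader returns whatever the JSON file contains, so an entry without a "prompt" key would only fail later, during generation. As a hedged sketch, a small filter could keep just the usable prompts; extract_prompts is a hypothetical helper and is not part of the original program.

```python
def extract_prompts(data):
    """Return the non-empty prompt strings found in parsed JSON data."""
    if not isinstance(data, list):
        return []
    prompts = []
    for item in data:
        # Skip anything that is not an object with a non-blank string prompt.
        if isinstance(item, dict):
            prompt = item.get("prompt")
            if isinstance(prompt, str) and prompt.strip():
                prompts.append(prompt)
    return prompts


sample = [{"prompt": "a windmill"}, {"text": "no prompt key"}, "oops", {"prompt": "  "}]
print(extract_prompts(sample))
```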

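3. Model pipeline initialization

Model pipeline initializer class PipelineInitializer:
- Takes the configuration as input.
- Loads the pretrained model with CogView4Pipeline.from_pretrained, setting the data type to torch.bfloat16.
- Moves the model to the GPU with pipe.to("cuda") and enables VAE slicing and tiling to reduce GPU memory usage.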

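4. Filename generation

Filename generator class FilenameGenerator:
- Takes the configuration as input.
- Builds a filename from the current timestamp followed by a run of random characters, which makes collisions very unlikely.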
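
The naming scheme can be tried on its own. This sketch mirrors the timestamp format and the six random characters used in the program; the function name unique_filename and the optional now parameter are assumptions added so that the timestamp part can be fixed when experimenting.

```python
import random
import string
from datetime import datetime


def unique_filename(now=None, random_char_count=6, fmt="%Y%m%d_%H%M%S"):
    """Build '<timestamp>_<random chars>.png' (hypothetical standalone helper)."""
    now = now or datetime.now()
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(random.choices(alphabet, k=random_char_count))
    return f"{now.strftime(fmt)}_{suffix}.png"


name = unique_filename(datetime(2025, 1, 2, 3, 4, 5))
print(name)
```

The random suffix differs on every call, so two images saved within the same second still get different names in practice, although the scheme does not strictly guarantee uniqueness.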

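5. Image generation and saving

Image generation and saving class ImageGeneratorAndSaver:
- Takes the model pipeline and the configuration as input.
- Creates a CUDA random number generator once with torch.Generator("cuda"), before the loop, and shares it across all prompts.
- For each prompt in the JSON file:
  - Calls the pipeline object pipe directly, passing the prompt, guidance scale, number of images per prompt, number of inference steps, and the image width and height.
  - Generates a unique filename for each resulting image, saves it to the specified folder, and prints a confirmation message.
  - If generation raises an exception, prints an error message and moves on to the next prompt.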
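
The control flow above, one pipeline call per prompt with failures isolated to that prompt, can be sketched without a GPU. Everything below is illustrative: run_prompts and fake_pipe are hypothetical names, and fake_pipe is a stand-in that only mimics the shape of a pipeline result (an object with an images list), not the real CogView4Pipeline.

```python
from types import SimpleNamespace


def run_prompts(pipe, prompts, save_image, **pipe_kwargs):
    """Call pipe once per prompt; an error on one prompt does not stop the rest."""
    saved, failed = 0, []
    for prompt in prompts:
        try:
            result = pipe(prompt=prompt, **pipe_kwargs)
            for image in result.images:
                save_image(image)
                saved += 1
        except Exception as exc:
            print(f"Error generating image for prompt '{prompt}': {exc}")
            failed.append(prompt)
    return saved, failed


def fake_pipe(prompt, num_images_per_prompt=1, **_):
    """Stand-in for a real pipeline: returns strings instead of images."""
    if not prompt:
        raise ValueError("empty prompt")
    images = [f"{prompt}-{i}" for i in range(num_images_per_prompt)]
    return SimpleNamespace(images=images)


collected = []
saved, failed = run_prompts(fake_pipe, ["a", "", "b"], collected.append,
                            num_images_per_prompt=2)
print(saved, failed, collected)
```

The empty prompt fails inside the try block, is recorded, and the loop still processes the prompt after it.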

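6. Main program execution

Main class ImageGenerationManager:
- Initializes the configuration, the data loader, and the pipeline initializer.
- Its run method then:
  - Loads the prompts from the JSON file.
  - Returns immediately if no prompts were loaded; otherwise initializes the model pipeline.
  - Creates an ImageGeneratorAndSaver instance and calls its generate_and_save_images method to generate and save the images.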
