Tutorial: Integrating gemini-2.5-flash-image-preview
I. Prerequisites
1. Understand the model requirements
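The gemini-2.5-flash-image-preview model used in this tutorial inherits the multimodal capabilities of the Gemini family. It supports generating images from text and editing images from a combined text-and-image prompt. Note that the model cannot return image-only output: the output modalities must be configured as ["TEXT", "IMAGE"]. All generated images carry a SynthID watermark. Prompts are currently supported in languages including English, Spanish (Mexico), Japanese, Simplified Chinese, and Hindi. Audio and video input are not yet supported.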
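The dual-modality requirement can be checked before a request is ever sent. The following is a minimal sketch, assuming the REST-style payload shape used in the examples later in this tutorial; `build_payload` and `REQUIRED_MODALITIES` are hypothetical names introduced here for illustration, not part of any SDK.

```python
# Hypothetical helper: builds a generateContent-style payload and refuses
# any modality configuration other than TEXT + IMAGE.
REQUIRED_MODALITIES = {"TEXT", "IMAGE"}


def build_payload(prompt, modalities=("TEXT", "IMAGE")):
    """Return a request payload, rejecting image-only or text-only output."""
    if set(modalities) != REQUIRED_MODALITIES:
        raise ValueError(
            f"responseModalities must be ['TEXT', 'IMAGE'], got {list(modalities)}"
        )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"responseModalities": list(modalities)},
    }
```

Calling `build_payload("a red bicycle", ("IMAGE",))` raises `ValueError` locally, so the misconfiguration surfaces before any network call is made.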
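2. Environment setup
- Install a basic HTTP client, such as the requests library for Python or the axios library for JavaScript, to send API requests to the specified BaseURL. The Python examples below also use Pillow (PIL) to open and save images.
- Prepare a Base64 encoding tool: for image editing, local images must be converted to Base64 before being passed in the request parameters.
- Obtain a Gemini API key (GEMINI_API_KEY): it is used for authentication and must be sent in a request header or parameter (skip this step if the BaseURL endpoint already handles key management).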
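One way to handle the optional key is to read it from an environment variable and add the Authorization header only when a key is present. This is a minimal sketch: `build_headers` is a hypothetical helper, and the `Bearer` scheme is assumed from the request examples in this tutorial rather than confirmed for every deployment of the BaseURL endpoint.

```python
import os


def build_headers(env=os.environ):
    """Build request headers; include Authorization only if a key is set.

    Assumption: the endpoint accepts "Authorization: Bearer <key>", as in
    the examples in this tutorial. Adjust if your endpoint differs.
    """
    headers = {"Content-Type": "application/json"}
    api_key = env.get("GEMINI_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Reading the key from the environment keeps it out of source files, and passing `env` as a parameter makes the helper easy to exercise with a plain dict.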
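II. Core feature integration steps
1. Text-to-Image
Generate an image from a text prompt. The examples below, in different programming languages, are all built against the specified BaseURL (http://api.aaigc.top).
Python implementation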
import base64
from io import BytesIO

import requests
from PIL import Image

# Basic configuration
BASE_URL = "http://api.aaigc.top"
# Endpoint path (follows the Gemini API convention; check the actual service)
ENDPOINT = "/v1beta/models/gemini-2.5-flash-image-preview:generateContent"
API_KEY = "YOUR_GEMINI_API_KEY"  # can be removed if the endpoint manages keys

# Text prompt
prompt = (
    "3D-rendered style: a small pig wearing a top hat and wings, flying over "
    "a futuristic sci-fi city full of green plants, with towering skyscrapers "
    "and neon lights"
)

# Request body
payload = {
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},  # both modalities are required
}

# Request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",  # can be removed if the endpoint manages keys
}

# Send the request and handle the response
response = requests.post(f"{BASE_URL}{ENDPOINT}", json=payload, headers=headers)
response.raise_for_status()
data = response.json()

# Parse the text and image parts
for part in data["candidates"][0]["content"]["parts"]:
    if part.get("text"):
        print("Model text reply:", part["text"])
    elif part.get("inlineData", {}).get("data"):
        image_data = base64.b64decode(part["inlineData"]["data"])
        image = Image.open(BytesIO(image_data))
        image.save("gemini-text-to-image.png")
        image.show()
        print("Image saved: gemini-text-to-image.png")
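The parsing loop above indexes data["candidates"][0] directly, which raises an exception if the response contains no candidates. A more defensive variant is sketched below, assuming the same response shape as the example; `extract_parts` is a hypothetical helper name.

```python
import base64


def extract_parts(data):
    """Return (texts, images) from a generateContent-style response dict.

    texts is a list of strings; images is a list of decoded image bytes.
    Missing or empty fields yield empty lists instead of raising.
    """
    texts, images = [], []
    candidates = data.get("candidates") or []
    if not candidates:
        return texts, images
    parts = (candidates[0].get("content") or {}).get("parts") or []
    for part in parts:
        if part.get("text"):
            texts.append(part["text"])
        inline = part.get("inlineData") or {}
        if inline.get("data"):
            images.append(base64.b64decode(inline["data"]))
    return texts, images
```

The caller can then decide what to do when `images` is empty, for example retrying with a more explicit prompt.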
JavaScript implementation (Node.js)
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Basic configuration
const BASE_URL = "http://api.aaigc.top";
const ENDPOINT = "/v1beta/models/gemini-2.5-flash-image-preview:generateContent";
const API_KEY = "YOUR_GEMINI_API_KEY";

// Text prompt
const prompt = "3D-rendered style: a small pig wearing a top hat and wings, flying over a futuristic sci-fi city full of green plants, with towering skyscrapers and neon lights";

// Request body
const payload = {
  "contents": [{ "parts": [{ "text": prompt }] }],
  "generationConfig": { "responseModalities": ["TEXT", "IMAGE"] }
};

// Request headers
const headers = {
  "Content-Type": "application/json",
  "Authorization": `Bearer ${API_KEY}`
};

// Send the request and handle the response
async function generateImageFromText() {
  try {
    const response = await axios.post(`${BASE_URL}${ENDPOINT}`, payload, { headers });
    const data = response.data;
    for (const part of data.candidates[0].content.parts) {
      if (part.text) {
        console.log("Model text reply:", part.text);
      } else if (part.inlineData && part.inlineData.data) {
        const imageBuffer = Buffer.from(part.inlineData.data, 'base64');
        const savePath = path.join(__dirname, "gemini-text-to-image.png");
        fs.writeFileSync(savePath, imageBuffer);
        console.log(`Image saved: ${savePath}`);
      }
    }
  } catch (error) {
    console.error("Request failed:", error.response?.data || error.message);
  }
}

generateImageFromText();
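2. Image editing (Image + Text-to-Image)
Pass in the original image in Base64 format together with an editing prompt, and the model modifies the image as requested. The key steps are as follows:
Pre-step: convert the image to Base64 (Python example)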
import base64

def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Convert a local image
original_image_path = "original-image.png"
image_base64 = image_to_base64(original_image_path)
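Because the request also needs a mimeType that matches the image, the conversion step can produce both values at once. The sketch below uses Python's standard mimetypes module to guess the type from the file extension; `guess_image_mime` and `image_to_inline_data` are hypothetical helper names, and guessing from the extension is an assumption that fails for misnamed files.

```python
import base64
import mimetypes


def guess_image_mime(image_path):
    """Guess the MIME type from the file extension; raise if not an image."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {image_path!r}")
    return mime_type


def image_to_inline_data(image_path):
    """Return an inlineData part with a matching mimeType and Base64 data."""
    mime_type = guess_image_mime(image_path)
    with open(image_path, "rb") as image_file:
        # b64encode emits a single line with no whitespace or line breaks
        data = base64.b64encode(image_file.read()).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": data}}
```

The returned dict can be placed directly in the "parts" list of the request next to the text prompt.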
Python image editing example
import base64
from io import BytesIO

import requests
from PIL import Image

# Basic configuration (same as text-to-image)
BASE_URL = "http://api.aaigc.top"
ENDPOINT = "/v1beta/models/gemini-2.5-flash-image-preview:generateContent"
API_KEY = "YOUR_GEMINI_API_KEY"

# Same helper as in the pre-step, repeated so this script runs on its own
def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Base64-encode the original image
original_image_path = "original-image.png"
image_base64 = image_to_base64(original_image_path)

# Editing prompt
edit_prompt = (
    "Add a white alpaca next to the person, facing the person, and keep the "
    "overall style consistent with the original image (for example, if the "
    "original is photorealistic, the alpaca should be too)"
)

# Request body
payload = {
    "contents": [{
        "parts": [
            {"text": edit_prompt},
            # mimeType must match the actual image format
            {"inlineData": {"mimeType": "image/png", "data": image_base64}},
        ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
}

# Request headers (same as text-to-image)
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# Send the request and parse the response
response = requests.post(f"{BASE_URL}{ENDPOINT}", json=payload, headers=headers)
response.raise_for_status()
data = response.json()

# Save the edited image
for part in data["candidates"][0]["content"]["parts"]:
    if part.get("inlineData", {}).get("data"):
        image_data = base64.b64decode(part["inlineData"]["data"])
        edited_image = Image.open(BytesIO(image_data))
        edited_image.save("gemini-edited-image.png")
        edited_image.show()
        print("Edited image saved: gemini-edited-image.png")
    elif part.get("text"):
        print("Model editing notes:", part["text"])
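III. Common issues and notes
- Text-only output: state the image request explicitly in the prompt, with wording such as "generate an image" or "update the image". For example, change "add an alpaca" to "generate the image with an alpaca added".
- Generation interrupted: retry the request or simplify the prompt, and avoid packing too many elements into a single prompt.
- Base64 encoding errors: make sure the encoded string is complete (no stray spaces or line breaks) and that mimeType matches the image format (image/jpeg for JPG, image/png for PNG).
- Regional availability: if you see "service temporarily unavailable", confirm that the model is available in your region; see the regional support notes for the BaseURL endpoint.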
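The retry advice above can be wrapped in a small helper that retries when a call fails or when the response contains no image part. This is a minimal sketch under stated assumptions: `send` is any caller-supplied function that posts the payload and returns the parsed JSON response, and `has_image`, `generate_with_retry`, the attempt count, and the exponential delay are all illustrative choices, not documented behavior of the API.

```python
import time


def has_image(data):
    """Return True if any part of any candidate carries inline image data."""
    for candidate in data.get("candidates") or []:
        for part in (candidate.get("content") or {}).get("parts") or []:
            if (part.get("inlineData") or {}).get("data"):
                return True
    return False


def generate_with_retry(send, payload, max_attempts=3, base_delay=1.0):
    """Call send(payload) until an image is returned or attempts run out.

    Waits base_delay * 2**attempt seconds between attempts and re-raises
    the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            data = send(payload)
            if has_image(data):
                return data
            last_error = RuntimeError("response contained no image part")
        except Exception as exc:  # broad on purpose: sketch only
            last_error = exc
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    raise last_error
```

With the earlier requests-based example, `send` could be a short function that calls `requests.post(...)`, runs `raise_for_status()`, and returns `response.json()`.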
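IV. Examples
1. A kawaii-style sticker of a happy bear. The background is set to white, and the design uses clean outlines and bold colors, making it lively and appealing.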
Create a [image type] for [brand/concept] with the text “[text to render]” in a [font style]. The design should be [style description], with a [color scheme].

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

# Generate an image from a text prompt
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a coffee bean seamlessly integrated with the text. The color scheme is black and white.",
)

image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    image.save('logo_example.png')
    image.show()
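The bracketed prompt template in this example can be filled programmatically. The sketch below rewrites the placeholders as Python format fields; `LOGO_TEMPLATE`, `fill_logo_prompt`, and the field names are hypothetical and introduced here only for illustration.

```python
# The logo prompt template, with each [placeholder] turned into a format field.
LOGO_TEMPLATE = (
    'Create a {image_type} for {brand} with the text "{text}" in a '
    "{font_style}. The design should be {style}, with a {color_scheme}."
)


def fill_logo_prompt(**fields):
    """Fill the template; a missing field raises KeyError."""
    return LOGO_TEMPLATE.format(**fields)


prompt = fill_logo_prompt(
    image_type="logo",
    brand="a coffee shop",
    text="The Daily Grind",
    font_style="clean, bold, sans-serif font",
    style="modern and minimalist",
    color_scheme="black and white color scheme",
)
```

The resulting string can be passed as the `contents` argument in the example above.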
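2. An officially generated close-up of an elderly ceramic artist. Soft golden sunlight streams through the window into the frame, highlighting the fine texture of the clay and the wrinkles on the old man's face.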
A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be white.

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

# Generate an image from a text prompt
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful.",
)

image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    image.save('photorealistic_example.png')
    image.show()
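3. A cat eating a banana in a luxurious restaurant under the Gemini constellation. The cat's table is even set with a knife, fork, and wine glass, and there are guests at the other tables too, so the scene is full of detail.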
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

# Generate an image from a text prompt
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful.",
)

image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    image.save('photorealistic_example.png')