Simplifying Large Language Model Interactions with the Amazon Bedrock Converse API

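This article shows how to use the Converse API, recently released for Amazon Bedrock, to simplify interactions with large language models (LLMs). The API provides a consistent interface for calling different models, which removes the repetitive work of writing complex helper functions yourself. The examples compare it with the earlier approach of integrating each model separately and show how much simpler the implementation becomes. The article also includes complete code that uses the Converse API to call the Claude 3 Sonnet model for multimodal image description.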


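To help developers quickly understand the new Converse API, I first show how developers had to write code to call multiple models behind one unified interface before the Converse API was released. Then, using Converse API sample code, I show how it handles that unification with far less effort. Finally, I share how to use the Converse API to call the Claude 3 Sonnet model to analyze two street-view photos taken in beautiful Hong Kong, China.

This article is adapted from a technical blog post I published in June 2024 on the Amazon Web Services developer community, "Streaming Large Language Model Interactions with Amazon Bedrock Converse API".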

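The World Before the Converse API

In the past, developers had to write complex helper functions to reconcile the different input and output formats of different large language models. For example, at the Amazon Web Services Hong Kong Summit in early May 2024, I needed 116 lines of code to build such a unified interface so that a single file could call five or six different large language models through Amazon Bedrock.

I wrote this function in Python, and implementations in other languages are broadly similar. Before the Converse API, developers had to write their own helper functions to call the large language models that different providers (Anthropic, Mistral, AI21, Amazon, Cohere, Meta, and others) offer in Amazon Bedrock.

The "invoke_model" function in my code below takes a prompt, a model name, and various configuration parameters (for example, temperature, top-k, top-p, and stop sequences), and returns the output text generated by the specified language model.

The helper function has to account for each provider's prompt and payload format before it can send the input data and prompt structure that a given model expects. The code is shown below: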

import json
import boto3


def invoke_model(client, prompt, model,
                 accept='application/json', content_type='application/json',
                 max_tokens=512, temperature=1.0, top_p=1.0, top_k=200, stop_sequences=[],
                 count_penalty=0, presence_penalty=0, frequency_penalty=0,
                 return_likelihoods='NONE'):
    # default response
    output = ''
    # identify the model provider
    provider = model.split('.')[0]
    # InvokeModel
    if (provider == 'anthropic'):
        input = {
            'prompt': prompt,
            'max_tokens_to_sample': max_tokens,
            'temperature': temperature,
            'top_k': top_k,
            'top_p': top_p,
            'stop_sequences': stop_sequences
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        output = response_body['completion']
    elif (provider == 'mistral'):
        input = {
            'prompt': prompt,
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_k': top_k,
            'top_p': top_p,
            'stop': stop_sequences
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        results = response_body['outputs']
        for result in results:
            output = output + result['text']
    elif (provider == 'ai21'):
        input = {
            'prompt': prompt,
            'maxTokens': max_tokens,
            'temperature': temperature,
            'topP': top_p,
            'stopSequences': stop_sequences,
            'countPenalty': {'scale': count_penalty},
            'presencePenalty': {'scale': presence_penalty},
            'frequencyPenalty': {'scale': frequency_penalty}
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        completions = response_body['completions']
        for part in completions:
            output = output + part['data']['text']
    elif (provider == 'amazon'):
        input = {
            'inputText': prompt,
            'textGenerationConfig': {
                'maxTokenCount': max_tokens,
                'stopSequences': stop_sequences,
                'temperature': temperature,
                'topP': top_p
            }
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        results = response_body['results']
        for result in results:
            output = output + result['outputText']
    elif (provider == 'cohere'):
        input = {
            'prompt': prompt,
            'max_tokens': max_tokens,
            'temperature': temperature,
            'k': top_k,
            'p': top_p,
            'stop_sequences': stop_sequences,
            'return_likelihoods': return_likelihoods
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        results = response_body['generations']
        for result in results:
            output = output + result['text']
    elif (provider == 'meta'):
        input = {
            'prompt': prompt,
            'max_gen_len': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        output = response_body['generation']
    # return
    return output


# main function
bedrock = boto3.client(service_name='bedrock-runtime')
model = 'mistral.mistral-7b-instruct-v0:2'
prompt = """Human: Explain how chicken swim to an 8 year old using 2 paragraphs.

A:
"""
output = invoke_model(client=bedrock, prompt=prompt, model=model)
print(output)

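The code above covers only the interface function for these few models. As more models need to be called through the same interface, the amount of code keeps growing. The full code is available at: https://catalog.us-east-1.prod.workshops.aws/workshops/5501fb48-e04b-476d-89b0-43a7ecaf1595/en-US/full-day-event-fm-and-embedding/fm/03-making-things-simpler?trk=cndc-detail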
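To make that growth concrete, the per-provider request bodies can be collected into a lookup table, where every new provider means another entry plus its own response parsing. The following is only a minimal sketch, not the workshop code: `REQUEST_BUILDERS` and `build_request_body` are hypothetical names, only three providers are shown, the field names are copied from the helper above, and the Titan and Llama model IDs are illustrative (only the provider prefix matters here).

```python
import json

# Hypothetical table: one request-body builder per provider.
# Field names are copied from the invoke_model helper shown earlier.
REQUEST_BUILDERS = {
    "mistral": lambda prompt, max_tokens, temperature, top_p: {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    },
    "amazon": lambda prompt, max_tokens, temperature, top_p: {
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
            "topP": top_p,
        },
    },
    "meta": lambda prompt, max_tokens, temperature, top_p: {
        "prompt": prompt,
        "max_gen_len": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    },
}


def build_request_body(model, prompt, max_tokens=512, temperature=1.0, top_p=1.0):
    """Return the JSON request body for the provider encoded in the model ID."""
    provider = model.split(".")[0]
    if provider not in REQUEST_BUILDERS:
        raise ValueError(f"No request builder registered for provider: {provider}")
    builder = REQUEST_BUILDERS[provider]
    return json.dumps(builder(prompt, max_tokens, temperature, top_p))


# The same prompt produces three differently shaped payloads.
for model_id in (
    "mistral.mistral-7b-instruct-v0:2",
    "amazon.titan-text-express-v1",      # illustrative model ID
    "meta.llama3-8b-instruct-v1:0",      # illustrative model ID
):
    print(model_id, "->", build_request_body(model_id, "Hello"))
```

No AWS call is made here. The sketch only shows that the payload shape differs per provider, which is the part the Converse API takes over.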

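The World with the Converse API

The following code snippet, from the official Amazon Web Services website, shows how simple it is to call a large language model with the Amazon Bedrock Converse API: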

def generate_conversation(bedrock_client, model_id, system_text, input_text):
    ...
    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )
    ...

The complete code is available at: https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html#message-inference-examples?trk=cndc-detail
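The elided parts of the snippet are where the request arguments get assembled. As a rough sketch, assuming the request shape described in the Converse API documentation (`messages`, `system`, `inferenceConfig`, `additionalModelRequestFields`), a helper could look like the following. The name `build_converse_request`, the default values, the example system prompt, and the use of `top_k` as a model-specific extra field are all assumptions for illustration, not the official sample.

```python
def build_converse_request(model_id, system_text, input_text,
                           max_tokens=512, temperature=0.5, top_k=200):
    """Assemble keyword arguments for bedrock_client.converse() (illustrative)."""
    return {
        "modelId": model_id,
        # One user turn containing a single text content block.
        "messages": [{"role": "user", "content": [{"text": input_text}]}],
        # System prompts are passed as a list of text blocks.
        "system": [{"text": system_text}],
        # Inference parameters shared across models.
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
        # Model-specific parameters; top_k here is an assumption that depends
        # on the chosen model supporting it.
        "additionalModelRequestFields": {"top_k": top_k},
    }


request = build_converse_request(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    system_text="You are a helpful assistant.",
    input_text="Explain how chicken swim to an 8 year old using 2 paragraphs.",
)
print(request)

# With a Bedrock runtime client, the call would then be:
#   response = bedrock_client.converse(**request)
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.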

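To give developers a complete code example of calling a model with the Converse API, I designed the following task: asking a model to describe street views of Causeway Bay, Hong Kong.

The code example uses the converse() method to send text and an image to the Claude 3 Sonnet model. It reads an image file, builds a message payload from the text prompt and the image bytes, and then prints the model's description of the scene. To test with a different image, simply update the input file path.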

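The input consists of two photos, shown below. I took these beautiful street views of Causeway Bay, Hong Kong, from my window while writing this article: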

Causeway Bay Street View, Hong Kong (Image 1)

Causeway Bay Street View, Hong Kong (Image 2)

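For the core code, I lightly modified the sample code from the official Amazon Web Services website mentioned above to write a new generate_conversation_with_image() function, and I call it at the appropriate place in main(). The complete code is shown below: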

import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def generate_conversation_with_image(bedrock_client, model_id, input_text, input_image):
    """
    Sends a message to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        input_text (str): The input message.
        input_image (str): Path to the input image file.
    Returns:
        response (JSON): The conversation that the model generated.
    """
    logger.info("Generating message with model %s", model_id)

    # Message to send.
    with open(input_image, "rb") as f:
        image = f.read()

    message = {
        "role": "user",
        "content": [
            {"text": input_text},
            {
                "image": {
                    # The format should match the actual image encoding.
                    # 'jpeg' is assumed here because the input file is a .jpg
                    # (the original listing used 'png').
                    "format": "jpeg",
                    "source": {"bytes": image}
                }
            }
        ]
    }
    messages = [message]

    # Send the message.
    response = bedrock_client.converse(modelId=model_id, messages=messages)
    return response


def main():
    """
    Entrypoint for Anthropic Claude 3 Sonnet example.
    """
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    input_text = "What's in this image?"
    input_image = "IMG_1_Haowen.jpg"

    try:
        bedrock_client = boto3.client(service_name="bedrock-runtime")
        response = generate_conversation_with_image(
            bedrock_client, model_id, input_text, input_image)

        # Print the model's reply.
        output_message = response['output']['message']
        print(f"Role: {output_message['role']}")
        for content in output_message['content']:
            print(f"Text: {content['text']}")

        # Print token usage and the stop reason.
        token_usage = response['usage']
        print(f"Input tokens:  {token_usage['inputTokens']}")
        print(f"Output tokens:  {token_usage['outputTokens']}")
        print(f"Total tokens:  {token_usage['totalTokens']}")
        print(f"Stop reason: {response['stopReason']}")

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    else:
        print(f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
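The listing above hard-codes the image format, so switching to an image of a different type means editing the message as well as the file path. One possible refinement, sketched here under the assumption that the file extension reflects the real encoding, is to derive the format from the path. The helper names `image_format_for` and `build_image_message` are hypothetical, and the format values (png, jpeg, gif, webp) are the ones the Converse API documents for image blocks.

```python
from pathlib import Path

# File extension -> format value for a Converse image block.
# Assumption: the extension matches the file's actual encoding.
EXTENSION_TO_FORMAT = {
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".gif": "gif",
    ".webp": "webp",
}


def image_format_for(path):
    """Map an image file path to a Converse image format string."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


def build_image_message(input_text, image_bytes, image_format):
    """Build a user message with one text block and one image block."""
    return {
        "role": "user",
        "content": [
            {"text": input_text},
            {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
        ],
    }


# Placeholder bytes stand in for the file contents so the sketch runs offline.
message = build_image_message(
    "What's in this image?",
    b"<image bytes>",
    image_format_for("IMG_1_Haowen.jpg"),
)
print(message["content"][1]["image"]["format"])  # prints: jpeg
```

In real use, the bytes would come from reading the file in binary mode, as in the listing above.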

For the first Causeway Bay street-view photo, I got the following output from the Claude 3 Sonnet model:

For readers' convenience, I have copied the model's output here:

Role: assistant
Text: This image shows a dense urban cityscape with numerous high-rise residential and office buildings in Hong Kong. In the foreground, there are sports facilities like a running track, soccer/football fields, and tennis/basketball courts surrounded by the towering skyscrapers of the city. The sports venues provide open green spaces amidst the densely packed urban environment. The scene captures the juxtaposition of modern city living and recreational amenities in a major metropolitan area like Hong Kong.
Input tokens:  1580
Output tokens:  103
Total tokens:  1683
Stop reason: end_turn
Finished generating text with model anthropic.claude-3-sonnet-20240229-v1:0.

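For the second Causeway Bay street-view photo, I simply changed the input_image path in the code to the new image path. With that photo as the input, Claude 3 Sonnet returned the following output:

For readers' convenience, I have again copied the model's output here: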

Role: assistant
Text: This image shows an aerial view of a dense urban city skyline, likely in a major metropolitan area. The cityscape is dominated by tall skyscrapers and high-rise apartment or office buildings of varying architectural styles, indicating a highly developed and populous city center.

In the foreground, a major highway or expressway can be seen cutting through the city, with multiple lanes of traffic visible, though the traffic appears relatively light in this particular view. There are also some pockets of greenery interspersed among the buildings, such as parks or green spaces.

One notable feature is a large billboard or advertisement for the luxury brand Chanel prominently displayed on the side of a building, suggesting this is a commercial and shopping district.

Overall, the image captures the concentrated urban density, modern infrastructure, and mixture of residential, commercial, and transportation elements characteristic of a major cosmopolitan city.
Input tokens:  1580
Output tokens:  188
Total tokens:  1768
Stop reason: end_turn
Finished generating text with model anthropic.claude-3-sonnet-20240229-v1:0.
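The fields printed in both outputs come from the same places in the Converse response: `output.message`, `usage`, and `stopReason`. As a small sketch, a hypothetical helper `summarize_response` could gather them into one dictionary. The sample response below is hand-written to mirror the first output above, with the text shortened, and is not a captured API response.

```python
def summarize_response(response):
    """Collect the reply text, token usage, and stop reason from a Converse response."""
    message = response["output"]["message"]
    # Join all text blocks; skip any content block that carries no text.
    text = "".join(block["text"] for block in message["content"] if "text" in block)
    usage = response["usage"]
    return {
        "role": message["role"],
        "text": text,
        "input_tokens": usage["inputTokens"],
        "output_tokens": usage["outputTokens"],
        "total_tokens": usage["totalTokens"],
        "stop_reason": response["stopReason"],
    }


# Hand-written sample mirroring the first output above (text shortened).
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "This image shows a dense urban cityscape ..."}],
        }
    },
    "usage": {"inputTokens": 1580, "outputTokens": 103, "totalTokens": 1683},
    "stopReason": "end_turn",
}

summary = summarize_response(sample_response)
print(summary["role"], summary["total_tokens"], summary["stop_reason"])
```

Because the response shape is the same for every model called through converse(), a helper like this would not need per-provider branches.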

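Summary

The new Converse API in Amazon Bedrock simplifies interactions with different large language models by providing a consistent interface, with no need to write model-specific implementations. Previously, developers had to write complex helper functions, often a hundred lines or more, to unify each model's input and output formats. The Converse API lets the same API call a variety of large language models, which greatly reduces code complexity.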
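As a brief illustration of that point, the sketch below sends one prompt to several models with a single loop and no per-provider branches. It assumes only the `converse(modelId=..., messages=...)` call used earlier in this article. The function `ask_models` and the `StubClient` class are hypothetical: the stub stands in for the Bedrock runtime client so the example runs without AWS credentials, and its echo reply is invented.

```python
def ask_models(bedrock_client, model_ids, input_text):
    """Send the same prompt to each model through converse() and collect the replies."""
    messages = [{"role": "user", "content": [{"text": input_text}]}]
    replies = {}
    for model_id in model_ids:
        response = bedrock_client.converse(modelId=model_id, messages=messages)
        content = response["output"]["message"]["content"]
        replies[model_id] = "".join(b["text"] for b in content if "text" in b)
    return replies


class StubClient:
    """Offline stand-in for the Bedrock runtime client (illustration only)."""

    def converse(self, modelId, messages):
        prompt = messages[0]["content"][0]["text"]
        return {
            "output": {
                "message": {
                    "role": "assistant",
                    "content": [{"text": f"[{modelId}] echo: {prompt}"}],
                }
            }
        }


replies = ask_models(
    StubClient(),
    ["anthropic.claude-3-sonnet-20240229-v1:0", "mistral.mistral-7b-instruct-v0:2"],
    "Hello",
)
for model_id, reply in replies.items():
    print(model_id, "->", reply)
```

Swapping `StubClient()` for a real `boto3.client(service_name="bedrock-runtime")` would issue actual requests, provided the account has access to the listed models.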

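The code examples in this article show the simplicity of the Converse API, whereas the earlier approach required a separate integration for each model provider. The second code example focuses on calling the Claude 3 Sonnet model through the Converse API for image description.

Overall, the Converse API simplifies working with different large language models in Amazon Bedrock. Its consistent interface substantially reduces development effort, so developers of generative AI applications can focus on the innovation and imagination unique to their own business.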

Note: The cover image of this blog post was generated by the Stable Diffusion SDXL 1.0 model on Amazon Bedrock.

The English prompt given to the Stable Diffusion SDXL 1.0 model was as follows, for reference:

“a developer sitting in the cafe, comic, graphic illustration, comic art, graphic novel art, vibrant, highly detailed, colored, 2d minimalistic”
