PyVision: A Visual Agent Built on Dynamic Tooling

Paper:

[2507.07998v1] PyVision: Agentic Vision with Dynamic Tooling

1. Background

Existing agents typically work by having a large model plan calls to a fixed set of predefined tools (in practice, functions). This leaves the agent with little flexibility when tackling task-specific problems. This paper therefore proposes PyVision: when solving a particular problem, the agent generates tools (functions, or more generally code) tailored to that task, improving its problem-solving ability.

2. Framework Architecture

As the schematic shows, PyVision enables a multimodal large language model (MLLM) to dynamically generate and execute Python code during inference. In each session, the MLLM receives an input, generates the corresponding Python code, and executes it in an isolated Python runtime. The resulting output (text, visual, or both) is fed back into the MLLM's context, letting it iterate and refine its reasoning over multiple turns until a final answer is produced.

where:

  • code_block_i is the Python code the MLLM generates in turn i.
  • mm_clue_i is the multimodal output returned by the Python interpreter after execution.
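The multi-turn loop described above can be sketched as follows. `run_llm` and `run_sandbox` are hypothetical stubs standing in for the MLLM and the isolated runtime, and the snippet uses bare `<code>` tags where PyVision additionally wraps code in ```python fences:

```python
import re

def run_llm(context):
    # Stub for the MLLM: emits a <code> block first, then a final answer
    # once an interpreter clue appears in the context.
    if "mm_clue_1" in context:
        return "<answer>42</answer>"
    return "<code>print(6 * 7)</code>"

def run_sandbox(code):
    # Stub for the isolated runtime; simulates executing the snippet.
    return "42\n"

def agent_session(user_input, max_turns=5):
    context = user_input
    response = ""
    for i in range(1, max_turns + 1):
        response = run_llm(context)                        # generate code_block_i
        match = re.search(r"<code>(.*?)</code>", response, re.DOTALL)
        if match is None:
            return response                                # final answer produced
        clue = run_sandbox(match.group(1))                 # execute -> mm_clue_i
        context += f"\n<interpreter>mm_clue_{i}: {clue}</interpreter>"
    return response

print(agent_session("What is 6 * 7?"))  # <answer>42</answer>
```

The key design point is that the loop terminates on the model's own output shape: a turn with no code block is treated as the final answer.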

3. Reasoning Case Studies

The paper walks through several domain-specific tasks, showing how PyVision is used for visual reasoning in each.

3.1 Visual Search

3.2 Medical Image Analysis

3.3 Symbolic Visual Puzzles

3.4 Visual Sketching

3.5 Visual Difference Comparison

3.6 Video Understanding

4. Conclusions

As the figure shows, results are reported on several benchmark datasets:

MathVista and MathVision-mini: primarily multimodal math, testing an LLM on problems that require both visual perception and numerical reasoning.

MMMU: tests domain-specific reasoning across disciplines.

VisualPuzzles and VLMAreBlind-mini: composed mainly of symbolic visual puzzles, probing the limits of an LLM's ability to parse and reason over abstract, structured visual primitives.

V*: mainly tests an LLM's ability to precisely identify subtle visual details.

As the figure shows, PyVision-GPT-4.1 (GPT-4.1 with PyVision) improves MathVista by 1.8% (from 69.9% to 71.7%), with similar gains on the other tasks. It still trails models such as o1 and o3 by a fair margin, though, which suggests that the backend model behind the framework matters a great deal to overall performance on these problems.

5. Code Reproduction

Source code: https://github.com/agents-x-project/PyVision

Demo: https://huggingface.co/spaces/Agents-X/PyVision

Source walkthrough:

1. Configure the LLM API

The project ships three LLM configurations: openai, azure, and vllm. The config files live under:

./api_config_files/api_config_*
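As a hedged illustration, a config file of this kind might be read like so. The field names below are assumptions for illustration, not the repo's actual schema:

```python
import json
import os
import tempfile

# Hypothetical config layout (the repo's actual field names may differ).
example_cfg = {
    "api_type": "openai",
    "api_key": "sk-...",
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4.1",
}

# Write and re-read it the way main.py would read ./api_config_files/api_config_*.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "api_config_openai.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example_cfg, f, indent=2)
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)

print(cfg["api_type"])  # openai
```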

2. Prompt Templates

# English original
{"retool": "Solve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. The Python code will be executed by an external sandbox, and the output (wrapped in `<interpreter>output_str</interpreter>`) can be returned to aid your reasoning and help you arrive at the final answer. The Python code should be complete scripts, including necessary imports. \nEach code snippet is wrapped with `<code>\n```python\ncode snippet\n```\n</code>`.\nThe last part of your response should be in the following format:\n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>\n\n*user question:*\nAnswer the following Math Problem and put the answer in the format of \\boxed{{answer}}\n\n{query}\n\n\nRemember to place the final answer in the last part using the format: \n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>","vistool": "Solve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. The Python code will be executed by an external sandbox.\n\nFor all the provided images, in order, the i-th image has already been read into the global variable `image_clue_i` using the PIL.Image.open() function. When writing Python code, you can directly use these variables without needing to read them again.\n\nSince you are dealing with the VQA task, you MUST use the python tool (e.g., matplotlib library) to analyze or transform images whenever it could improve your understanding or aid your reasoning. This includes but is not limited to zooming in, rotating, adjusting contrast, computing statistics, or isolating features. \n\nNote that when you use matplotlib to visualize data or further process images, you need to use plt.show() to display these images; there is no need to save them. Do not use image processing libraries like cv2 or PIL. 
If you want to check the value of a variable, you MUST use print() to check it.\n\nThe output (wrapped in `<interpreter>output_str</interpreter>`) can be returned to aid your reasoning and help you arrive at the final answer. The Python code should be complete scripts, including necessary imports. \nEach code snippet is wrapped with `<code>\n```python\ncode snippet\n```\n</code>`.\nThe last part of your response should be in the following format:\n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>\n\n*user question:*\nAnswer the following Problem with an image provided and put the answer in the format of \\boxed{{answer}}\n\n{query}\n\nRemember to place the final answer in the last part using the format: \n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>","vistool_with_img_info": "Solve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. The Python code will be executed by an external sandbox.\n\nFor all the provided images, in order, the i-th image has already been read into the global variable `image_clue_i` using the PIL.Image.open() function. When writing Python code, you can directly use these variables without needing to read them again.\n\nSince you are dealing with the VQA task, you MUST use the python tool (e.g., matplotlib library) to analyze or transform images whenever it could improve your understanding or aid your reasoning. This includes but is not limited to zooming in, rotating, adjusting contrast, computing statistics, or isolating features. \n\nNote that when you use matplotlib to visualize data or further process images, you need to use plt.show() to display these images; there is no need to save them. Do not use image processing libraries like cv2 or PIL. 
If you want to check the value of a variable, you MUST use print() to check it.\n\nThe output (wrapped in `<interpreter>output_str</interpreter>`) can be returned to aid your reasoning and help you arrive at the final answer. The Python code should be complete scripts, including necessary imports. \nEach code snippet is wrapped with `<code>\n```python\ncode snippet\n```\n</code>`.\nThe last part of your response should be in the following format:\n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>\n\n*image resolution:*\n\nImage Width: {width}; Image Height: {height}\n\n*user question:*\nAnswer the following Problem with an image provided and put the answer in the format of \\boxed{{answer}}\n\n{query}\n\nRemember to place the final answer in the last part using the format: \n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>","vistool_with_img_info_multi_image": "Solve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. The Python code will be executed by an external sandbox.\n\nFor all the provided images, in order, the i-th image has already been read into the global variable `image_clue_i` using the PIL.Image.open() function. When writing Python code, you can directly use these variables without needing to read them again.\n\nSince you are dealing with the VQA task, you MUST use the python tool (e.g., matplotlib library) to analyze or transform images whenever it could improve your understanding or aid your reasoning. This includes but is not limited to zooming in, rotating, adjusting contrast, computing statistics, or isolating features. \n\nNote that when you use matplotlib to visualize data or further process images, you need to use plt.show() to display these images; there is no need to save them. Do not use image processing libraries like cv2 or PIL. 
If you want to check the value of a variable, you MUST use print() to check it.\n\nThe output (wrapped in `<interpreter>output_str</interpreter>`) can be returned to aid your reasoning and help you arrive at the final answer. The Python code should be complete scripts, including necessary imports. \nEach code snippet is wrapped with `<code>\n```python\ncode snippet\n```\n</code>`.\nThe last part of your response should be in the following format:\n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>\n\n*image resolution:*\n\n{image_information}\n\n*user question:*\nAnswer the following Problem with an image provided and put the answer in the format of \\boxed{{answer}}\n\n{query}\n\nRemember to place the final answer in the last part using the format: \n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>","vistool_with_img_info_v2": "You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved. \n\nSolve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. The Python code will be executed by an external sandbox. \n\nYou MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.\n\nFor all the provided images, in order, the i-th image has already been read into the global variable `image_clue_i` using the PIL.Image.open() function. 
When writing Python code, you can directly use these variables without needing to read them again.\n\nSince you are dealing with the vision-related question answering task, you MUST use the python tool (e.g., matplotlib library) to analyze or transform images whenever it could improve your understanding or aid your reasoning. This includes but is not limited to zooming in, rotating, adjusting contrast, computing statistics, or isolating features. \n\nNote that when you use matplotlib to visualize data or further process images, you need to use plt.show() to display these images; there is no need to save them. Do not use image processing libraries like cv2 or PIL. If you want to check the value of a variable, you MUST use print() to check it.\n\nThe output (wrapped in `<interpreter>output_str</interpreter>`) can be returned to aid your reasoning and help you arrive at the final answer. The Python code should be complete scripts, including necessary imports. \nEach code snippet is wrapped with `<code>\n```python\ncode snippet\n```\n</code>`.\nThe last part of your response should be in the following format:\n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>\n\n*image resolution:*\n\nImage Width: {width}; Image Height: {height}\n\n*user question:*\nAnswer the following Problem with an image provided and put the answer in the format of \\boxed{{answer}}\n\n{query}\n\nRemember to place the final answer in the last part using the format: \n<answer>\n\\boxed{{'The final answer goes here.'}}\n</answer>","no_tool": "You are a helpful assistant. And you are dealing with the VQA tasks. Solve the visual questions step by step and give the correct answer. Note: put your answer in the format of \"\\boxed{{the right answer here}}\"\n *user question*:\n{query}","no_tool_no_cot": "Question:\n{query}\nGive the correct answer directly, in the format of \"Final Answer:\\boxed{{the final answer here}}\"\n"
}

3. Launch main.py

from openai import OpenAI
from inference_engine.vis_inference_demo_gpt import evaluate_single_data, evaluate_single_with_cleanup
from inference_engine.safe_persis_shared_vis_python_exe import PythonExecutor
......

# Run inference with safe execution
print(f"Processing image: {args.image_path}")
print(f"Question: {args.question}")
print("Running inference with safe execution...")
# messages, final_response = evaluate_single_with_cleanup(eval_args, data, client)
executor = PythonExecutor()
messages, final_response = evaluate_single_data(eval_args, data, client, executor)

# Save results
os.makedirs(args.output_dir, exist_ok=True)
if args.save_messages:
    messages_path = os.path.join(args.output_dir, "test_messages.json")
    with open(messages_path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=4, ensure_ascii=False)
    print(f"Messages saved to: {messages_path}")

The overall logic is simple: once the arguments are configured, evaluate_single_data is called to run inference and return the model's result.

4. inference_engine

This module is the core of the project: it drives code execution and the reasoning loop for the visual question answering (VQA) task.

evaluate_single_data

evaluate_single_data is the heart of the system; it implements the complete dynamic-tooling agentic VQA pipeline.

# Argument extraction and validation
prompt_template = args.prompt_template
prompt = args.prompt
exe_code = args.exe_code
max_tokens = args.max_tokens
temperature = args.temperature
api_name = args.api_name

# Prompt-template selection
if "no_tool" in prompt:
    # Pure text reasoning, no tools
    if len(image_path_list) == 1:
        messages = process_prompt_init(...)
    elif len(image_path_list) >= 2:
        messages = process_prompt_init_multi_images(...)
else:
    # Tool-augmented reasoning
    if len(image_path_list) == 1:
        prompt = "vistool_with_img_info_v2"  # single-image, enhanced
        messages = process_prompt_init(...)
    elif len(image_path_list) >= 2:
        prompt = "vistool_with_img_info_multi_image"  # multi-image
        messages = process_prompt_init_multi_images(...)

# Iterative reasoning loop
while True:
    if exe_code and pred_stop_reason == "</code>":
        # A code block needs to be executed
        # 1. Extract the code
        code_to_execute = response_text.split("```python")[-1].split("```")[0].strip()
        # 2. Execute it
        exe_result = execute_codes([code_to_execute], messages, executor)[0][0]
        # 3. Handle the execution result
        if report == "Done":
            # Execution succeeded
            text_result = exe_result[0]['text']
            images_result = exe_result[0]['images']
        else:
            # Execution failed
            error_result = report
        # 4. Update the message history
        messages, new_image_clue_idx = update_messages_with_execute_content(...)
        # 5. Continue generating the next segment
        response_text, pred_stop_reason = call_chatgpt_api(...)
    else:
        # No code to execute; reasoning is finished
        final_response = response_text
        break
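The extraction in step 1 can be tested in isolation. This is a minimal, self-contained version of the same split-based logic (the FENCE constant just avoids embedding literal triple backticks in this snippet):

```python
FENCE = "`" * 3  # a literal triple-backtick, built up to keep the snippet well-formed

def extract_code(response_text):
    # Take the text after the last ```python fence, cut at the next closing fence.
    return response_text.split(FENCE + "python")[-1].split(FENCE)[0].strip()

response = (
    "Let me zoom into the image first.\n<code>\n"
    + FENCE + "python\nprint(image_clue_0.size)\n" + FENCE + "\n</code>"
)
print(extract_code(response))  # print(image_clue_0.size)
```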

call_chatgpt_api function - API call wrapper

# Multi-API support
if client_type == "openai" or client_type == "azure":
    # OpenAI/Azure API
    response = client.chat.completions.create(...)
elif client_type == "anthropic":
    # Claude API
    message = client.messages.create(...)
elif client_type == "vllm":
    # vLLM API
    response = client.chat.completions.create(...)

# Stop-condition detection
# Check the configured stop sequences
if stop and any(s in response_text for s in stop):
    for s in stop:
        if s in response_text:
            stop_reason = s
            break

# Special handling for code blocks
if "<code>" in response_text:
    stop_reason = "</code>"
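The detection logic above can be factored into a small testable function; this sketch reproduces the two rules (stop sequences, then the code-block override):

```python
def detect_stop(response_text, stop):
    stop_reason = None
    # Rule 1: the first configured stop sequence found in the response wins.
    if stop and any(s in response_text for s in stop):
        for s in stop:
            if s in response_text:
                stop_reason = s
                break
    # Rule 2: a <code> block always stops generation at </code>,
    # overriding any earlier match.
    if "<code>" in response_text:
        stop_reason = "</code>"
    return stop_reason

print(detect_stop("thinking... <code>", ["</answer>"]))  # </code>
print(detect_stop("done </answer>", ["</answer>"]))      # </answer>
```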

process_prompt_init function - prompt construction

# Image encoding
if "claude" in api_name:
    img_result = encode_image_with_resize(image_path)  # Claude requires resizing
else:
    img_result = encode_image(image_path)  # other APIs encode directly

# Message structure construction
# In tool mode, wrap the image in image_clue tags
content.append({"type": "text", "text": "<image_clue_0>"})
content.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}})
content.append({"type": "text", "text": "</image_clue_0>\n\n"})
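A self-contained sketch of this message construction, with placeholder bytes standing in for a real JPEG:

```python
import base64

def make_image_content(image_bytes, idx=0):
    # Base64-encode the raw image bytes and wrap them in image_clue tags,
    # mirroring the content structure built above.
    image_base64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {"type": "text", "text": f"<image_clue_{idx}>"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}},
        {"type": "text", "text": f"</image_clue_{idx}>\n\n"},
    ]

content = make_image_content(b"\xff\xd8\xff\xe0fake-jpeg")
print(content[0]["text"])                    # <image_clue_0>
print(content[1]["image_url"]["url"][:22])   # data:image/jpeg;base64
```

The surrounding text tags let the model refer back to each image by index in later turns.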

execute_codes function - code execution management

def execute_codes(codes, messages, executor: PythonExecutor):
    no_code_idx = []
    codes_use = []
    # Filter out empty code strings
    for i, code in enumerate(codes):
        if code == "":
            no_code_idx.append(i)
        else:
            codes_use.append(code)
    # Execute the remaining snippets as a batch
    batch_results = executor.batch_apply(codes_use, messages)
    return batch_results, no_code_idx

update_messages_with_execute_content function - integrating execution results

# Execution succeeded
if error_result is None:
    # Build the interpreter message
    interpreter_message_text_prefix = [{"type": "text", "text": f"<interpreter>\nText Result:\n{text_result}\nImage Result:\n"}]
    # Handle any generated images
    if images_result is not None:
        for image_base64_item in images_result:
            interpreter_message_images = [
                {"type": "text", "text": f"<image_clue_{image_clue_idx}>"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64_item}"}},
                {"type": "text", "text": f"</image_clue_{image_clue_idx}>"}
            ]
            image_content += interpreter_message_images
            image_clue_idx += 1
# Execution failed
else:
    interpreter_message_text_prefix = [{"type": "text", "text": f"<interpreter>{error_result}"}]
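The two branches can be exercised with a minimal stand-alone version of the prefix-building logic (a sketch mirroring the snippet above, not the full function):

```python
def wrap_interpreter_output(text_result=None, error_result=None):
    # Success path: announce the text result and leave room for image clues.
    if error_result is None:
        return [{"type": "text",
                 "text": f"<interpreter>\nText Result:\n{text_result}\nImage Result:\n"}]
    # Error path: surface the error report directly to the model.
    return [{"type": "text", "text": f"<interpreter>{error_result}"}]

ok = wrap_interpreter_output(text_result="42")
err = wrap_interpreter_output(error_result="NameError: name 'x' is not defined")
print(ok[0]["text"])
print(err[0]["text"])  # <interpreter>NameError: name 'x' is not defined
```

Feeding the error text back, rather than aborting, is what lets the model repair its own code in the next turn.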

