FreeSwitch通過WebSocket對接AI實時語音大模型插件技術方案
1. 方案概述
基于FreeSWITCH的實時通信能力,通過WebSocket協議橋接AI大模型服務,實現低延遲、高并發的智能語音交互系統。支持雙向語音流處理、實時ASR/TTS轉換和動態業務指令執行。
![架構示意圖](1753095153158#pic_center)
有這么方面項目需要的可聯系。https://cwn1.x3322.net:7777/down/0UgRahEtbPEa.so
類似技術參考:https://www.ddrj.com/callcenter/largemodel.html
2. 架構設計
```mermaid
graph LR
    A[FreeSWITCH] -->|SIP/RTP| B(WebSocket網關/SFU)
    B -->|雙向WebSocket| C(AI Gateway)
    C -->|HTTP/GRPC Stream| D(大模型服務)
    D -->|文本/控制指令| C
    C -->|TTS音頻/指令| B
    B -->|RTP音頻| A
```
3. 核心組件
組件 | 技術選型 | 核心功能 |
---|---|---|
媒體網關 | FreeSWITCH 1.10+ | 處理SIP呼叫、RTP音頻流、DTMF事件管理 |
協議橋接層 | mod_websocket (ESL+自定義模塊) | 音頻轉WebSocket二進制流(支持OPUS/PCM) |
AI網關 | Node.js/Python (Tornado) | 雙向WS通信、ASR/TTS調度、會話狀態機管理 |
大模型接口 | GRPC Stream/HTTP2 Server | 流式對話處理&指令生成(200ms級響應) |
ASR/TTS引擎 | 阿里云/訊飛/DeepSeek RTS | 實時語音<=>文本轉換(<300ms延遲) |
模型推理層 | DeepSeek-V2/GLM-4 API | 流式對話生成,支持SSML控制指令 |
4. 關鍵流程
4.1 語音輸入流 (User → AI)
FreeSWITCH --(RTP)--> mod_websocket --(WS Binary/OPUS)--> AI網關 --(ASR API)--> 大模型
- 數據封裝:
```json
{
  "call_id": "call-123456",
  "seq": 1024,
  "is_final": false,
  "timestamp": 1721541687000,
  "payload": "BASE64_OPUS"
}
```
4.2 AI響應流 (AI → User)
大模型 --(SSML指令)--> AI網關 --(WS控制消息)--> TTS服務 --(RTP)--> FreeSWITCH
- 中斷響應機制:
  - DTMF #鍵觸發 barge-in 事件
  - TTS首包到達時間 < 100ms
4.3 控制指令示例
```json
// ASR識別結果
{"event":"asr_result", "text":"查余額", "confidence":0.95}
// TTS響應指令
{"event":"ai_response", "type":"tts", "audio":"chunk_123.opus"}
// 業務轉移指令
{"event":"action", "command":"transfer:6001"}
```
5. 性能優化
- 音頻分片處理:80ms/幀(1280采樣@16kHz)
- 雙緩沖ASR策略:預加載靜音語音模型加速首字響應
- 動態抖動緩沖:網絡延遲>150ms時自動補償
- 會話熱插拔:通話保持時維持AI對話上下文
- 熔斷機制:模型響應>2s時轉人工服務
6. 異常處理機制
故障場景 | 解決方案 |
---|---|
WebSocket斷連 | 10秒自動重連+20秒音頻緩存 |
ASR識別沖突 | 基于時間戳的序列仲裁 |
模型響應超時 | 播放「正在思考」提示音 |
DTMF中斷事件 | 立即停止TTS并清空隊列 |
編碼格式不匹配 | OPUS/PCM/G.711動態切換 |
-- FreeSWITCH in-dialplan script: answers an inbound call, starts a background
-- stereo recording, then bridges the call's audio to an AI voice service over
-- WebSocket via mod_audio_stream, driving playback callbacks from the
-- module's CUSTOM events until the stream disconnects or the call ends.
local cjson = require "dkjson"
local pts = require "ppytools"

local ws_addr = "ws://127.0.0.1:20000"
ws_addr = "wss://127.0.0.1:12345"
--ws_addr = "wss://ai.xxx.com:12345"

local records_base = "/workspace/records"

-- Derive this script's file name (path stripped) for log prefixes.
local script_path = debug.getinfo(1, "S").source:sub(2)
local script_name = script_path:match("([^/\\]+)$") or "unknown"

local fs_api = freeswitch.API()

-- Console-log helper prefixed with the script name.
-- log_level defaults to "info"; the explicit nil test (rather than
-- `log_level or "info"`) keeps falsy-but-set values usable.
function fslog(msg, log_level)
    log_level = (log_level ~= nil) and log_level or "info"
    freeswitch.consoleLog(log_level, "[" .. script_name .. "] " .. msg)
end

function main()
    local session_lega = session -- `session` is injected by FreeSWITCH
    local session_lega_uuid = session_lega:get_uuid()
    fslog(string.format("[START][%s]\n", session_lega_uuid))
    session_lega:answer()

    local datetime_dir, records_dir = pts.create_compact_date_dir(records_base)
    local caller_id_number = session_lega:getVariable("caller_id_number")
    local destination_number = session_lega:getVariable("destination_number")
    fslog(string.format("session_lega_uuid: %s , caller_id_number: %s , destination_number: %s\n", session_lega_uuid, caller_id_number, destination_number))

    -- Background call recording: stereo puts caller/callee on separate channels.
    if records_dir ~= nil then
        session_lega:setVariable("RECORD_STEREO", "true")
        local records_str = string.format("bgapi uuid_record %s start %s/%s.wav 1000 0 0", session_lega_uuid, records_dir, session_lega_uuid)
        fslog(records_str)
        fs_api:executeString(records_str)
        -- Custom CDR variable holding the recording's date-relative path.
        session_lega:setVariable("record_file_uri_path", string.format("%s/%s.wav", datetime_dir, session_lega_uuid))
    end

    -- By default mod_audio_stream sends caller audio to the AI server as raw
    -- binary frames; with this flag it wraps audio in JSON, matching the
    -- format the AI server sends back to FreeSWITCH.
    session_lega:setVariable("STREAM_MESSAGE_SENDJSON", "true")

    local con = freeswitch.EventConsumer()
    con:bind("CUSTOM", "mod_audio_stream::json")
    con:bind("CUSTOM", "mod_audio_stream::connect")
    con:bind("CUSTOM", "mod_audio_stream::disconnect")
    con:bind("CUSTOM", "mod_audio_stream::error")

    -- Initial session metadata handed to the AI server with the `start` command.
    local start_time = os.date("%Y-%m-%d %H:%M:%S", os.time())
    local metadata_obj = {
        type = "init",
        sid = session_lega_uuid,
        phone_number = caller_id_number,
        timestamp = start_time
    }
    local metadata = cjson.encode(metadata_obj)
    fslog("metadata:" .. metadata)

    -- FIX: freeswitch.API:execute returns a single reply string (there is no
    -- second `err` return); API failures are reported via an "-ERR" prefix.
    local result = fs_api:execute("uuid_audio_stream", string.format("%s start %s mono 8k %s", session_lega_uuid, ws_addr, metadata))
    if result ~= nil and result:find("-ERR", 1, true) ~= 1 then
        fslog(string.format("Function executed successfully: %s\n", result), "notice")
    else
        fslog(string.format("Error executing function: %s\n", tostring(result)), "err")
    end

    -- Event loop: consume mod_audio_stream events for this call until the
    -- stream disconnects/errors, the AI asks for a human, or the call hangs up.
    while session_lega:ready() do
        local event = con:pop() -- non-blocking; nil when no event is queued
        if event then
            local event_uuid = event:getHeader("Unique-ID")
            if event_uuid == session_lega_uuid then
                local event_sub = event:getHeader("Event-Subclass")
                local body = event:getBody()
                fslog(string.format("JSON executing function, Event-Subclass: %s, body: %s\n", event_sub, body))
                if event_sub == "mod_audio_stream::connect" then
                    -- Connected to the AI server; nothing to do here.
                elseif event_sub == "mod_audio_stream::disconnect" then
                    break
                elseif event_sub == "mod_audio_stream::json" then
                    -- FIX: dkjson.decode returns nil for malformed JSON;
                    -- guard before dereferencing so one bad frame cannot
                    -- crash the whole event loop.
                    local data = cjson.decode(body)
                    if data == nil then
                        fslog("invalid JSON body, ignored\n", "warning")
                    else
                        if data.type == "sentence" and data.status == "start" then
                            -- Acknowledge sentence playback start back to the AI server.
                            local cb_obj = {
                                type = "sentence_callback",
                                sentence_id = data.sentence_id,
                                status = "play",
                                timestamp = os.date("%Y-%m-%d %H:%M:%S", os.time())
                            }
                            local cb_metadata = cjson.encode(cb_obj)
                            fslog("[send_text]metadata:" .. cb_metadata)
                            fs_api:execute("uuid_audio_stream", string.format("%s send_text %s", session_lega_uuid, cb_metadata))
                        end
                        if data.type == "streamText" then
                            -- Incremental assistant text; log it as it arrives.
                            if data.assistant then
                                fslog(data.assistant)
                            end
                        end
                        if data.toHuman then
                            -- AI requests transfer to a human agent: leave the loop.
                            break
                        elseif data.stop then
                            fslog("data stop", "err")
                        elseif data.clear then
                            fslog("data clear", "err")
                        end
                    end
                elseif event_sub == "mod_audio_stream::error" then
                    break
                else
                    -- Unhandled subclass: ignore.
                end
            end
        elseif session_lega then
            session_lega:sleep(20) -- idle briefly before polling again
        else
            break
        end
    end

    -- Recording is stopped implicitly on hangup; explicit stop kept for reference.
    --fs_api:execute("uuid_record", string.format("%s stop", session_lega_uuid))
    fslog(string.format("[END][%s]\n", session_lega_uuid))
end

main()