FunASR: Real-Time Multi-Speaker Conversational Speech Recognition, Analysis, and Endpoint Detection

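Core features: FunASR is a fundamental speech recognition toolkit that offers a range of functions, including automatic speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-speaker conversational ASR. It ships with convenient scripts and tutorials and supports both inference and fine-tuning of pretrained models.
Project page: FunASR
Model repositories: ModelScope, Huggingface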

The following service deployments are supported:
Paraformer
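I. FunASR offline file transcription service (GPU version)
The FunASR offline file transcription GPU package provides a full-featured offline transcription service. It contains a complete speech recognition pipeline that combines voice activity detection, speech recognition, and punctuation models, so it can transcribe long audio and video of tens of hours into punctuated text and handle hundreds of concurrent transcription requests. The output is punctuated text with character-level timestamps, and ITN (inverse text normalization) and user-defined hotwords are supported. The server integrates ffmpeg and accepts a wide range of audio and video formats. Clients are provided in HTML, Python, C++, Java, and C#, and can be used directly or developed further.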


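Officially recommended configuration: 8 vCPU cores, 32 GB RAM, and a V100 GPU; a single machine can then serve roughly 20 concurrent requests (see the detailed performance test report and the cloud service trial).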

Quick start:
1. Install Docker

curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
sudo bash install_docker.sh

2. Start the image
Pull and start the FunASR Docker image with the following commands:

sudo docker pull \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.2.1
mkdir -p ./funasr-runtime-resources/models
sudo docker run --gpus=all -p 10098:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.2.1

3. Start the server
Once the Docker container is running, start the funasr-wss-server program:

cd FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

# Note: on first start the server exports a torchscript model, which takes a while; please be patient.
# To disable SSL, add the argument: --certfile 0
# The timestamp model is loaded by default. To deploy with the nn hotword model, set --model-dir to the corresponding model:
#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch (timestamp)
#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404 (nn hotword)
# To load hotwords on the server, list them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped into the container as /workspace/models/hotwords.txt):
#   One hotword per line, in the format "hotword weight", e.g.: 阿里巴巴 20 (Note: the number of hotwords is unlimited in theory, but to balance performance and accuracy, keep each hotword to at most 10 characters, the list to at most 1k entries, and weights within 1~100)
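
Before starting the server it can be useful to check hotwords.txt against the recommendations above. The following is a minimal sketch in plain Python, not part of FunASR: the function name parse_hotwords and the warning messages are hypothetical, and only the "hotword weight" line format and the recommended limits come from the notes above.

```python
from typing import Dict, List, Tuple

MAX_WORD_LEN = 10        # recommended maximum hotword length
MAX_WORDS = 1000         # recommended maximum number of hotwords
WEIGHT_RANGE = (1, 100)  # recommended weight range


def parse_hotwords(text: str) -> Tuple[Dict[str, int], List[str]]:
    """Parse 'hotword weight' lines and return (hotwords, warnings)."""
    hotwords: Dict[str, int] = {}
    warnings: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.rsplit(maxsplit=1)  # the weight is the last field
        if len(parts) != 2 or not parts[1].isdecimal():
            warnings.append(f"line {lineno}: expected '<hotword> <weight>'")
            continue
        word, weight = parts[0], int(parts[1])
        if len(word) > MAX_WORD_LEN:
            warnings.append(f"line {lineno}: hotword longer than {MAX_WORD_LEN}")
        if not WEIGHT_RANGE[0] <= weight <= WEIGHT_RANGE[1]:
            warnings.append(f"line {lineno}: weight outside {WEIGHT_RANGE}")
        hotwords[word] = weight  # out-of-range entries are kept, only flagged
    if len(hotwords) > MAX_WORDS:
        warnings.append(f"more than {MAX_WORDS} hotwords")
    return hotwords, warnings


words, problems = parse_hotwords("阿里巴巴 20\n魔搭 500\n")
print(words, problems)
```

The check only warns and never rejects an entry, because the limits are recommendations rather than hard constraints.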

The n-gram language model can be customized (see the documentation).

Testing and using the client
Download the client test tools (the samples directory):

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz

Using the Python client as an example: it accepts multiple audio formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and a multi-file wav.scp list.

python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"

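Client usage in detail
After the FunASR service has been deployed on the server, you can test and use the offline file transcription service with the steps below. Python, C++, HTML, and Java clients are currently supported.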

python-client
To run a client directly for testing, follow the brief instructions below, which use the Python version as an example:

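python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
  --audio_in "../audio/asr_example.wav" --output_dir "./results"

--host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local machine (127.0.0.1). If the client and the service run on different servers, set it to the deployment machine's IP
--port 10095, the deployment port
--mode offline selects offline file transcription
--audio_in audio file to transcribe; accepts a file path or a wav.scp file list
--thread_num number of concurrent sending threads; default 1
--ssl whether to enable SSL certificate verification; default 1 (enabled), set to 0 to disable
--hotword hotword file, one hotword per line, in the format "hotword weight", e.g.: 阿里巴巴 20
--use_itn whether to apply ITN; default 1 (enabled), set to 0 to disable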
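
When the client is driven from a script, the options above can be assembled programmatically. The sketch below is a hypothetical helper (build_client_args is not part of the FunASR samples); it uses only the flags and defaults listed above and assumes funasr_wss_client.py is in the current directory.

```python
import shlex
from typing import List, Optional


def build_client_args(
    audio_in: str,
    host: str = "127.0.0.1",
    port: int = 10095,
    mode: str = "offline",
    thread_num: int = 1,
    ssl: int = 1,
    use_itn: int = 1,
    hotword: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for funasr_wss_client.py (hypothetical helper)."""
    args = [
        "python3", "funasr_wss_client.py",
        "--host", host,
        "--port", str(port),
        "--mode", mode,
        "--audio_in", audio_in,
        "--thread_num", str(thread_num),
        "--ssl", str(ssl),
        "--use_itn", str(use_itn),
    ]
    # Optional flags are added only when a value is supplied.
    if hotword:
        args += ["--hotword", hotword]
    if output_dir:
        args += ["--output_dir", output_dir]
    return args


# Print a shell-ready command; the list can also be passed to subprocess.run().
print(shlex.join(build_client_args("../audio/asr_example.wav", output_dir="./results")))
```

Building a list rather than a single string avoids quoting problems when file paths contain spaces.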

cpp-client
After entering the samples/cpp directory, you can test with the C++ client using the following command:

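./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav

--server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local machine (127.0.0.1). If the client and the service run on different servers, set it to the deployment machine's IP
--port 10095, the deployment port
--wav-path audio file to transcribe; accepts a file path
--hotword hotword file, one hotword per line, in the format "hotword weight", e.g.: 阿里巴巴 20
--thread-num number of client threads
--use-itn whether to apply ITN; default 1 (enabled), set to 0 to disable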

HTML web page
Open html/static/index.html in a browser; the page supports microphone input and file upload, so you can try the service directly.
Java-client

FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline

Server usage in detail:
Starting the FunASR service

cd /workspace/FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --certfile ../../../ssl_key/server.crt \
  --keyfile ../../../ssl_key/server.key \
  --hotword ../../hotwords.txt > log.txt 2>&1 &

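run_server.sh command-line arguments

--download-model-dir directory into which models are downloaded; models are fetched from ModelScope by model ID
--model-dir ModelScope model ID or local model path
--vad-dir ModelScope model ID or local model path
--punc-dir ModelScope model ID or local model path
--lm-dir ModelScope model ID or local model path
--itn-dir ModelScope model ID or local model path
--port port the server listens on; default 10095
--decoder-thread-num size of the server thread pool (the maximum number of concurrent streams supported); allocating about 1 GB of GPU memory per stream is recommended, i.e. 20 GB of GPU memory can be configured for 20 concurrent streams
--io-thread-num number of IO threads started by the server
--model-thread-num number of internal threads per recognition stream (controls ONNX model parallelism); default 1. It is recommended that decoder-thread-num*model-thread-num equal the total number of threads
--certfile SSL certificate file; default ../../../ssl_key/server.crt. Set to 0 to disable SSL
--keyfile SSL key file; default ../../../ssl_key/server.key
--hotword path to the hotword file, one hotword per line in the format "hotword weight" (e.g. 阿里巴巴 20). If the client also supplies hotwords, they are merged with the server's; server hotwords apply globally, while client hotwords apply only to that client.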
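
The two sizing rules in the argument list (decoder-thread-num*model-thread-num should equal the total number of threads, and roughly 1 GB of GPU memory per stream) reduce to simple arithmetic. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here queries FunASR or the hardware.

```python
def plan_decoder_threads(total_threads: int, model_thread_num: int = 1) -> int:
    """Largest decoder-thread-num with decoder * model threads <= total threads."""
    if total_threads < 1 or model_thread_num < 1:
        raise ValueError("thread counts must be positive")
    return max(1, total_threads // model_thread_num)


def max_gpu_streams(gpu_memory_gb: float, gb_per_stream: float = 1.0) -> int:
    """Concurrent streams that fit if each stream is given gb_per_stream of GPU memory."""
    return int(gpu_memory_gb // gb_per_stream)


# Example: a 16-thread machine with 2 internal threads per stream,
# and a GPU with 20 GB of memory at the recommended 1 GB per stream.
print(plan_decoder_threads(16, model_thread_num=2))
print(max_gpu_streams(20))
```

On a GPU deployment the smaller of the two results is the safer value to pass as --decoder-thread-num.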

Stopping the FunASR service

# Find the PID of funasr-wss-server
ps -x | grep funasr-wss-server
kill -9 PID

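Changing the model and other parameters
To replace the model in use or change other parameters, first stop the FunASR service, modify the parameters in question, and restart the service. The model must be an ASR/VAD/PUNC model from ModelScope, or a model fine-tuned from a ModelScope model.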

# For example, to switch the ASR model to damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch, set --model-dir as follows
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
# Set the port with --port
--port <port number>
# Set the number of inference threads started by the server with --decoder-thread-num
--decoder-thread-num <decoder thread num>
# Set the number of IO threads started by the server with --io-thread-num
--io-thread-num <io thread num>
# Disable the SSL certificate
--certfile 0

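Running the command above starts the offline file transcription service. If a model is specified by its ModelScope model ID, it is downloaded from ModelScope automatically.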

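II. English offline file transcription service (CPU version)
The English offline file transcription service (CPU version) has a complete speech recognition pipeline; it can transcribe long audio and video of tens of hours into punctuated text and supports hundreds of concurrent transcription requests.
FunASR provides an English offline file transcription service that can be deployed with one command, locally or on a cloud server; its core is the open-source FunASR runtime-SDK. FunASR-runtime combines the voice activity detection (VAD), Paraformer-large speech recognition (ASR), and punctuation prediction (PUNC) models that the DAMO Academy Speech Lab has open-sourced in the ModelScope community, enabling accurate and efficient high-concurrency transcription of audio.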

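Server configuration
Officially recommended configurations:

· Configuration 1: (x86, compute-optimized), 4 vCPU cores, 8 GB RAM; a single machine can serve about 32 concurrent requests
· Configuration 2: (x86, compute-optimized), 16 vCPU cores, 32 GB RAM; a single machine can serve about 64 concurrent requests
· Configuration 3: (x86, compute-optimized), 64 vCPU cores, 128 GB RAM; a single machine can serve about 200 concurrent requests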

1. Install Docker

curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
sudo bash install_docker.sh

2. Start the image

sudo docker pull \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.8
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10097:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.8

3. Start the server

cd FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-en-16k-common-vocab10020-onnx \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx > log.txt 2>&1 &

# To disable SSL, add the argument: --certfile 0

4. Test and use the client

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
python3 funasr_wss_client.py --host "127.0.0.1" --port 10097 --mode offline --audio_in "../audio/asr_example.wav"

III. Chinese real-time speech dictation service (CPU version)
1. Install Docker

curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
sudo bash install_docker.sh

2. Start the image

sudo docker pull \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.13
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10096:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.13

3. Start the server

cd FunASR/runtime
nohup bash run_server_2pass.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx \
  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

# To disable SSL, add the argument: --certfile 0
# To deploy with the SenseVoiceSmall model, the timestamp model, or the nn hotword model, set --model-dir to the corresponding model:
#   iic/SenseVoiceSmall-onnx
#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (nn hotword)
# To load hotwords on the server, list them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped into the container as /workspace/models/hotwords.txt):
#   One hotword per line, in the format "hotword weight", e.g.: 阿里巴巴 20 (Note: the number of hotwords is unlimited in theory, but to balance performance and accuracy, keep each hotword to at most 10 characters, the list to at most 1k entries, and weights within 1~100)
# In SenseVoiceSmall-onnx results, the tags "<|zh|><|NEUTRAL|><|Speech|> " carry the language, emotion, and event information, in that order
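
The tag prefix in SenseVoiceSmall-onnx results can be separated from the transcript with a regular expression. This is a minimal sketch, assuming the tags always appear as <|...|> and that the first three are language, emotion, and event as described above; split_tags is a hypothetical helper, not a FunASR function.

```python
import re
from typing import Dict, Tuple

TAG_RE = re.compile(r"<\|([^|]*)\|>")  # matches one <|...|> tag


def split_tags(result: str) -> Tuple[Dict[str, str], str]:
    """Return ({language, emotion, event}, plain text) for a tagged result string."""
    tags = TAG_RE.findall(result)
    text = TAG_RE.sub("", result).strip()
    # Per the note above, the first three tags are language, emotion, event.
    labels = dict(zip(("language", "emotion", "event"), tags))
    return labels, text


labels, text = split_tags("<|zh|><|NEUTRAL|><|Speech|> 欢迎使用")
print(labels, text)
```

If a result carries fewer than three tags, the missing keys are simply absent from the returned dictionary.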

4. Test and use the client

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
python3 funasr_wss_client.py --host "127.0.0.1" --port 10096 --mode 2pass

In addition to the four client languages above, C# is also supported.

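IV. Chinese offline file transcription service (CPU version)
Officially recommended configurations:

· Configuration 1: (x86, compute-optimized), 4 vCPU cores, 8 GB RAM; a single machine can serve about 32 concurrent requests
· Configuration 2: (x86, compute-optimized), 16 vCPU cores, 32 GB RAM; a single machine can serve about 64 concurrent requests
· Configuration 3: (x86, compute-optimized), 64 vCPU cores, 128 GB RAM; a single machine can serve about 200 concurrent requests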

1. Install Docker

curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
sudo bash install_docker.sh

2. Start the image

sudo docker pull \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.7
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10095:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.7

3. Start the server

cd FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

# To disable SSL, add the argument: --certfile 0
# To deploy with the SenseVoiceSmall model, the timestamp model, or the nn hotword model, set --model-dir to the corresponding model:
#   iic/SenseVoiceSmall-onnx
#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (nn hotword)
# To load hotwords on the server, list them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped into the container as /workspace/models/hotwords.txt):
#   One hotword per line, in the format "hotword weight", e.g.: 阿里巴巴 20 (Note: the number of hotwords is unlimited in theory, but to balance performance and accuracy, keep each hotword to at most 10 characters, the list to at most 1k entries, and weights within 1~100)
# In SenseVoiceSmall-onnx results, the tags "<|zh|><|NEUTRAL|><|Speech|> " carry the language, emotion, and event information, in that order

Deploying the 8k model:

cd FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-8k-common-onnx \
  --model-dir damo/speech_paraformer_asr_nat-zh-cn-8k-common-vocab8358-tensorflow1-onnx \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst-token8358 \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

4. Test and use the client

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"

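How to customize the service deployment
The FunASR-runtime code is open source. If the provided server and clients do not fully meet your needs, you can develop them further:
C++ client
Python client
Custom client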

Installation
· Before installing funasr, make sure the following dependencies are installed:

python>=3.8
torch>=1.13
torchaudio

· Install with pip

pip3 install -U funasr

· Or install from source

git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./

To use the industrial pretrained models, install modelscope and huggingface_hub (optional):

pip3 install -U modelscope huggingface huggingface_hub

Quick start
Official test audio (Chinese, English)

Run from the command line

funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav

Note: a single audio file can be recognized, and so can a file list in Kaldi-style wav.scp format, where each line is: wav_id wav_path
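
A wav.scp list is easy to generate from a set of audio paths. The following is a minimal sketch, assuming the file name without its extension is an acceptable wav_id; make_wav_scp is a hypothetical helper and the paths are placeholders.

```python
from pathlib import Path
from typing import Iterable


def make_wav_scp(paths: Iterable[str]) -> str:
    """Build Kaldi-style wav.scp text: one 'wav_id wav_path' pair per line."""
    lines = []
    seen = set()
    for path in paths:
        wav_id = Path(path).stem  # file name without extension as the ID
        if wav_id in seen:
            raise ValueError(f"duplicate wav_id: {wav_id}")
        seen.add(wav_id)
        lines.append(f"{wav_id} {path}")
    return "\n".join(lines) + "\n" if lines else ""


scp_text = make_wav_scp(["audio/asr_example_zh.wav", "audio/asr_example_en.wav"])
print(scp_text, end="")
# To use it, write scp_text to a file such as wav.scp and pass that file as the input.
```

Duplicate IDs are rejected because each line is keyed by its wav_id.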

Non-streaming speech recognition

SenseVoice

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

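Parameter notes:

· model_dir: model name, or the path to a model on local disk.
· vad_model: enables VAD. VAD splits long audio into short segments; the measured inference time then covers both VAD and SenseVoice, i.e. the end-to-end pipeline time. To measure SenseVoice alone, disable the VAD model.
· vad_kwargs: VAD model configuration. max_single_segment_time is the maximum duration of a segment cut by vad_model, in milliseconds (ms).
· use_itn: whether the output includes punctuation and inverse text normalization.
· batch_size_s: enables dynamic batching; the value is the total audio duration in a batch, in seconds (s).
· merge_vad: whether to merge the short audio fragments cut by the VAD model; the merged length is merge_length_s, in seconds (s).
· ban_emo_unk: disables the emo_unk tag; when disabled, every sentence is assigned an emotion label.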
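
To make the effect of merge_vad and merge_length_s concrete, the sketch below greedily merges consecutive VAD segments as long as the merged span stays within the length cap. It illustrates the idea only and is an assumption about the behavior, not FunASR's internal implementation; merge_segments is a hypothetical name, and segments are [beg, end] pairs in milliseconds.

```python
from typing import List


def merge_segments(segments: List[List[int]], merge_length_s: float = 15) -> List[List[int]]:
    """Greedily merge consecutive [beg, end] segments (ms) up to merge_length_s seconds."""
    limit_ms = merge_length_s * 1000
    merged: List[List[int]] = []
    for beg, end in segments:
        # Extend the current merged segment if the total span still fits the cap.
        if merged and end - merged[-1][0] <= limit_ms:
            merged[-1][1] = end
        else:
            merged.append([beg, end])
    return merged


vad_segments = [[0, 4000], [5000, 9000], [10000, 16000], [17000, 20000]]
print(merge_segments(vad_segments, merge_length_s=15))
```

Fewer, longer segments generally mean fewer recognition calls, which is why merging is paired with dynamic batching in the example above.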

Paraformer

from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc",
                  # spk_model="cam++",
                  )
res = model.generate(input=f"{model.model_path}/example/asr_example.wav", batch_size_s=300, hotword='魔搭')
print(res)

Note: hub selects the model repository; ms downloads from ModelScope and hf downloads from Huggingface.

Streaming speech recognition

import os

import soundfile
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to lookback for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming")

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)

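Note: chunk_size configures the streaming latency. [0,10,5] means text appears on screen at a granularity of 10*60=600ms, with 5*60=300ms of lookahead (future context). Each inference call takes 600ms of audio as input (16000*0.6=9600 samples) and outputs the corresponding text. For the last audio chunk, set is_final=True to force the final characters to be output.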
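
The arithmetic in this note can be checked with a few lines of Python. The sketch assumes 16 kHz audio and 60ms per chunk unit, as stated above; the helper names chunk_info and num_chunks are hypothetical.

```python
from typing import Sequence, Tuple

SAMPLE_RATE = 16000  # Hz, as in the note above
UNIT_MS = 60         # duration represented by one chunk_size unit


def chunk_info(chunk_size: Sequence[int], sample_rate: int = SAMPLE_RATE) -> Tuple[int, int, int]:
    """Return (latency_ms, lookahead_ms, samples_per_chunk) for a chunk_size triple."""
    _, current, future = chunk_size
    latency_ms = current * UNIT_MS
    lookahead_ms = future * UNIT_MS
    samples_per_chunk = sample_rate * latency_ms // 1000
    return latency_ms, lookahead_ms, samples_per_chunk


def num_chunks(num_samples: int, samples_per_chunk: int) -> int:
    """Number of chunks needed to cover num_samples (ceiling division)."""
    return (num_samples + samples_per_chunk - 1) // samples_per_chunk


print(chunk_info([0, 10, 5]))  # 600ms granularity, 300ms lookahead
print(chunk_info([0, 8, 4]))   # 480ms granularity, 240ms lookahead
```

The samples_per_chunk value for [0, 10, 5] equals chunk_size[1] * 960, the stride used in the streaming loop.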

Voice activity detection (non-streaming)

from funasr import AutoModel

model = AutoModel(model="fsmn-vad")

wav_file = f"{model.model_path}/example/vad_example.wav"
res = model.generate(input=wav_file)
print(res)

Note: the VAD model output has the form [[beg1, end1], [beg2, end2], .., [begN, endN]], where begN/endN are the start and end points of the N-th valid audio segment, in milliseconds.
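
Because the segment boundaries are in milliseconds, they need converting before they can index a sample array. A minimal sketch, assuming 16 kHz audio and segment values made up for illustration; both helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def segments_to_samples(
    segments: Sequence[Sequence[int]], sample_rate: int = 16000
) -> List[Tuple[int, int]]:
    """Convert [beg_ms, end_ms] pairs to (start_sample, end_sample) index pairs."""
    return [
        (beg * sample_rate // 1000, end * sample_rate // 1000)
        for beg, end in segments
    ]


def speech_duration_ms(segments: Sequence[Sequence[int]]) -> int:
    """Total duration of detected speech in milliseconds."""
    return sum(end - beg for beg, end in segments)


example = [[70, 2340], [2620, 6200]]  # illustrative values, not real model output
print(segments_to_samples(example))
print(speech_duration_ms(example))
```

Each index pair can then slice the waveform, for example speech[start:end], to extract one speech segment.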

Voice activity detection (streaming)

import soundfile
from funasr import AutoModel

chunk_size = 200  # ms
model = AutoModel(model="fsmn-vad")

wav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    if len(res[0]["value"]):
        print(res)

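Note: the streaming VAD model output takes one of four forms:
[[beg1, end1], [beg2, end2], .., [begN, endN]]: same as the offline VAD output above.
[[beg, -1]]: only a start point was detected.
[[-1, end]]: only an end point was detected.
[]: neither a start point nor an end point was detected.
Output values are in milliseconds, given as absolute time from the start of the audio.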
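
A consumer of the streaming output usually wants complete segments, so start-only and end-only events have to be paired up. The sketch below handles the four forms listed above; collect_segments is a hypothetical helper, and the per-chunk inputs are illustrative values rather than real model output.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def collect_segments(
    chunk_outputs: Iterable[Sequence[Sequence[int]]],
) -> Tuple[List[List[int]], Optional[int]]:
    """Pair streaming VAD events into [beg, end] segments.

    Returns (segments, open_beg), where open_beg is the start time of a
    segment that has begun but not yet ended, or None.
    """
    segments: List[List[int]] = []
    open_beg: Optional[int] = None
    for output in chunk_outputs:
        for beg, end in output:
            if beg >= 0 and end >= 0:      # complete segment in one chunk
                segments.append([beg, end])
            elif beg >= 0:                 # [[beg, -1]]: start point only
                open_beg = beg
            elif end >= 0 and open_beg is not None:  # [[-1, end]]: closes the open start
                segments.append([open_beg, end])
                open_beg = None
    return segments, open_beg


outputs = [[], [[70, -1]], [], [[-1, 2340]], [[2620, 6200]], [[7000, -1]]]
print(collect_segments(outputs))
```

An end point that arrives without a preceding start point is ignored here; whether that can occur in practice depends on the model.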

Punctuation restoration

from funasr import AutoModel

model = AutoModel(model="ct-punc")

res = model.generate(input="那今天的會就到這里吧 happy new year 明年見")
print(res)

Timestamp prediction

from funasr import AutoModel

model = AutoModel(model="fa-zh")

wav_file = f"{model.model_path}/example/asr_example.wav"
text_file = f"{model.model_path}/example/text.txt"
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)

Emotion recognition

from funasr import AutoModel

model = AutoModel(model="emotion2vec_plus_large")

wav_file = f"{model.model_path}/example/test.wav"

res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)

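Notes:
1. Whisper-large-v3 and Whisper-large-v3-turbo are supported for multilingual speech recognition, translation, and language identification.

2. Qwen-Audio and Qwen-Audio-Chat, large audio-text multimodal models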

#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright FunASR (https://github.com/alibaba-damo-academy/FunASR). All Rights Reserved.
#  MIT License  (https://opensource.org/licenses/MIT)

# To install requirements: pip3 install -U "funasr[llm]"

from funasr import AutoModel

model = AutoModel(model="Qwen/Qwen-Audio-Chat")

audio_in = "https://github.com/QwenLM/Qwen-Audio/raw/main/assets/audio/1272-128104-0000.flac"

# 1st dialogue turn
prompt = "what does the person say?"
cache = {"history": None}
res = model.generate(input=audio_in, prompt=prompt, cache=cache)
print(res)

# 2nd dialogue turn
prompt = 'Find the start time and end time of the word "middle classes"'
res = model.generate(input=None, prompt=prompt, cache=cache)
print(res)

3. Emotion recognition models (angry, happy, neutral, sad)
emotion2vec+large, emotion2vec+base, emotion2vec+seed

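4. SenseVoice is a foundation model for speech understanding with several capabilities: automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED).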

5. Keyword spotting (wake-word) models
fsmn_kws, fsmn_kws_mt, sanm_kws, sanm_kws_streaming

from funasr import AutoModel

# streaming keyword spotting for the wake word "小云小云"
model = AutoModel(
    model="iic/speech_sanm_kws_phone-xiaoyun-commands-online",
    keywords="小云小云",
    output_dir="./outputs/debug",
    device='cpu',
    chunk_size=[4, 8, 4],
    encoder_chunk_look_back=0,
    decoder_chunk_look_back=0,
)

res = model.generate(input='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/KWS/pos_testset/kws_xiaoyunxiaoyun.wav')
print(res)