25.6.29 | Added Gummy real-time / one-sentence speech recognition |
25.6.28 | Added Qwen TTS local audio and real-time playback |
Background
Environment setup
macOS computer with an M1 chip (other M-series chips work too)
To keep the Python environment manageable, use Miniconda. Download link: Download Anaconda Distribution | Anaconda
Install the dependency library for Alibaba's models (the dashscope SDK): https://bailian.console.aliyun.com/?tab=api#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2712193.html%23f3e80b21069aa
Configure the API Key environment variable: https://bailian.console.aliyun.com/?tab=api#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2803795.html
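With both steps done, a quick sanity check helps before moving on. This is a minimal sketch of my own (not from the official docs); it only confirms the SDK is importable and the key is visible to Python:

# Sanity check: dashscope is installed and the API Key env var is set.
import os
from importlib.metadata import version

print("dashscope version:", version("dashscope"))
print("DASHSCOPE_API_KEY set:", bool(os.getenv("DASHSCOPE_API_KEY")))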
To make code editing easier, download and install the most popular editor, VS Code; the latest version ships with GitHub Copilot free to use, so remember to turn it on.
To fix the problem where the Python interpreter that VS Code picks by default cannot find dashscope because it was installed into a Miniconda environment:
- Press Cmd+Shift+P, then type and select Python: Select Interpreter.
- Pick the miniconda environment where you installed dashscope (e.g. miniconda3/envs/xxx).
- The status bar at the bottom right will show the current environment.
- Verify with which python (or run the snippet below inside VS Code).
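The same check can be done without leaving the editor; this is a small sketch of my own, not an official snippet:

# Run this inside VS Code to confirm the selected interpreter is the
# miniconda environment where dashscope was installed.
import sys

print(sys.executable)  # should point into miniconda3/envs/...

import dashscope  # raises ModuleNotFoundError if the wrong interpreter is active
print("dashscope import OK")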
Gummy-ASR
Real-time recognition
One-sentence recognition
This is the official demo, unmodified.
How it works: recording starts as soon as the program launches. The end-of-speech decision (based on reading the code and the logs) is made on the cloud side; one minute of audio is the upper limit.
# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/xxxxx.html
# One-sentence recognition can recognize up to one minute of streaming audio
# (whether captured from an external device such as a microphone or read from
# a local file) and return results as a stream.
import pyaudio
import dashscope
from dashscope.audio.asr import *

# If the API Key is not configured in an environment variable,
# replace your-api-key with your own API Key.
# dashscope.api_key = "your-api-key"

mic = None
stream = None


class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback open.")
        mic = pyaudio.PyAudio()
        stream = mic.open(
            format=pyaudio.paInt16, channels=1, rate=16000, input=True
        )

    def on_close(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback close.")
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
            if english_translation.vad_pre_end:
                print(
                    "vad pre end {}, {}, {}".format(
                        transcription_result.pre_end_start_time,
                        transcription_result.pre_end_end_time,
                        transcription_result.pre_end_timemillis,
                    )
                )
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)


callback = Callback()

translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=callback,
)
translator.start()
print("Speak into the microphone to try one-sentence speech recognition and translation")

while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        if not translator.send_audio_frame(data):
            print("sentence end, stop sending")
            break
    else:
        break

translator.stop()
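As the comment at the top of the demo notes, the same interface also accepts audio read from a local file instead of the microphone. Below is a minimal sketch of that variant, my own and untested against the service; the filename test.pcm is hypothetical, assumed to be 16 kHz, 16-bit mono PCM under one minute, and only the frame source changes relative to the demo above:

# Sketch: feed Gummy one-sentence recognition from a local PCM file.
# Assumption: test.pcm is 16 kHz, 16-bit mono PCM, under one minute.
import dashscope
from dashscope.audio.asr import *


class FileCallback(TranslationRecognizerCallback):
    def on_event(self, request_id, transcription_result, translation_result, usage) -> None:
        if transcription_result is not None:
            print("transcription: ", transcription_result.text)


translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=FileCallback(),
)
translator.start()
with open("test.pcm", "rb") as f:  # hypothetical local recording
    while True:
        data = f.read(3200)  # same frame size as the microphone demo
        if not data:
            break
        if not translator.send_audio_frame(data):
            print("sentence end, stop sending")
            break
translator.stop()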
Qwen-TTS
Non-real-time TTS
Saves the generated audio to a local file. Documentation: https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2879134.html&renderType=iframe
Fixed the case where the TTS request's response is malformed (for example, when the API_KEY is wrong).
import os
import requests
import dashscope

text = "那我來給大家推薦一款T恤,這款呢真的是超級好看,這個顏色呢很顯氣質,而且呢也是搭配的絕佳單品,大家可以閉眼入,真的是非常好看,對身材的包容性也很好,不管啥身材的寶寶呢,穿上去都是很好看的。推薦寶寶們下單哦。"
response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
    model="qwen-tts",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
)

# ====== Begin checking whether the response is valid ======
print(response)
if not hasattr(response, 'output') or response.output is None:
    print("No 'output' field in the response; check permissions or whether the model is enabled")
    exit()
if not hasattr(response.output, 'audio') or response.output.audio is None:
    print("No 'audio' data in the response; check the returned content")
    exit()
if not hasattr(response.output.audio, 'url'):
    print("No 'url' field in the response audio; check the response structure")
    exit()
# ====== End checking whether the response is valid ======

audio_url = response.output.audio["url"]
save_path = "downloaded_audio.wav"  # custom save path

try:
    response = requests.get(audio_url)
    response.raise_for_status()  # raise if the download request failed
    with open(save_path, 'wb') as f:
        f.write(response.content)
    print(f"Audio file saved to: {save_path}")
except Exception as e:
    print(f"Download failed: {str(e)}")
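To hear the result without leaving Python, the saved file can be played back with pyaudio plus the standard wave module. This playback snippet is my own addition, not part of the official demo:

# Play back the file saved above (downloaded_audio.wav).
import wave
import pyaudio

with wave.open("downloaded_audio.wav", "rb") as wf:
    p = pyaudio.PyAudio()
    stream = p.open(
        format=p.get_format_from_width(wf.getsampwidth()),
        channels=wf.getnchannels(),
        rate=wf.getframerate(),
        output=True,
    )
    data = wf.readframes(1024)
    while data:
        stream.write(data)
        data = wf.readframes(1024)
    stream.stop_stream()
    stream.close()
    p.terminate()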
Problem 1: pip install pyaudio fails
Solution:
(1) First install Homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)", then restart the terminal.
(2) Then install the native dependency: brew install portaudio
(3) Then install the Python package: pip install pyaudio (a quick verification sketch follows below)
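Afterwards, this small check of my own confirms that pyaudio imports and can see an input device:

# Verify pyaudio works: import it and list available input devices.
import pyaudio

p = pyaudio.PyAudio()
print("pyaudio version:", pyaudio.__version__)
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        print("input device:", info["name"])
p.terminate()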
The generated TTS audio file is stored locally. This suits offline narration scenarios where the audio is produced ahead of time, such as voicing over a PPT.
It is not suitable for real-time voice interaction; for that we need real-time TTS.
Real-time TTS
Just run the official demo as-is.
We wrapped real-time TTS into a callable function and provide an accompanying test demo.
Function wrapper code: qwen_play_tts.py
# coding=utf-8
import os
import dashscope
import pyaudio
import time
import base64
import numpy as np


def qwen_play_tts(text, voice="Ethan", api_key=None):
    """
    Stream speech synthesis with Qwen TTS and play the audio.
    :param text: text to synthesize
    :param voice: speaker voice
    :param api_key: DashScope API Key (optional; defaults to the environment variable)
    """
    api_key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        raise ValueError("DASHSCOPE_API_KEY is not set.")

    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=24000,
        output=True,
    )

    responses = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
        model="qwen-tts",
        api_key=api_key,
        text=text,
        voice=voice,
        stream=True,
    )
    for chunk in responses:
        audio_string = chunk["output"]["audio"]["data"]
        wav_bytes = base64.b64decode(audio_string)
        audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
        stream.write(audio_np.tobytes())

    time.sleep(0.8)  # give the buffered tail time to finish playing
    stream.stop_stream()
    stream.close()
    p.terminate()


# Example call
if __name__ == "__main__":
    sample_text = "你好,這是一段測試語音。"
    qwen_play_tts(sample_text)
API test code: qwen_api_test.py
from qwen_play_tts import qwen_play_tts

qwen_play_tts("這是一個函數調用測試。")