Implementing Voice Input and Speech Recognition in UniApp
While building a cross-platform app recently, a client asked for voice input to improve the user experience. After some research and experimentation, I got voice input and speech recognition working in a UniApp project. This post shares the process and the approach, in the hope that it helps other developers with similar requirements.
Why add voice input?
As mobile devices have become ubiquitous, voice interaction has become an efficient way to communicate with software. Compared with typing, voice input has several advantages:
- Convenience: no keyboard tapping, well suited to one-handed use or walking
- Speed: speaking is usually faster than typing by hand
- Accessibility: easier for groups such as elderly and visually impaired users
- Hands-free operation: works while driving, doing housework, or whenever your hands are busy
In commercial applications, voice input can significantly lower the barrier to interaction and improve conversion and retention.
Choosing a technical approach
There are three main ways to implement speech recognition in a UniApp environment:
- Native plugins: call each platform's native speech recognition capability
- Cloud services: integrate a third-party speech recognition API (Baidu, iFLYTEK, etc.)
- Web API: use the Web Speech API on the H5 platform
After comparing and testing these, I settled on a hybrid approach:
- On the App platform, use a native plugin for the best experience
- In the WeChat mini-program, use WeChat's built-in speech recognition
- On H5, try the Web Speech API first and fall back to a cloud API when it is unavailable
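The hybrid strategy can be sketched as a dispatch table. This is purely an illustration (the names are made up, not a real UniApp API); in the actual components below, the branches live behind UniApp's `// #ifdef APP-PLUS / MP-WEIXIN / H5` conditional compilation:

```javascript
// Illustrative dispatch table mirroring the platform strategy above.
const recognizers = {
  'app-plus': () => 'native Baidu plugin',
  'mp-weixin': () => 'WeChat recorder + server-side ASR',
  // On H5, prefer the Web Speech API, otherwise fall back to a cloud API.
  'h5': () =>
    ('SpeechRecognition' in globalThis || 'webkitSpeechRecognition' in globalThis)
      ? 'Web Speech API'
      : 'cloud ASR fallback',
};

function pickRecognizer(platform) {
  const impl = recognizers[platform];
  if (!impl) throw new Error(`unsupported platform: ${platform}`);
  return impl();
}
```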
Implementation
1. App side (native plugin)
First, install a speech recognition plugin. I chose speech-baidu, a relatively mature plugin from the marketplace that wraps the Baidu speech recognition SDK for UniApp.
After installing the plugin, configure it in manifest.json:
"app-plus": {"plugins": {"speech": {"baidu": {"appid": "你的百度語音識別AppID","apikey": "你的API Key","secretkey": "你的Secret Key"}}},"distribute": {"android": {"permissions": ["<uses-permission android:name=\"android.permission.RECORD_AUDIO\"/>","<uses-permission android:name=\"android.permission.INTERNET\"/>"]}}
}
Next, create the voice input component:
```vue
<template>
  <view class="voice-input-container">
    <view
      class="voice-btn"
      :class="{ 'recording': isRecording }"
      @touchstart="startRecord"
      @touchend="stopRecord"
      @touchcancel="cancelRecord"
    >
      <image :src="isRecording ? '/static/mic-active.png' : '/static/mic.png'" mode="aspectFit"></image>
      <text>{{ isRecording ? 'Release to finish' : 'Hold to talk' }}</text>
    </view>
    <view v-if="isRecording" class="recording-tip">
      <text>Listening...</text>
      <view class="wave-container">
        <view
          v-for="(item, index) in waveItems"
          :key="index"
          class="wave-item"
          :style="{ height: item + 'rpx' }"
        ></view>
      </view>
    </view>
  </view>
</template>

<script>
// #ifdef APP-PLUS
const speechPlugin = uni.requireNativePlugin('speech-baidu');
// #endif

export default {
  name: 'VoiceInput',
  props: {
    lang: {
      type: String,
      default: 'zh' // zh: Chinese, en: English
    },
    maxDuration: {
      type: Number,
      default: 60 // maximum recording time, in seconds
    }
  },
  data() {
    return {
      isRecording: false,
      timer: null,
      waveItems: [10, 15, 20, 25, 30, 25, 20, 15, 10]
    };
  },
  methods: {
    startRecord() {
      if (this.isRecording) return;
      // request the recording permission
      uni.authorize({
        scope: 'scope.record',
        success: () => {
          this.isRecording = true;
          this.startWaveAnimation();
          // #ifdef APP-PLUS
          speechPlugin.start({
            vadEos: 3000, // silence timeout
            language: this.lang === 'zh' ? 'zh-cn' : 'en-us'
          }, (res) => {
            if (res.errorCode === 0) {
              // recognition result
              this.$emit('result', res.result);
            } else {
              uni.showToast({
                title: `Recognition failed: ${res.errorCode}`,
                icon: 'none'
              });
            }
            this.isRecording = false;
            this.stopWaveAnimation();
          });
          // #endif
          // enforce the maximum recording time
          this.timer = setTimeout(() => {
            if (this.isRecording) {
              this.stopRecord();
            }
          }, this.maxDuration * 1000);
        },
        fail: () => {
          uni.showToast({
            title: 'Please grant the recording permission',
            icon: 'none'
          });
        }
      });
    },
    stopRecord() {
      if (!this.isRecording) return;
      // #ifdef APP-PLUS
      speechPlugin.stop();
      // #endif
      clearTimeout(this.timer);
      this.isRecording = false;
      this.stopWaveAnimation();
    },
    cancelRecord() {
      if (!this.isRecording) return;
      // #ifdef APP-PLUS
      speechPlugin.cancel();
      // #endif
      clearTimeout(this.timer);
      this.isRecording = false;
      this.stopWaveAnimation();
    },
    // waveform animation
    startWaveAnimation() {
      this.waveAnimTimer = setInterval(() => {
        this.waveItems = this.waveItems.map(() => Math.floor(Math.random() * 40) + 10);
      }, 200);
    },
    stopWaveAnimation() {
      clearInterval(this.waveAnimTimer);
      this.waveItems = [10, 15, 20, 25, 30, 25, 20, 15, 10];
    }
  },
  beforeDestroy() {
    this.cancelRecord();
  }
};
</script>

<style scoped>
.voice-input-container {
  width: 100%;
}
.voice-btn {
  width: 200rpx;
  height: 200rpx;
  border-radius: 100rpx;
  background-color: #f5f5f5;
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  margin: 0 auto;
}
.voice-btn.recording {
  background-color: #e1f5fe;
  box-shadow: 0 0 20rpx rgba(0, 120, 255, 0.5);
}
.voice-btn image {
  width: 80rpx;
  height: 80rpx;
  margin-bottom: 10rpx;
}
.recording-tip {
  margin-top: 30rpx;
  text-align: center;
}
.wave-container {
  display: flex;
  justify-content: center;
  align-items: flex-end;
  height: 80rpx;
  margin-top: 20rpx;
}
.wave-item {
  width: 8rpx;
  background-color: #1890ff;
  margin: 0 5rpx;
  border-radius: 4rpx;
  transition: height 0.2s;
}
</style>
```
2. WeChat mini-program implementation
The WeChat mini-program platform ships with recording and speech APIs, which makes this straightforward:
```javascript
// mini-program implementation
startRecord() {
  // #ifdef MP-WEIXIN
  this.isRecording = true;
  this.startWaveAnimation();

  const recorderManager = wx.getRecorderManager();

  recorderManager.onStart(() => {
    console.log('recording started');
  });

  recorderManager.onStop((res) => {
    this.isRecording = false;
    this.stopWaveAnimation();

    // upload the recording to WeChat's servers for recognition
    wx.showLoading({ title: 'Recognizing...' });
    const { tempFilePath } = res;
    wx.uploadFile({
      url: 'https://api.weixin.qq.com/cgi-bin/media/voice/translatecontent',
      filePath: tempFilePath,
      name: 'media',
      formData: {
        access_token: this.accessToken,
        format: 'mp3',
        voice_id: Date.now(),
        lfrom: this.lang === 'zh' ? 'zh_CN' : 'en_US',
        lto: 'zh_CN'
      },
      success: (uploadRes) => {
        wx.hideLoading();
        const data = JSON.parse(uploadRes.data);
        if (data.errcode === 0) {
          this.$emit('result', data.result);
        } else {
          uni.showToast({
            title: `Recognition failed: ${data.errmsg}`,
            icon: 'none'
          });
        }
      },
      fail: () => {
        wx.hideLoading();
        uni.showToast({
          title: 'Speech recognition failed',
          icon: 'none'
        });
      }
    });
  });

  recorderManager.start({
    duration: this.maxDuration * 1000,
    sampleRate: 16000,
    numberOfChannels: 1,
    encodeBitRate: 48000,
    format: 'mp3'
  });
  // #endif
},
stopRecord() {
  // #ifdef MP-WEIXIN
  wx.getRecorderManager().stop();
  // #endif
  // ...same as the App implementation...
}
```
Note that WeChat's speech API requires an access_token, which should normally be obtained on the backend and exposed to the client through your own endpoint.
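A minimal backend sketch for obtaining and caching that token might look like the following (Node 18+ with the global `fetch`; the endpoint is WeChat's documented `cgi-bin/token`, while the helper names and the 60-second refresh margin are my own illustrative choices):

```javascript
// Fetch and cache the WeChat access_token on the server side (sketch).
const TOKEN_URL = 'https://api.weixin.qq.com/cgi-bin/token';

// Build the request URL for the client-credential grant.
function tokenUrl(appId, appSecret) {
  return `${TOKEN_URL}?grant_type=client_credential&appid=${appId}&secret=${appSecret}`;
}

let cached = { token: null, expiresAt: 0 };

async function getAccessToken(appId, appSecret) {
  // reuse the cached token while it is still valid
  if (cached.token && Date.now() < cached.expiresAt) return cached.token;

  const res = await fetch(tokenUrl(appId, appSecret));
  const data = await res.json();
  if (data.errcode) throw new Error(`token error: ${data.errmsg}`);

  // access_token is valid for expires_in seconds; refresh 60s early
  cached = {
    token: data.access_token,
    expiresAt: Date.now() + (data.expires_in - 60) * 1000
  };
  return cached.token;
}
```

Your app then calls your own endpoint (which calls `getAccessToken`) instead of embedding the AppSecret in the client.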
3. H5 implementation
On H5 we can use the Web Speech API for recognition, falling back to a cloud service API when the browser does not support it:
```javascript
startRecord() {
  // #ifdef H5
  this.isRecording = true;
  this.startWaveAnimation();

  // check whether the browser supports speech recognition
  if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    this.recognition = new SpeechRecognition();
    this.recognition.lang = this.lang === 'zh' ? 'zh-CN' : 'en-US';
    this.recognition.continuous = false;
    this.recognition.interimResults = false;

    this.recognition.onresult = (event) => {
      const result = event.results[0][0].transcript;
      this.$emit('result', result);
    };
    this.recognition.onerror = (event) => {
      uni.showToast({
        title: `Recognition error: ${event.error}`,
        icon: 'none'
      });
    };
    this.recognition.onend = () => {
      this.isRecording = false;
      this.stopWaveAnimation();
    };

    this.recognition.start();
  } else {
    // no Web Speech API support: fall back to the cloud API
    this.useCloudSpeechAPI();
  }
  // #endif

  // enforce the maximum recording time
  this.timer = setTimeout(() => {
    if (this.isRecording) {
      this.stopRecord();
    }
  }, this.maxDuration * 1000);
},
stopRecord() {
  // #ifdef H5
  if (this.recognition) {
    this.recognition.stop();
  }
  // #endif
  // ...same as the App implementation...
},
useCloudSpeechAPI() {
  // fallback: send an audio file to a backend endpoint for recognition
  uni.chooseFile({
    count: 1,
    type: 'file',
    extension: ['.mp3', '.wav'],
    success: (res) => {
      const tempFilePath = res.tempFilePaths[0];
      // upload the audio file to the backend for recognition
      uni.uploadFile({
        url: this.apiBaseUrl + '/speech/recognize',
        filePath: tempFilePath,
        name: 'audio',
        formData: {
          lang: this.lang
        },
        success: (uploadRes) => {
          const data = JSON.parse(uploadRes.data);
          if (data.code === 0) {
            this.$emit('result', data.result);
          } else {
            uni.showToast({
              title: `Recognition failed: ${data.msg}`,
              icon: 'none'
            });
          }
        },
        complete: () => {
          this.isRecording = false;
          this.stopWaveAnimation();
        }
      });
    }
  });
}
```
4. A unified wrapper
To make the feature easy to call, I wrapped the platform branches behind a single API:
```javascript
// utils/speech.js
const Speech = {
  // start speech recognition
  startRecognize(options) {
    const { lang = 'zh', success, fail, complete } = options;

    // #ifdef APP-PLUS
    const speechPlugin = uni.requireNativePlugin('speech-baidu');
    speechPlugin.start({
      vadEos: 3000,
      language: lang === 'zh' ? 'zh-cn' : 'en-us'
    }, (res) => {
      if (res.errorCode === 0) {
        success && success(res.result);
      } else {
        fail && fail(res);
      }
      complete && complete();
    });
    return {
      stop: () => speechPlugin.stop(),
      cancel: () => speechPlugin.cancel()
    };
    // #endif

    // #ifdef MP-WEIXIN
    // mini-program implementation
    // ...
    // #endif

    // #ifdef H5
    // H5 implementation
    // ...
    // #endif
  }
};

export default Speech;
```
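With this wrapper, the call site looks the same on every platform. The `Speech` object below is a stand-in stub (the real module needs a device or plugin to run), so only the call shape is demonstrated:

```javascript
// Stub standing in for the real utils/speech.js module, for illustration only.
const Speech = {
  startRecognize({ lang = 'zh', success, fail, complete }) {
    // the real implementation dispatches per platform; the stub returns a fixed result
    success && success('hello world');
    complete && complete();
    return { stop() {}, cancel() {} };
  }
};

// Typical call site:
let transcript = '';
const session = Speech.startRecognize({
  lang: 'zh',
  success: (text) => { transcript = text; },
  fail: (err) => { console.error('recognition failed', err); },
  complete: () => { /* hide loading UI, etc. */ }
});

// the caller can end or abort the session at any time
session.stop();
```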
A real-world example: voice input in a chat app
Now let's look at a practical scenario: adding voice input to a chat interface:
```vue
<template>
  <view class="chat-input-container">
    <view class="chat-tools">
      <image :src="isVoiceMode ? '/static/keyboard.png' : '/static/mic.png'" @tap="toggleInputMode"></image>
      <image src="/static/emoji.png" @tap="showEmojiPicker"></image>
    </view>
    <view v-if="!isVoiceMode" class="text-input">
      <textarea
        v-model="message"
        auto-height
        placeholder="Type a message..."
        :focus="textFocus"
        @focus="onFocus"
        @blur="onBlur"
      ></textarea>
    </view>
    <view v-else class="voice-input">
      <voice-input @result="onVoiceResult"></voice-input>
    </view>
    <button class="send-btn" :disabled="!message.trim()" @tap="sendMessage">Send</button>
  </view>
</template>

<script>
import VoiceInput from '@/components/voice-input/voice-input.vue';

export default {
  components: {
    VoiceInput
  },
  data() {
    return {
      message: '',
      isVoiceMode: false,
      textFocus: false
    };
  },
  methods: {
    toggleInputMode() {
      this.isVoiceMode = !this.isVoiceMode;
      if (!this.isVoiceMode) {
        this.$nextTick(() => {
          this.textFocus = true;
        });
      }
    },
    onVoiceResult(result) {
      this.message = result;
      this.isVoiceMode = false;
    },
    sendMessage() {
      if (!this.message.trim()) return;
      this.$emit('send', this.message);
      this.message = '';
    },
    onFocus() {
      this.textFocus = true;
    },
    onBlur() {
      this.textFocus = false;
    },
    showEmojiPicker() {
      // show the emoji picker
    }
  }
};
</script>

<style>
.chat-input-container {
  display: flex;
  align-items: center;
  padding: 20rpx;
  border-top: 1rpx solid #eee;
  background-color: #fff;
}
.chat-tools {
  display: flex;
  margin-right: 20rpx;
}
.chat-tools image {
  width: 60rpx;
  height: 60rpx;
  margin-right: 20rpx;
}
.text-input {
  flex: 1;
  background-color: #f5f5f5;
  border-radius: 10rpx;
  padding: 10rpx 20rpx;
}
.text-input textarea {
  width: 100%;
  min-height: 60rpx;
  max-height: 240rpx;
}
.voice-input {
  flex: 1;
  display: flex;
  justify-content: center;
}
.send-btn {
  width: 140rpx;
  height: 80rpx;
  line-height: 80rpx;
  font-size: 28rpx;
  margin-left: 20rpx;
  padding: 0;
  background-color: #1890ff;
  color: #fff;
}
.send-btn[disabled] {
  background-color: #ccc;
}
</style>
```
Performance and other caveats
During development I ran into a few issues worth calling out:
1. Permission handling
Speech recognition needs microphone access, and each platform handles permissions differently:
```javascript
// request the recording permission in a unified way
requestAudioPermission() {
  return new Promise((resolve, reject) => {
    // #ifdef APP-PLUS
    const permissions = ['android.permission.RECORD_AUDIO'];
    plus.android.requestPermissions(
      permissions,
      function (e) {
        if (e.granted.length === permissions.length) {
          resolve();
        } else {
          reject(new Error('Recording permission not granted'));
        }
      },
      function (e) {
        reject(e);
      }
    );
    // #endif

    // #ifdef MP-WEIXIN || MP-BAIDU
    uni.authorize({
      scope: 'scope.record',
      success: () => resolve(),
      fail: (err) => reject(err)
    });
    // #endif

    // #ifdef H5
    if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
      navigator.mediaDevices.getUserMedia({ audio: true })
        .then(() => resolve())
        .catch(err => reject(err));
    } else {
      reject(new Error('This browser does not support recording'));
    }
    // #endif
  });
}
```
2. Data usage
Speech recognition uploads audio data, which consumes bandwidth on mobile networks:
```javascript
// check the network type and warn the user
checkNetwork() {
  uni.getNetworkType({
    success: (res) => {
      if (res.networkType === '2g' || res.networkType === '3g') {
        uni.showModal({
          title: 'Data usage warning',
          content: 'You are on a mobile network; speech recognition may use a noticeable amount of data. Continue?',
          success: (confirm) => {
            if (confirm.confirm) {
              this.startSpeechRecognition();
            }
          }
        });
      } else {
        this.startSpeechRecognition();
      }
    }
  });
}
```
3. Performance
Long recognition sessions increase memory and battery consumption, so they need limits:
```javascript
// cap the recording duration and stop automatically
setupMaxDuration() {
  if (this.timer) {
    clearTimeout(this.timer);
  }
  this.timer = setTimeout(() => {
    if (this.isRecording) {
      uni.showToast({
        title: 'Recording too long, stopped automatically',
        icon: 'none'
      });
      this.stopRecord();
    }
  }, this.maxDuration * 1000);
}

// stop automatically when the user goes silent
setupVAD() {
  // if the user stops speaking for 3 seconds, end the recording
  let lastAudioLevel = 0;
  let silenceCounter = 0;
  this.vadTimer = setInterval(() => {
    // sample the current volume
    const currentLevel = this.getAudioLevel();
    if (Math.abs(currentLevel - lastAudioLevel) < 0.05) {
      silenceCounter++;
      if (silenceCounter > 30) { // 3 seconds (30 * 100ms)
        this.stopRecord();
      }
    } else {
      silenceCounter = 0;
    }
    lastAudioLevel = currentLevel;
  }, 100);
}
```
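The `getAudioLevel()` helper above is left to the platform. On H5, one way to implement it (an assumption on my part, not part of the original component) is to compute the RMS of the Web Audio analyser's time-domain data, where byte value 128 is the silence midpoint:

```javascript
// Pure helper: normalized RMS of 8-bit time-domain samples (0 = silence, ~1 = loud).
function rmsLevel(timeDomainBytes) {
  let sum = 0;
  for (const b of timeDomainBytes) {
    const v = (b - 128) / 128; // normalize to [-1, 1]
    sum += v * v;
  }
  return Math.sqrt(sum / timeDomainBytes.length);
}

// Browser-side wiring sketch (needs a MediaStream, so it cannot run outside a page):
// const ctx = new AudioContext();
// const analyser = ctx.createAnalyser();
// ctx.createMediaStreamSource(stream).connect(analyser);
// const buf = new Uint8Array(analyser.fftSize);
// this.getAudioLevel = () => {
//   analyser.getByteTimeDomainData(buf);
//   return rmsLevel(buf);
// };
```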
Bonus feature: speech synthesis (TTS)
Besides recognition, speech synthesis (text-to-speech) is also useful for converting text into spoken audio:
```javascript
// text-to-speech
textToSpeech(text, options = {}) {
  const { lang = 'zh', speed = 5, volume = 5 } = options;

  // #ifdef APP-PLUS
  const speechPlugin = uni.requireNativePlugin('speech-baidu');
  return new Promise((resolve, reject) => {
    speechPlugin.textToSpeech({
      text,
      language: lang === 'zh' ? 'zh-cn' : 'en-us',
      speed,
      volume
    }, (res) => {
      if (res.errorCode === 0) {
        resolve(res);
      } else {
        reject(new Error(`Speech synthesis failed: ${res.errorCode}`));
      }
    });
  });
  // #endif

  // #ifdef H5
  return new Promise((resolve, reject) => {
    if ('speechSynthesis' in window) {
      const speech = new SpeechSynthesisUtterance();
      speech.text = text;
      speech.lang = lang === 'zh' ? 'zh-CN' : 'en-US';
      speech.rate = speed / 10;
      speech.volume = volume / 10;
      speech.onend = () => resolve();
      speech.onerror = (err) => reject(err);
      window.speechSynthesis.speak(speech);
    } else {
      reject(new Error('This browser does not support speech synthesis'));
    }
  });
  // #endif
}
```
Pitfalls and fixes
Here are some common problems I hit during development, and how I dealt with them:
- Baidu speech plugin fails to initialize: check the API key configuration and network environment, especially HTTPS restrictions
- Recording unavailable on H5: most browsers only allow microphone access over HTTPS
- Inaccurate results: tune the recording parameters (sample rate, channel count, etc.) or apply better noise suppression
- WeChat mini-program calls fail: verify the access_token is valid and watch its expiry time
- Large differences across devices: optimize for low-end hardware, e.g. fewer animations, lower sample rates
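For the H5 HTTPS pitfall in particular, it helps to fail fast with a clear message. A small guard might look like this (helper name and rules are illustrative; browsers do treat `localhost` as a secure context during development):

```javascript
// getUserMedia only works in secure contexts: HTTPS, or localhost in development.
function canUseMicrophone(protocol, hostname) {
  const secure = protocol === 'https:';
  const local = hostname === 'localhost' || hostname === '127.0.0.1';
  return secure || local;
}

// In the page:
// if (!canUseMicrophone(location.protocol, location.hostname)) {
//   uni.showToast({ title: 'Voice input requires HTTPS', icon: 'none' });
// }
```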
Our broader fix was to run a compatibility check and adjust parameters automatically based on device capability:
```javascript
// detect device capability and adjust parameters
detectDevicePerformance() {
  const { platform, brand, model } = uni.getSystemInfoSync();

  // low-end Android optimizations
  if (platform === 'android') {
    // model-specific tuning
    if (brand === 'samsung' && model.includes('SM-J')) {
      return {
        sampleRate: 8000,
        quality: 'low',
        useVAD: false // disable voice activity detection to reduce CPU load
      };
    }
  }

  // defaults
  return {
    sampleRate: 16000,
    quality: 'high',
    useVAD: true
  };
}
```
Summary and outlook
This article covered several ways to implement voice input and speech recognition in UniApp, with concrete code for each. These approaches have been validated in real projects and cover most application scenarios.
Voice technology keeps gaining importance in mobile apps, and there are more advanced features worth exploring:
- Offline recognition: reduce network dependency and improve response times
- More languages: broaden recognition coverage
- Voiceprint recognition: authenticate users by their voice
- Sentiment analysis: detect user emotion from speech
I hope this post helps you implement voice features in UniApp. Questions and comments are welcome!