Implementing Voice Input and Speech Recognition in UniApp
While building a cross-platform app recently, a client asked for voice input to improve the user experience. After some research and experimentation, I got voice input and speech recognition working in a UniApp project. This post shares the process and the approach, in the hope that it helps other developers with similar requirements.
Why add voice input?
As mobile devices have become ubiquitous, voice interaction has become an efficient way to communicate with software. Compared with typing, voice input has several advantages:
- Convenience: no keyboard tapping, well suited to one-handed use or walking
- Speed: speaking is usually faster than typing by hand
- Accessibility: easier for groups such as elderly and visually impaired users
- Hands-free operation: works while driving, doing housework, or whenever your hands are busy
In commercial applications, voice input can significantly lower the barrier to interaction and improve conversion and retention.
Choosing a technical approach
There are three main ways to implement speech recognition in a UniApp environment:
- Native plugins: call each platform's native speech recognition capability
- Cloud services: integrate a third-party speech recognition API (Baidu, iFLYTEK, etc.)
- Web API: use the Web Speech API on the H5 platform
After comparing and testing these, I settled on a hybrid approach:
- On the App platform, use a native plugin for the best experience
- In the WeChat mini-program, use WeChat's built-in speech recognition
- On H5, try the Web Speech API first and fall back to a cloud API when it is unavailable
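The hybrid strategy can be sketched as a dispatch table. This is purely an illustration (the names are made up, not a real UniApp API); in the actual components below, the branches live behind UniApp's `// #ifdef APP-PLUS / MP-WEIXIN / H5` conditional compilation:

```javascript
// Illustrative dispatch table mirroring the platform strategy above.
const recognizers = {
  'app-plus': () => 'native Baidu plugin',
  'mp-weixin': () => 'WeChat recorder + server-side ASR',
  // On H5, prefer the Web Speech API, otherwise fall back to a cloud API.
  'h5': () =>
    ('SpeechRecognition' in globalThis || 'webkitSpeechRecognition' in globalThis)
      ? 'Web Speech API'
      : 'cloud ASR fallback',
};

function pickRecognizer(platform) {
  const impl = recognizers[platform];
  if (!impl) throw new Error(`unsupported platform: ${platform}`);
  return impl();
}
```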
Implementation
1. App side (native plugin)
First, install a speech recognition plugin. I chose speech-baidu, a relatively mature plugin from the marketplace that wraps the Baidu speech recognition SDK for UniApp.
After installing the plugin, configure it in manifest.json:
"app-plus": {"plugins": {"speech": {"baidu": {"appid": "你的百度語音識別AppID","apikey": "你的API Key","secretkey": "你的Secret Key"}}},"distribute": {"android": {"permissions": ["<uses-permission android:name=\"android.permission.RECORD_AUDIO\"/>","<uses-permission android:name=\"android.permission.INTERNET\"/>"]}}
}
Next, create the voice input component:
```vue
<template>
  <view class="voice-input-container">
    <view
      class="voice-btn"
      :class="{ 'recording': isRecording }"
      @touchstart="startRecord"
      @touchend="stopRecord"
      @touchcancel="cancelRecord"
    >
      <image :src="isRecording ? '/static/mic-active.png' : '/static/mic.png'" mode="aspectFit"></image>
      <text>{{ isRecording ? 'Release to finish' : 'Hold to talk' }}</text>
    </view>
    <view v-if="isRecording" class="recording-tip">
      <text>Listening...</text>
      <view class="wave-container">
        <view
          v-for="(item, index) in waveItems"
          :key="index"
          class="wave-item"
          :style="{ height: item + 'rpx' }"
        ></view>
      </view>
    </view>
  </view>
</template>

<script>
// #ifdef APP-PLUS
const speechPlugin = uni.requireNativePlugin('speech-baidu');
// #endif

export default {
  name: 'VoiceInput',
  props: {
    lang: {
      type: String,
      default: 'zh' // zh: Chinese, en: English
    },
    maxDuration: {
      type: Number,
      default: 60 // maximum recording time, in seconds
    }
  },
  data() {
    return {
      isRecording: false,
      timer: null,
      waveItems: [10, 15, 20, 25, 30, 25, 20, 15, 10]
    };
  },
  methods: {
    startRecord() {
      if (this.isRecording) return;
      // request the recording permission
      uni.authorize({
        scope: 'scope.record',
        success: () => {
          this.isRecording = true;
          this.startWaveAnimation();
          // #ifdef APP-PLUS
          speechPlugin.start({
            vadEos: 3000, // silence timeout
            language: this.lang === 'zh' ? 'zh-cn' : 'en-us'
          }, (res) => {
            if (res.errorCode === 0) {
              // recognition result
              this.$emit('result', res.result);
            } else {
              uni.showToast({
                title: `Recognition failed: ${res.errorCode}`,
                icon: 'none'
              });
            }
            this.isRecording = false;
            this.stopWaveAnimation();
          });
          // #endif
          // enforce the maximum recording time
          this.timer = setTimeout(() => {
            if (this.isRecording) {
              this.stopRecord();
            }
          }, this.maxDuration * 1000);
        },
        fail: () => {
          uni.showToast({
            title: 'Please grant the recording permission',
            icon: 'none'
          });
        }
      });
    },
    stopRecord() {
      if (!this.isRecording) return;
      // #ifdef APP-PLUS
      speechPlugin.stop();
      // #endif
      clearTimeout(this.timer);
      this.isRecording = false;
      this.stopWaveAnimation();
    },
    cancelRecord() {
      if (!this.isRecording) return;
      // #ifdef APP-PLUS
      speechPlugin.cancel();
      // #endif
      clearTimeout(this.timer);
      this.isRecording = false;
      this.stopWaveAnimation();
    },
    // waveform animation
    startWaveAnimation() {
      this.waveAnimTimer = setInterval(() => {
        this.waveItems = this.waveItems.map(() => Math.floor(Math.random() * 40) + 10);
      }, 200);
    },
    stopWaveAnimation() {
      clearInterval(this.waveAnimTimer);
      this.waveItems = [10, 15, 20, 25, 30, 25, 20, 15, 10];
    }
  },
  beforeDestroy() {
    this.cancelRecord();
  }
};
</script>

<style scoped>
.voice-input-container {
  width: 100%;
}
.voice-btn {
  width: 200rpx;
  height: 200rpx;
  border-radius: 100rpx;
  background-color: #f5f5f5;
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  margin: 0 auto;
}
.voice-btn.recording {
  background-color: #e1f5fe;
  box-shadow: 0 0 20rpx rgba(0, 120, 255, 0.5);
}
.voice-btn image {
  width: 80rpx;
  height: 80rpx;
  margin-bottom: 10rpx;
}
.recording-tip {
  margin-top: 30rpx;
  text-align: center;
}
.wave-container {
  display: flex;
  justify-content: center;
  align-items: flex-end;
  height: 80rpx;
  margin-top: 20rpx;
}
.wave-item {
  width: 8rpx;
  background-color: #1890ff;
  margin: 0 5rpx;
  border-radius: 4rpx;
  transition: height 0.2s;
}
</style>
```
2. WeChat mini-program implementation
The WeChat mini-program platform ships with recording and speech APIs, which makes this straightforward:
```javascript
// mini-program implementation
startRecord() {
  // #ifdef MP-WEIXIN
  this.isRecording = true;
  this.startWaveAnimation();

  const recorderManager = wx.getRecorderManager();

  recorderManager.onStart(() => {
    console.log('recording started');
  });

  recorderManager.onStop((res) => {
    this.isRecording = false;
    this.stopWaveAnimation();

    // upload the recording to WeChat's servers for recognition
    wx.showLoading({ title: 'Recognizing...' });
    const { tempFilePath } = res;
    wx.uploadFile({
      url: 'https://api.weixin.qq.com/cgi-bin/media/voice/translatecontent',
      filePath: tempFilePath,
      name: 'media',
      formData: {
        access_token: this.accessToken,
        format: 'mp3',
        voice_id: Date.now(),
        lfrom: this.lang === 'zh' ? 'zh_CN' : 'en_US',
        lto: 'zh_CN'
      },
      success: (uploadRes) => {
        wx.hideLoading();
        const data = JSON.parse(uploadRes.data);
        if (data.errcode === 0) {
          this.$emit('result', data.result);
        } else {
          uni.showToast({
            title: `Recognition failed: ${data.errmsg}`,
            icon: 'none'
          });
        }
      },
      fail: () => {
        wx.hideLoading();
        uni.showToast({
          title: 'Speech recognition failed',
          icon: 'none'
        });
      }
    });
  });

  recorderManager.start({
    duration: this.maxDuration * 1000,
    sampleRate: 16000,
    numberOfChannels: 1,
    encodeBitRate: 48000,
    format: 'mp3'
  });
  // #endif
},
stopRecord() {
  // #ifdef MP-WEIXIN
  wx.getRecorderManager().stop();
  // #endif
  // ...same as the App implementation...
}
```
Note that WeChat's speech API requires an access_token, which should normally be obtained on the backend and exposed to the client through your own endpoint.
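A minimal backend sketch for obtaining and caching that token might look like the following (Node 18+ with the global `fetch`; the endpoint is WeChat's documented `cgi-bin/token`, while the helper names and the 60-second refresh margin are my own illustrative choices):

```javascript
// Fetch and cache the WeChat access_token on the server side (sketch).
const TOKEN_URL = 'https://api.weixin.qq.com/cgi-bin/token';

// Build the request URL for the client-credential grant.
function tokenUrl(appId, appSecret) {
  return `${TOKEN_URL}?grant_type=client_credential&appid=${appId}&secret=${appSecret}`;
}

let cached = { token: null, expiresAt: 0 };

async function getAccessToken(appId, appSecret) {
  // reuse the cached token while it is still valid
  if (cached.token && Date.now() < cached.expiresAt) return cached.token;

  const res = await fetch(tokenUrl(appId, appSecret));
  const data = await res.json();
  if (data.errcode) throw new Error(`token error: ${data.errmsg}`);

  // access_token is valid for expires_in seconds; refresh 60s early
  cached = {
    token: data.access_token,
    expiresAt: Date.now() + (data.expires_in - 60) * 1000
  };
  return cached.token;
}
```

Your app then calls your own endpoint (which calls `getAccessToken`) instead of embedding the AppSecret in the client.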
3. H5 implementation
On H5 we can use the Web Speech API for recognition, falling back to a cloud service API when the browser does not support it:
```javascript
startRecord() {
  // #ifdef H5
  this.isRecording = true;
  this.startWaveAnimation();

  // check whether the browser supports speech recognition
  if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    this.recognition = new SpeechRecognition();
    this.recognition.lang = this.lang === 'zh' ? 'zh-CN' : 'en-US';
    this.recognition.continuous = false;
    this.recognition.interimResults = false;

    this.recognition.onresult = (event) => {
      const result = event.results[0][0].transcript;
      this.$emit('result', result);
    };
    this.recognition.onerror = (event) => {
      uni.showToast({
        title: `Recognition error: ${event.error}`,
        icon: 'none'
      });
    };
    this.recognition.onend = () => {
      this.isRecording = false;
      this.stopWaveAnimation();
    };

    this.recognition.start();
  } else {
    // no Web Speech API support: fall back to the cloud API
    this.useCloudSpeechAPI();
  }
  // #endif

  // enforce the maximum recording time
  this.timer = setTimeout(() => {
    if (this.isRecording) {
      this.stopRecord();
    }
  }, this.maxDuration * 1000);
},
stopRecord() {
  // #ifdef H5
  if (this.recognition) {
    this.recognition.stop();
  }
  // #endif
  // ...same as the App implementation...
},
useCloudSpeechAPI() {
  // fallback: send an audio file to a backend endpoint for recognition
  uni.chooseFile({
    count: 1,
    type: 'file',
    extension: ['.mp3', '.wav'],
    success: (res) => {
      const tempFilePath = res.tempFilePaths[0];
      // upload the audio file to the backend for recognition
      uni.uploadFile({
        url: this.apiBaseUrl + '/speech/recognize',
        filePath: tempFilePath,
        name: 'audio',
        formData: {
          lang: this.lang
        },
        success: (uploadRes) => {
          const data = JSON.parse(uploadRes.data);
          if (data.code === 0) {
            this.$emit('result', data.result);
          } else {
            uni.showToast({
              title: `Recognition failed: ${data.msg}`,
              icon: 'none'
            });
          }
        },
        complete: () => {
          this.isRecording = false;
          this.stopWaveAnimation();
        }
      });
    }
  });
}
```
4. A unified wrapper
To make the feature easy to call, I wrapped the platform branches behind a single API:
```javascript
// utils/speech.js
const Speech = {
  // start speech recognition
  startRecognize(options) {
    const { lang = 'zh', success, fail, complete } = options;

    // #ifdef APP-PLUS
    const speechPlugin = uni.requireNativePlugin('speech-baidu');
    speechPlugin.start({
      vadEos: 3000,
      language: lang === 'zh' ? 'zh-cn' : 'en-us'
    }, (res) => {
      if (res.errorCode === 0) {
        success && success(res.result);
      } else {
        fail && fail(res);
      }
      complete && complete();
    });
    return {
      stop: () => speechPlugin.stop(),
      cancel: () => speechPlugin.cancel()
    };
    // #endif

    // #ifdef MP-WEIXIN
    // mini-program implementation
    // ...
    // #endif

    // #ifdef H5
    // H5 implementation
    // ...
    // #endif
  }
};

export default Speech;
```
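With this wrapper, the call site looks the same on every platform. The `Speech` object below is a stand-in stub (the real module needs a device or plugin to run), so only the call shape is demonstrated:

```javascript
// Stub standing in for the real utils/speech.js module, for illustration only.
const Speech = {
  startRecognize({ lang = 'zh', success, fail, complete }) {
    // the real implementation dispatches per platform; the stub returns a fixed result
    success && success('hello world');
    complete && complete();
    return { stop() {}, cancel() {} };
  }
};

// Typical call site:
let transcript = '';
const session = Speech.startRecognize({
  lang: 'zh',
  success: (text) => { transcript = text; },
  fail: (err) => { console.error('recognition failed', err); },
  complete: () => { /* hide loading UI, etc. */ }
});

// the caller can end or abort the session at any time
session.stop();
```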
A real-world example: voice input in a chat app
Now let's look at a practical scenario: adding voice input to a chat interface:
```vue
<template>
  <view class="chat-input-container">
    <view class="chat-tools">
      <image :src="isVoiceMode ? '/static/keyboard.png' : '/static/mic.png'" @tap="toggleInputMode"></image>
      <image src="/static/emoji.png" @tap="showEmojiPicker"></image>
    </view>
    <view v-if="!isVoiceMode" class="text-input">
      <textarea
        v-model="message"
        auto-height
        placeholder="Type a message..."
        :focus="textFocus"
        @focus="onFocus"
        @blur="onBlur"
      ></textarea>
    </view>
    <view v-else class="voice-input">
      <voice-input @result="onVoiceResult"></voice-input>
    </view>
    <button class="send-btn" :disabled="!message.trim()" @tap="sendMessage">Send</button>
  </view>
</template>

<script>
import VoiceInput from '@/components/voice-input/voice-input.vue';

export default {
  components: {
    VoiceInput
  },
  data() {
    return {
      message: '',
      isVoiceMode: false,
      textFocus: false
    };
  },
  methods: {
    toggleInputMode() {
      this.isVoiceMode = !this.isVoiceMode;
      if (!this.isVoiceMode) {
        this.$nextTick(() => {
          this.textFocus = true;
        });
      }
    },
    onVoiceResult(result) {
      this.message = result;
      this.isVoiceMode = false;
    },
    sendMessage() {
      if (!this.message.trim()) return;
      this.$emit('send', this.message);
      this.message = '';
    },
    onFocus() {
      this.textFocus = true;
    },
    onBlur() {
      this.textFocus = false;
    },
    showEmojiPicker() {
      // show the emoji picker
    }
  }
};
</script>

<style>
.chat-input-container {
  display: flex;
  align-items: center;
  padding: 20rpx;
  border-top: 1rpx solid #eee;
  background-color: #fff;
}
.chat-tools {
  display: flex;
  margin-right: 20rpx;
}
.chat-tools image {
  width: 60rpx;
  height: 60rpx;
  margin-right: 20rpx;
}
.text-input {
  flex: 1;
  background-color: #f5f5f5;
  border-radius: 10rpx;
  padding: 10rpx 20rpx;
}
.text-input textarea {
  width: 100%;
  min-height: 60rpx;
  max-height: 240rpx;
}
.voice-input {
  flex: 1;
  display: flex;
  justify-content: center;
}
.send-btn {
  width: 140rpx;
  height: 80rpx;
  line-height: 80rpx;
  font-size: 28rpx;
  margin-left: 20rpx;
  padding: 0;
  background-color: #1890ff;
  color: #fff;
}
.send-btn[disabled] {
  background-color: #ccc;
}
</style>
```
Performance and other caveats
During development I ran into a few issues worth calling out:
1. Permission handling
Speech recognition needs microphone access, and each platform handles permissions differently:
```javascript
// request the recording permission in a unified way
requestAudioPermission() {
  return new Promise((resolve, reject) => {
    // #ifdef APP-PLUS
    const permissions = ['android.permission.RECORD_AUDIO'];
    plus.android.requestPermissions(
      permissions,
      function (e) {
        if (e.granted.length === permissions.length) {
          resolve();
        } else {
          reject(new Error('Recording permission not granted'));
        }
      },
      function (e) {
        reject(e);
      }
    );
    // #endif

    // #ifdef MP-WEIXIN || MP-BAIDU
    uni.authorize({
      scope: 'scope.record',
      success: () => resolve(),
      fail: (err) => reject(err)
    });
    // #endif

    // #ifdef H5
    if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
      navigator.mediaDevices.getUserMedia({ audio: true })
        .then(() => resolve())
        .catch(err => reject(err));
    } else {
      reject(new Error('This browser does not support recording'));
    }
    // #endif
  });
}
```
2. Data usage
Speech recognition uploads audio data, which consumes bandwidth on mobile networks:
```javascript
// check the network type and warn the user
checkNetwork() {
  uni.getNetworkType({
    success: (res) => {
      if (res.networkType === '2g' || res.networkType === '3g') {
        uni.showModal({
          title: 'Data usage warning',
          content: 'You are on a mobile network; speech recognition may use a noticeable amount of data. Continue?',
          success: (confirm) => {
            if (confirm.confirm) {
              this.startSpeechRecognition();
            }
          }
        });
      } else {
        this.startSpeechRecognition();
      }
    }
  });
}
```
3. Performance
Long recognition sessions increase memory and battery consumption, so they need limits:
```javascript
// cap the recording duration and stop automatically
setupMaxDuration() {
  if (this.timer) {
    clearTimeout(this.timer);
  }
  this.timer = setTimeout(() => {
    if (this.isRecording) {
      uni.showToast({
        title: 'Recording too long, stopped automatically',
        icon: 'none'
      });
      this.stopRecord();
    }
  }, this.maxDuration * 1000);
}

// stop automatically when the user goes silent
setupVAD() {
  // if the user stops speaking for 3 seconds, end the recording
  let lastAudioLevel = 0;
  let silenceCounter = 0;
  this.vadTimer = setInterval(() => {
    // sample the current volume
    const currentLevel = this.getAudioLevel();
    if (Math.abs(currentLevel - lastAudioLevel) < 0.05) {
      silenceCounter++;
      if (silenceCounter > 30) { // 3 seconds (30 * 100ms)
        this.stopRecord();
      }
    } else {
      silenceCounter = 0;
    }
    lastAudioLevel = currentLevel;
  }, 100);
}
```
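The `getAudioLevel()` helper above is left to the platform. On H5, one way to implement it (an assumption on my part, not part of the original component) is to compute the RMS of the Web Audio analyser's time-domain data, where byte value 128 is the silence midpoint:

```javascript
// Pure helper: normalized RMS of 8-bit time-domain samples (0 = silence, ~1 = loud).
function rmsLevel(timeDomainBytes) {
  let sum = 0;
  for (const b of timeDomainBytes) {
    const v = (b - 128) / 128; // normalize to [-1, 1]
    sum += v * v;
  }
  return Math.sqrt(sum / timeDomainBytes.length);
}

// Browser-side wiring sketch (needs a MediaStream, so it cannot run outside a page):
// const ctx = new AudioContext();
// const analyser = ctx.createAnalyser();
// ctx.createMediaStreamSource(stream).connect(analyser);
// const buf = new Uint8Array(analyser.fftSize);
// this.getAudioLevel = () => {
//   analyser.getByteTimeDomainData(buf);
//   return rmsLevel(buf);
// };
```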
Bonus feature: speech synthesis (TTS)
Besides recognition, speech synthesis (text-to-speech) is also useful for converting text into spoken audio:
```javascript
// text-to-speech
textToSpeech(text, options = {}) {
  const { lang = 'zh', speed = 5, volume = 5 } = options;

  // #ifdef APP-PLUS
  const speechPlugin = uni.requireNativePlugin('speech-baidu');
  return new Promise((resolve, reject) => {
    speechPlugin.textToSpeech({
      text,
      language: lang === 'zh' ? 'zh-cn' : 'en-us',
      speed,
      volume
    }, (res) => {
      if (res.errorCode === 0) {
        resolve(res);
      } else {
        reject(new Error(`Speech synthesis failed: ${res.errorCode}`));
      }
    });
  });
  // #endif

  // #ifdef H5
  return new Promise((resolve, reject) => {
    if ('speechSynthesis' in window) {
      const speech = new SpeechSynthesisUtterance();
      speech.text = text;
      speech.lang = lang === 'zh' ? 'zh-CN' : 'en-US';
      speech.rate = speed / 10;
      speech.volume = volume / 10;
      speech.onend = () => resolve();
      speech.onerror = (err) => reject(err);
      window.speechSynthesis.speak(speech);
    } else {
      reject(new Error('This browser does not support speech synthesis'));
    }
  });
  // #endif
}
```
Pitfalls and fixes
Here are some common problems I hit during development, and how I dealt with them:
- Baidu speech plugin fails to initialize: check the API key configuration and network environment, especially HTTPS restrictions
- Recording unavailable on H5: most browsers only allow microphone access over HTTPS
- Inaccurate results: tune the recording parameters (sample rate, channel count, etc.) or apply better noise suppression
- WeChat mini-program calls fail: verify the access_token is valid and watch its expiry time
- Large differences across devices: optimize for low-end hardware, e.g. fewer animations, lower sample rates
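For the H5 HTTPS pitfall in particular, it helps to fail fast with a clear message. A small guard might look like this (helper name and rules are illustrative; browsers do treat `localhost` as a secure context during development):

```javascript
// getUserMedia only works in secure contexts: HTTPS, or localhost in development.
function canUseMicrophone(protocol, hostname) {
  const secure = protocol === 'https:';
  const local = hostname === 'localhost' || hostname === '127.0.0.1';
  return secure || local;
}

// In the page:
// if (!canUseMicrophone(location.protocol, location.hostname)) {
//   uni.showToast({ title: 'Voice input requires HTTPS', icon: 'none' });
// }
```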
Our broader fix was to run a compatibility check and adjust parameters automatically based on device capability:
```javascript
// detect device capability and adjust parameters
detectDevicePerformance() {
  const { platform, brand, model } = uni.getSystemInfoSync();

  // low-end Android optimizations
  if (platform === 'android') {
    // model-specific tuning
    if (brand === 'samsung' && model.includes('SM-J')) {
      return {
        sampleRate: 8000,
        quality: 'low',
        useVAD: false // disable voice activity detection to reduce CPU load
      };
    }
  }

  // defaults
  return {
    sampleRate: 16000,
    quality: 'high',
    useVAD: true
  };
}
```
Summary and outlook
This article covered several ways to implement voice input and speech recognition in UniApp, with concrete code for each. These approaches have been validated in real projects and cover most application scenarios.
Voice technology keeps gaining importance in mobile apps, and there are more advanced features worth exploring:
- Offline recognition: reduce network dependency and improve response times
- More languages: broaden recognition coverage
- Voiceprint recognition: authenticate users by their voice
- Sentiment analysis: detect user emotion from speech
I hope this post helps you implement voice features in UniApp. Questions and comments are welcome!