Speech Recognition HarmonyOS SDK
Last updated: 2024-12-31
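1. About This Document
Document name | Speech Recognition Integration Guide |
---|---|
Platform | HarmonyOS |
Submission date | 2024-12-30 |
Overview | This document is the user guide for the Baidu Speech Open Platform HarmonyOS SDK. It describes how to use the interfaces for short-speech recognition, long-speech recognition, and related features. The SDK uses a streaming protocol throughout, so audio is processed while the user is still speaking. This differs from the REST API, which requires uploading the entire recording file. |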
2. Version Information
Name | Version |
---|---|
Speech recognition | 1.0.0 |
System support | HarmonyOS 5.0.0 (API Level 12) or later |
Architecture support | arm64-v8a, armeabi-v7a |
3. SDK Overview
3.1. Package Contents
File name | Description |
---|---|
doc/Baidu_ASR_SDK_Harmony_Manual.md | This document |
har | Speech recognition SDK HAR library |
BaiduAsrDemo | Sample project |
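4. Online Recognition Call Flow
4.1 Initializing the SDK
Initialization takes three parameters: the context, the product pid, and the cuid. The cuid is optional.
Sample code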
Text
SpeechEventManager.getInstance().initSdk(
  getContext(this) as common.UIAbilityContext, // context
  "1234", // product pid
  "cuid" // cuid; optional
)
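4.2 Recognition
4.2.1 Starting Recognition
The listener (this.asrListener in the example below) and the callback function are both callbacks; they let you supply the callback in different forms. Use whichever suits your application. Passing only one of them is enough.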
Text
let startParams: StartParamsAsr = new StartParamsAsr()
startParams.pid = 123 as number // recognition environment pid; optional, defaults to 1537
startParams.authInfo = {ak: "apikey", sk: "secretkey"} // authentication parameters
startParams.asrType = SpeechAsrType.SHORT // SpeechAsrType.SHORT: single utterance; SpeechAsrType.TOUCH: press and hold; SpeechAsrType.MULTI: full duplex; SpeechAsrType.TRANSLITERATE: long-speech transcription
startParams.earlyReturn = 1 // whether to enable early return
startParams.acceptAudioVolume = true // whether to receive volume callbacks

interface Result {
  word: string[];
  confident: number[];
}

interface RecognizeParams {
  err_no: number;
  result: Result;
  asr_align_begin: number;
  asr_align_end: number;
  raf: number;
  early_return_duration_frame: number;
  corpus_no: number;
  sn: string;
  force_align_result: string;
  confidence_status: number;
  product_id: number;
  product_line: string;
  other_params: string;
  result_type: string;
  speak_speed: number;
  voice_power: number;
}

SpeechEventManager.getInstance().startAsr(startParams, this.asrListener,
  (asrState: string, params: string = '', audioData: ArrayBuffer) => {
    let showMsg = "====" + asrState + ", " + params
    let msg = "Asr callback: " + showMsg
    LogUtil.d(msg)

    if (asrState === SpeechAsrState.ASR_AUDIO_DATA) {
    } else if (asrState === SpeechAsrState.ASR_AUDIO_VOLUME_LEVEL) {
    } else if (asrState === SpeechAsrState.ASR_READY) {

    } else if (asrState === SpeechAsrState.ASR_PARTIAL) { // intermediate result
      const parsedResponse = JSON.parse(params) as RecognizeParams
      // word?.[0] avoids a runtime error when the payload has no word array
      console.log("Partial result: " + parsedResponse?.result?.word?.[0])
    } else if (asrState === SpeechAsrState.ASR_FINAL) { // final result
    } else if (asrState === SpeechAsrState.ASR_TTS) { // tts
    } else if (asrState === SpeechAsrState.ASR_THIRD) { // third-party data
    } else if (asrState === SpeechAsrState.ASR_FINISH) { // recognition finished
      const parsedResponse = JSON.parse(params) as RecognizeParams
      console.log("Final result: " + parsedResponse?.result?.word?.[0])
    } else if (asrState === SpeechAsrState.ASR_EXIT) {
    }
  })
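The callback above casts the parsed JSON directly to RecognizeParams, which can fail at runtime when params is empty, is not valid JSON, or carries no result field. A minimal sketch of a more defensive reader follows. It is plain TypeScript with no SDK dependency; the helper name extractText and the PartialPayload type are hypothetical, and falling back to best_result is an assumption based on the field that appears in the callback samples, not documented SDK behavior.

```typescript
// Hypothetical helper, not part of the SDK: pulls the recognized text out of a
// callback `params` string without throwing on empty or malformed input.
interface PartialPayload {
  result?: { word?: string[] };
  best_result?: string; // assumption: field name taken from the callback samples
}

function extractText(params: string): string | undefined {
  if (params.length === 0) {
    return undefined; // the callback declares '' as the default for params
  }
  let payload: PartialPayload | null;
  try {
    payload = JSON.parse(params) as PartialPayload | null;
  } catch {
    return undefined; // not valid JSON
  }
  if (payload === null || typeof payload !== "object") {
    return undefined; // valid JSON, but not an object
  }
  // Prefer result.word[0] (the shape declared in RecognizeParams),
  // then fall back to best_result.
  const first = payload.result?.word?.[0];
  return first ?? payload.best_result;
}
```

Inside the callback, a call such as extractText(params) could replace the direct cast in the ASR_PARTIAL branch, leaving the caller to decide what to do when it returns undefined.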
4.2.1.1 Callback Samples
Text
asr.start, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.ready, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.begin, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "err_no": 0, "best_result": "我"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "我放"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "播放音"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "播放音乐"}
asr.end, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.final_result, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "播放音乐。"}
asr.finish, {"sn":"a6eb51fb-d44a-465b-9129-9408ae4d7df5","err_no":0,"err":{"errorCode":0,"desc":"Speech Recognize success."}}
asr.exit, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}

Recognition failure:
asr.final_result, {"sn": "162a2ef1-4551-41e9-aac3-93496500b409", "err_no": -3005, "err_msg": "asr server not find effective speech"}
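The sample sequence can be read as a small state machine: partial events keep overwriting the text, final_result fixes it, and finish or exit closes the session. The sketch below replays such events into a session object. It is an illustration only: AsrSession, newSession, applyEvent, and parseLogLine are hypothetical names, the event and field names are copied from the sample log, and the way they are combined is an assumption rather than SDK behavior.

```typescript
// Hypothetical session tracker, not part of the SDK.
interface AsrSession {
  sn: string;        // query id, "" until the first event arrives
  text: string;      // latest best_result seen
  errNo: number;     // last non-zero err_no, or 0
  finished: boolean; // true once asr.finish or asr.exit arrives
}

function newSession(): AsrSession {
  return { sn: "", text: "", errNo: 0, finished: false };
}

// Returns a new session with one (event, params) pair applied.
function applyEvent(session: AsrSession, event: string, params: string): AsrSession {
  const payload = (params.length > 0 ? JSON.parse(params) : {}) as {
    sn?: string;
    best_result?: string;
    err_no?: number;
  };
  const next: AsrSession = { ...session, sn: payload.sn ?? session.sn };
  if (typeof payload.err_no === "number" && payload.err_no !== 0) {
    next.errNo = payload.err_no;
  }
  const carriesText = event === "asr.partial" || event === "asr.final_result";
  if (carriesText && typeof payload.best_result === "string") {
    next.text = payload.best_result;
  }
  if (event === "asr.finish" || event === "asr.exit") {
    next.finished = true;
  }
  return next;
}

// Splits one sample log line ("asr.partial, {...}") into event name and JSON.
function parseLogLine(line: string): [string, string] {
  const comma = line.indexOf(",");
  if (comma < 0) {
    return [line.trim(), ""];
  }
  return [line.slice(0, comma).trim(), line.slice(comma + 1).trim()];
}
```

Replaying the successful sample through applyEvent leaves text at the final_result value with errNo 0, while the failure sample leaves text empty and errNo at -3005.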
4.2.1.1.1 asr.finish
Text
{
  sn: string,
  err_no: number,
  err: {
    errcode: number,
    desc: string
  }
}

sn: the sn of the query that this asr.finish event belongs to.
err_no: error code; 0 when recognition ends normally.
err: {
  errcode: error code, same as err_no
  desc: error description
}
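A sketch of reading this payload in application code follows. Note that the schema here names the nested field errcode, while the asr.finish line in the callback samples shows errorCode; which spelling the SDK actually emits is not confirmed by this document, so the sketch accepts both. FinishError and readFinishError are hypothetical names, not SDK APIs.

```typescript
// Hypothetical type and helper, not part of the SDK.
interface FinishError {
  code: number;
  desc: string;
}

// Returns undefined when recognition ended normally, otherwise the error.
// Throws if params is not valid JSON.
function readFinishError(params: string): FinishError | undefined {
  const payload = JSON.parse(params) as {
    err_no?: number;
    err?: { errcode?: number; errorCode?: number; desc?: string };
  };
  // err_no is spelled the same in the schema and the sample, so prefer it;
  // fall back to either spelling of the nested code.
  const code = payload.err_no ?? payload.err?.errcode ?? payload.err?.errorCode ?? 0;
  if (code === 0) {
    return undefined;
  }
  return { code, desc: payload.err?.desc ?? "" };
}
```

Returning undefined for success keeps the caller's check to a single truthiness test before any error handling runs.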
4.2.2 Setting Parameters Dynamically
Text
let configParams: ConfigParamsAsr = new ConfigParamsAsr()
configParams.enableLongPress = true
SpeechEventManager.getInstance().configAsr(configParams)
4.2.3 Stopping Recognition
Text
SpeechEventManager.getInstance().stopAsr()
4.2.4 Canceling Recognition
Text
SpeechEventManager.getInstance().cancelAsr()
4.3. Error Code Mapping
Error event | HarmonyOS error code | Corresponding Android event | Android error code | Description |
---|---|---|---|---|
ERROR_VAD_NO_SPEECH | 1001 | ERROR_AUDIO_VAD_NO_SPEECH | 3101 | No start of speech detected |
ERROR_VAD_INIT_ERROR | 1002 | ERROR_AUDIO_VAD_INCORRECT | 3100 | VAD initialization failed |
ERROR_NETWORK_FAIL_CONNECT | 2001 | ERROR_NETWORK_FAIL_CONNECT | 2000 | Network connection failed |
ERROR_NETWORK_LINK_DOWN | 2002 | ~ | ~ | Network connection dropped (during recognition, the system WebSocket fired its close callback) |
ERROR_NETWORK_ERROR | 2100 | ERROR_NETWORK_NOT_AVAILABLE | 2100 | Network error |
ERROR_AUDIO_RECORDER_OPEN | 3001 | ERROR_AUDIO_RECORDER_OPEN | 3001 | Failed to open the audio recorder |
ERROR_USER_CANCEL | 7002 | ERROR_EMPTY_RESULT | 7002 | User called exitAsr |
ERROR_AUDIO_RECORDER_NO_PERMISSION | 9001 | ERROR_NO_RECORD_PERMISSION | 9001 | No audio recording permission |
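For logging or for code shared with an Android client, the table can be carried as a lookup keyed by the HarmonyOS error code. The sketch below is one possible encoding: ASR_ERRORS, AsrErrorInfo, and describeAsrError are hypothetical names and not SDK APIs, and the values are copied from the table above.

```typescript
// Hypothetical lookup built from the error code mapping table; not an SDK API.
interface AsrErrorInfo {
  event: string;
  androidCode?: number; // omitted where the table shows "~"
  description: string;
}

const ASR_ERRORS: Record<number, AsrErrorInfo> = {
  1001: { event: "ERROR_VAD_NO_SPEECH", androidCode: 3101, description: "No start of speech detected" },
  1002: { event: "ERROR_VAD_INIT_ERROR", androidCode: 3100, description: "VAD initialization failed" },
  2001: { event: "ERROR_NETWORK_FAIL_CONNECT", androidCode: 2000, description: "Network connection failed" },
  2002: { event: "ERROR_NETWORK_LINK_DOWN", description: "Network connection dropped" },
  2100: { event: "ERROR_NETWORK_ERROR", androidCode: 2100, description: "Network error" },
  3001: { event: "ERROR_AUDIO_RECORDER_OPEN", androidCode: 3001, description: "Failed to open the audio recorder" },
  7002: { event: "ERROR_USER_CANCEL", androidCode: 7002, description: "User called exitAsr" },
  9001: { event: "ERROR_AUDIO_RECORDER_NO_PERMISSION", androidCode: 9001, description: "No audio recording permission" }
};

// Formats a HarmonyOS error code for logs; codes outside the table are
// reported as unknown rather than guessed at.
function describeAsrError(code: number): string {
  const info: AsrErrorInfo | undefined = ASR_ERRORS[code];
  if (info === undefined) {
    return `Unknown error (${code})`;
  }
  return `${info.event} (${code}): ${info.description}`;
}
```

Codes that are not in the table, such as the -3005 seen in the failure sample, fall through to the unknown branch.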
4.4 Debugging
4.4.1 Log Output
Text
// Turn logging off.
LogUtil.isLog = false;
// Turn logging on.
LogUtil.isLog = true;
4.4.2 Saving Debug Audio
Text
// Setting a save path is enough to enable saving of debug audio.
const fileRootPath: string = this.context.filesDir
ConfigUtil.setDebugAudioPath(fileRootPath)