A User Pickup Detection Scheme for AXE-Mode Privacy Numbers Based on Audio Stream Analysis

Published: 2023/12/31


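Background

When placing outbound calls to users through AXE-mode privacy numbers, we found that not every privacy-number provider offers a configurable call-connected callback.
We therefore need a provider-independent way to detect when the user picks up (for scenarios such as starting a recording or playing a welcome message).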

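Goal

Before bringing in a trained speech model, use waveform analysis to reliably recognize the ringback beeps and color ringback tones (custom ringback music), covering more than 90% of cases.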

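Research

VAD:
Voice activity detection (VAD), also known as speech endpoint detection or speech boundary detection, identifies and removes long silent periods from an audio signal stream so that channel resources can be saved without degrading service quality. It is an important component of IP telephony. Silence suppression saves bandwidth and helps reduce the end-to-end latency perceived by users.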
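As an illustration of the idea behind silence detection, the following is a minimal sketch of an energy-based check on one 100 ms frame. It is not TarsosDSP's implementation; the class name `EnergyVad`, the helper names, and the -70 dB threshold are assumptions chosen for the example, and samples are assumed to be normalized to [-1, 1].

```java
public class EnergyVad {

    /** Returns the frame level in dB relative to full scale (samples in [-1, 1]). */
    public static double levelDb(float[] frame) {
        double sumSquares = 0.0;
        for (float s : frame) {
            sumSquares += s * s;
        }
        double rms = Math.sqrt(sumSquares / frame.length);
        // Guard against log10(0) for an all-zero frame.
        return 20.0 * Math.log10(Math.max(rms, 1e-10));
    }

    /** A frame counts as silence when its level is below the (assumed) threshold. */
    public static boolean isSilence(float[] frame, double thresholdDb) {
        return levelDb(frame) < thresholdDb;
    }

    public static void main(String[] args) {
        float[] quiet = new float[800]; // 100 ms of digital silence at 8 kHz
        float[] tone = new float[800];  // 100 ms of a 450 Hz sine at half amplitude
        for (int i = 0; i < tone.length; i++) {
            tone[i] = (float) (0.5 * Math.sin(2 * Math.PI * 450 * i / 8000.0));
        }
        System.out.println(isSilence(quiet, -70.0)); // prints true
        System.out.println(isSilence(tone, -70.0));  // prints false
    }
}
```

A real call recording carries line noise, so the threshold would need tuning against actual samples.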

TarsosDSP:
Git repository: https://github.com/JorenSix/TarsosDSP
TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms implemented, as simply as possible, in pure Java and without any other external dependencies. The library tries to hit the sweet spot between being capable enough to get real tasks done but compact and simple enough to serve as a demonstration on how DSP algorithms works. TarsosDSP features an implementation of a percussion onset detector and a number of pitch detection algorithms: YIN, the Mcleod Pitch method and a “Dynamic Wavelet Algorithm Pitch Tracking” algorithm. Also included is a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.

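Ringback tone:
Indicates that the called party is being rung. It uses a 450±25 Hz AC source at a send level of -10±3 dBm, with a 5 s on/off cadence (1 s on, 4 s off) that matches the ringing signal.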
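To make the 450 Hz criterion concrete, here is a minimal sketch that measures how much of a frame's energy sits at 450 Hz using the Goertzel algorithm. This is a swapped-in technique for illustration: the implementation later in this article uses TarsosDSP's YIN pitch detector instead. The class name `RingbackTone`, the helper names, and the 0.5 decision threshold are assumptions.

```java
public class RingbackTone {

    /** Builds n samples of a unit-amplitude sine, used only for the demo. */
    public static float[] sine(double freqHz, int n, double sampleRate) {
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            out[i] = (float) Math.sin(2.0 * Math.PI * freqHz * i / sampleRate);
        }
        return out;
    }

    /** Goertzel: squared spectral magnitude of the frame at targetHz. */
    public static double goertzelPower(float[] frame, double targetHz, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * targetHz / sampleRate);
        double s1 = 0.0;
        double s2 = 0.0;
        for (float x : frame) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** Fraction of frame energy at targetHz: near 1 for a pure tone, near 0 otherwise. */
    public static double toneRatio(float[] frame, double targetHz, double sampleRate) {
        double energy = 0.0;
        for (float x : frame) {
            energy += x * x;
        }
        if (energy == 0.0) {
            return 0.0;
        }
        return 2.0 * goertzelPower(frame, targetHz, sampleRate) / (frame.length * energy);
    }

    /** Assumed decision rule: more than half of the energy at 450 Hz. */
    public static boolean is450Hz(float[] frame, double sampleRate) {
        return toneRatio(frame, 450.0, sampleRate) > 0.5;
    }

    public static void main(String[] args) {
        System.out.println(is450Hz(sine(450, 800, 8000), 8000));  // prints true
        System.out.println(is450Hz(sine(1000, 800, 8000), 8000)); // prints false
    }
}
```

With a 100 ms frame the frequency resolution is roughly 10 Hz, so covering the full ±25 Hz tolerance would need several probe frequencies or a shorter frame.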

Color ringback tone:
A continuous, uninterrupted music waveform.

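Approach

Reading the waveform from left to right, it falls into three segments:

The announcement "Please enter the four-digit extension number, followed by the # key"

The ringing beeps

The user speaking

Detection therefore proceeds in three steps:
1. Skip a fixed duration to get past the extension-number announcement.
2. Examine the first stretch of activity after the silence and match it against the music (color ringback) signature or the beep signature.
3. The moment the signal breaks out of that signature is taken as the moment the user picks up.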
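The three steps can be sketched as a small state machine over per-frame activity flags (one flag per 100 ms, true meaning "not silent"). This is a simplified illustration under stated assumptions, not the full implementation: it leaves out the 450 Hz check, and the class name `PickupSketch`, the `pattern` helper, and the constants (20 frames for music, 35 frames for the minimum beep gap) are chosen for the example.

```java
public class PickupSketch {

    public enum Ringback { UNKNOWN, BEEP, MUSIC }

    static final int MUSIC_FRAMES = 20;   // first burst of 2 s or more is treated as music
    static final int MIN_GAP_FRAMES = 35; // a beep gap shorter than 3.5 s breaks the cadence

    /** Builds a flag array from alternating run lengths, starting with an active run. */
    public static boolean[] pattern(int... runs) {
        int total = 0;
        for (int r : runs) {
            total += r;
        }
        boolean[] out = new boolean[total];
        int pos = 0;
        boolean on = true;
        for (int r : runs) {
            for (int k = 0; k < r; k++) {
                out[pos++] = on;
            }
            on = !on;
        }
        return out;
    }

    /** Returns the frame index at which pickup is assumed, or -1 if none is found. */
    public static int findPickupFrame(boolean[] active, int skipFrames) {
        Ringback type = Ringback.UNKNOWN;
        int activeRun = 0;
        int silentRun = 0;
        for (int i = skipFrames; i < active.length; i++) { // step 1: skip the announcement
            if (active[i]) {
                if (type == Ringback.BEEP && silentRun > 0 && silentRun < MIN_GAP_FRAMES) {
                    return i; // step 3: sound returned too early for a ringback gap
                }
                silentRun = 0;
                activeRun++;
                if (type == Ringback.UNKNOWN && activeRun >= MUSIC_FRAMES) {
                    type = Ringback.MUSIC; // step 2: long first burst looks like music
                }
            } else {
                if (type == Ringback.UNKNOWN && activeRun == 0) {
                    continue; // silence before the first activity
                }
                if (type == Ringback.MUSIC) {
                    return i; // step 3: the continuous music stopped
                }
                if (type == Ringback.UNKNOWN) {
                    type = Ringback.BEEP; // step 2: short first burst looks like a beep
                }
                activeRun = 0;
                silentRun++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // announcement, pause, beep, 4 s gap, beep, short gap, speech
        System.out.println(findPickupFrame(pattern(5, 3, 10, 40, 10, 15, 5), 5)); // prints 83
        // announcement, pause, 3 s of music, then silence
        System.out.println(findPickupFrame(pattern(5, 3, 30, 2), 5));             // prints 38
    }
}
```

Multiplying the returned index by 100 gives the assumed pickup time in milliseconds.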

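Implementation

The code uses the silence detection and pitch detection provided by TarsosDSP.
Note that you need to add the dependency yourself; the TarsosDSP package is available from the Git repository listed under Research above.

Usage: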

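public static void main(String[] args) {
    // Arguments: file path, sample rate (Hz), bit depth, ms to skip at the start,
    // and the fallback silence timeout in ms.
    PickUp pickUp = new PickUp("xxx.wav", 8000, 16, 1000, 4500);
    pickUp.start();
    // Exit with status 0 on normal completion (the original used -1).
    System.exit(0);
}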

PickUp:

package xxx;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.SilenceDetector;
import be.tarsos.dsp.io.TarsosDSPAudioFloatConverter;
import be.tarsos.dsp.io.TarsosDSPAudioFormat;
import be.tarsos.dsp.io.UniversalAudioInputStream;
import be.tarsos.dsp.pitch.PitchDetectionHandler;
import be.tarsos.dsp.pitch.PitchDetectionResult;
import be.tarsos.dsp.pitch.PitchProcessor;

import java.io.*;
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PickUp {

    public enum RingbackType {
        UNCHECK,      // not yet classified
        DU_NORMALITY, // standard beep cadence
        DU_OTHER,     // non-standard beep
        SONG;         // color ringback music
    }

    private ConcurrentLinkedQueue<byte[]> audioQueue = new ConcurrentLinkedQueue<byte[]>();
    // volatile: read and written by both the reader thread and the detection thread
    private volatile boolean isFinishReadFile = false; // whether reading/detection is finished
    private String filePath;
    private String fileName;
    private int readLength = 1600;     // bytes in 100 ms of audio
    private int noinputTimeout = 1000; // ms to skip at the start
    private int silenceMaxTimes = 10;  // consecutive silent 100 ms frames that end detection
    private float sampleRate = 8000;   // sample rate
    private int sampleSizeInBits = 16; // bit depth

    /**
     * User pickup detection.
     * Detection method: 1. beeps are detected by their 450 Hz frequency;
     * 2. color ringback music is detected by continuous activity.
     *
     * @param filePath         file path
     * @param sampleRate       sample rate
     * @param sampleSizeInBits bit depth
     * @param noinputTimeout   how long to skip (ms) before detection starts
     * @param silenceTimeout   how long a silence (ms) ends detection by default (fallback)
     */
    public PickUp(String filePath, float sampleRate, int sampleSizeInBits, int noinputTimeout, int silenceTimeout) {
        this.filePath = filePath;
        this.sampleRate = sampleRate;
        this.sampleSizeInBits = sampleSizeInBits;
        // Number of bytes in 100 ms of audio for the given parameters
        this.readLength = (int) sampleRate * (sampleSizeInBits / 8) / 10;
        this.noinputTimeout = noinputTimeout;
        // Number of 100 ms frames covered by the silence timeout
        this.silenceMaxTimes = silenceTimeout / 100;
    }

    public void start() {
        File audioFile = new File(this.filePath);
        try (FileInputStream fis = new FileInputStream(audioFile)) {
            audioQueue.clear();
            fileName = audioFile.getName();
            isFinishReadFile = false;
            Thread sttThread = new Thread(vadRunnable);
            sttThread.start();
            byte[] byteArr = new byte[this.readLength];
            int size;
            fis.skip(44); // skip the WAV header, assumed here to be the canonical 44 bytes
            while ((size = fis.read(byteArr)) != -1) {
                if (size < byteArr.length) {
                    // Zero-pad a short read so stale bytes from the previous chunk are not reused
                    Arrays.fill(byteArr, size, byteArr.length, (byte) 0);
                }
                audioQueue.add(byteArr.clone());
            }
            while (!audioQueue.isEmpty() && !isFinishReadFile) {
                Thread.sleep(2000);
            }
            isFinishReadFile = true;
            sttThread.join();
            // Invoke your callback here
            System.out.println("Finished normally");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private Runnable vadRunnable = new Runnable() {
        volatile int countHZ = 0;    // number of pitch estimates so far
        volatile int count450HZ = 0; // number of estimates near 450 Hz

        @Override
        public void run() {
            RingbackType ringbackType = RingbackType.UNCHECK;
            int currentPartTime = 0, silenceTimes = 0, firstActiveTimes = 0, differentCount = 0;
            try {
                // Use TarsosDSP for silence detection
                TarsosDSPAudioFormat tdspFormat = new TarsosDSPAudioFormat(sampleRate, sampleSizeInBits, 1, true, false);
                TarsosDSPAudioFloatConverter converter = TarsosDSPAudioFloatConverter.getConverter(tdspFormat);
                SilenceDetector silenceDetector = new SilenceDetector();
                float[] voiceFloatArr = new float[readLength / tdspFormat.getFrameSize()];
                // Runs until detection finishes or the reader signals that the queue is drained
                while (!isFinishReadFile) {
                    byte[] data = audioQueue.poll();
                    if (data == null) {
                        Thread.sleep(50);
                        continue;
                    }
                    converter.toFloatArray(data, voiceFloatArr);
                    boolean isSilence = silenceDetector.isSilence(voiceFloatArr);
                    // Check in 100 ms steps once the initial skip period has passed
                    if ((currentPartTime += 100) >= noinputTimeout) {
                        boolean checkHZ = false;
                        if (isSilence) {
                            if (firstActiveTimes == 0) {
                                System.out.println("Silence before first activity, ignored");
                                continue;
                            }
                            System.out.println("Silence detected " + ringbackType);
                            // Stop once consecutive silence reaches the maximum;
                            // there is no need to wait for the rest of the file
                            if (++silenceTimes >= silenceMaxTimes) {
                                isFinishReadFile = true;
                            }
                            switch (ringbackType) {
                                case UNCHECK:
                                    if (countHZ == count450HZ) {
                                        if (countHZ <= 11) {
                                            ringbackType = RingbackType.DU_NORMALITY;
                                            // Chinese standard: beep for 1 s, pause for 4 s
                                            silenceMaxTimes = 41;
                                        } else {
                                            ringbackType = RingbackType.DU_OTHER;
                                            checkHZ = true;
                                        }
                                    }
                                    break;
                                case DU_OTHER:
                                    // Three consecutive frames that break the 450 Hz signature end detection
                                    if (countHZ != count450HZ) {
                                        differentCount++;
                                        count450HZ = countHZ;
                                    } else {
                                        differentCount = 0;
                                    }
                                    if (differentCount >= 3) {
                                        isFinishReadFile = true;
                                    }
                                    // Beep: run the frequency check
                                    checkHZ = true;
                                    break;
                                case SONG:
                                    // The continuous music was interrupted
                                    isFinishReadFile = true;
                                    break;
                                default:
                                    break;
                            }
                        } else {
                            System.out.println("Active " + ringbackType);
                            switch (ringbackType) {
                                case UNCHECK:
                                    firstActiveTimes++;
                                    // First activity lasting 2 s or more is classified as music
                                    if (firstActiveTimes >= 20) {
                                        ringbackType = RingbackType.SONG;
                                    }
                                    // Start the frequency check with the first activity
                                    checkHZ = true;
                                    break;
                                case DU_NORMALITY:
                                    // Activity resumed after a gap shorter than the expected pause
                                    if (silenceTimes != 0 && silenceTimes < 35) {
                                        isFinishReadFile = true;
                                    }
                                    // Intentional fall-through
                                case DU_OTHER:
                                    // Three consecutive frames that break the 450 Hz signature end detection
                                    if (countHZ != count450HZ) {
                                        differentCount++;
                                        count450HZ = countHZ;
                                    } else {
                                        differentCount = 0;
                                    }
                                    if (differentCount >= 3) {
                                        isFinishReadFile = true;
                                    }
                                    // Beep: run the frequency check
                                    checkHZ = true;
                                    break;
                                default:
                                    break;
                            }
                            // Reset the silence counter
                            silenceTimes = 0;
                        }
                        // Frequency check
                        if (checkHZ && !isFinishReadFile) {
                            // The buffer size is passed as data.length (a byte count), as in the original;
                            // AudioDispatcher takes a size in samples, so this may be worth revisiting.
                            AudioDispatcher dispatcher = new AudioDispatcher(
                                    new UniversalAudioInputStream(new ByteArrayInputStream(data), tdspFormat),
                                    data.length, 0);
                            // Use the configured sample rate instead of a hard-coded 8000
                            AudioProcessor audioProcessor = new PitchProcessor(
                                    PitchProcessor.PitchEstimationAlgorithm.FFT_YIN, sampleRate, data.length,
                                    new PitchDetectionHandler() {
                                        @Override
                                        public void handlePitch(PitchDetectionResult pitchDetectionResult, AudioEvent audioEvent) {
                                            countHZ++;
                                            float pitch = pitchDetectionResult.getPitch();
                                            System.out.println(pitch + "HZ");
                                            if (pitch > 445 && pitch < 455) {
                                                count450HZ++;
                                            }
                                        }
                                    });
                            dispatcher.addAudioProcessor(audioProcessor);
                            dispatcher.run();
                        }
                    }
                }
                System.out.println(fileName + " exited, position " + currentPartTime / 10 + " " + ringbackType);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };
}
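The class above reads the file in 100 ms chunks and converts each chunk to floats before analysis. The following standalone sketch shows the arithmetic behind both steps for 16-bit signed little-endian mono PCM. It is an illustration only and not TarsosDSP's converter; the class name `PcmFrames`, the method names, and the division by 32768 are assumptions.

```java
public class PcmFrames {

    /** Bytes in 100 ms of mono audio, mirroring the constructor's readLength formula. */
    public static int bytesPer100ms(float sampleRate, int sampleSizeInBits) {
        return (int) sampleRate * (sampleSizeInBits / 8) / 10;
    }

    /** Converts 16-bit signed little-endian PCM bytes to floats in [-1, 1). */
    public static float[] toFloats(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];    // high byte, sign-extended
            out[i] = ((hi << 8) | lo) / 32768f;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bytesPer100ms(8000, 16)); // prints 1600
        float[] samples = toFloats(new byte[] {0x00, 0x40, 0x00, (byte) 0x80});
        System.out.println(samples[0]); // prints 0.5
        System.out.println(samples[1]); // prints -1.0
    }
}
```

At 8 kHz and 16 bits, each 1600-byte chunk therefore yields 800 samples per 100 ms frame.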

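Test results

Ringback tone

A log line is printed every 0.1 s. The frequency signature matches expectations, and so does the 1 s on, 4 s off cadence.

Color ringback tone

A log line is printed every 0.1 s. The signal is recognized as music, and the point where the music signature ends matches the actual pickup time (corresponding to the color ringback waveform figure above).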
