當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

Hi Azure. 从零开始打造一个语音机器人，跟你的电脑聊聊天。

發布時間：2024/5/17 编程问答 77 豆豆

生活随笔收集整理的這篇文章主要介紹了 Hi Azure. 从零开始打造一个语音机器人，跟你的电脑聊聊天。小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

這篇文章帶大家來做一個簡單的聊天機器人，碰巧遇到微軟Azure招募開發者，可以免費試用人工智能服務，果斷申請。

先決條件：

熟悉 Azure 服務和 Azure 門戶（也可以用其它的AI開放平臺）

擁有 Python 編程經驗

熟悉API調用

先來看效果：

逐漸逼瘋圖靈機器人的聊天

序言

首先需要了解一下做對話機器人需要涉及到的技術，可以看一下這個流程圖。

理清思路之后，我們發現涉及到的技術有三種：語音識別、對話式問答、語音合成。

簡單解釋一下這三個技術：

語音識別：準確地將口述語音聽錄為文本，功能演示

對話式問答：提取文本信息，并將其轉化為一目了然的對話式答案

語音合成：將文本轉換為逼真的語音，功能演示

目前來說最流行的語音算法主要是依賴于深度學習的神經網絡算法，深度學習的應用真正讓語音識別/合成達到了商用級別。

不過如果要自己做一個語音識別系統，從零開始訓練語言模型勢必會非常耗時，雖然有很多預訓練的模型，但是對于沒有深度學習基礎的同學來說還是比較困難，所以我們借助Azure的認知服務開源的語音服務API來是想這些功能。

首先需要使用你的 Microsoft 帳戶登錄到 Azure 門戶。

然后創建一個認知服務資源。

創建并且等待部署完成后，在左側窗格中的“資源管理”下，選擇“密鑰和終結點” 。

每個訂閱有兩個密鑰；可在應用程序中使用任意一個密鑰。

語音識別+語音合成

安裝和導入語音 SDK

需要先安裝語音 SDK，然后才能執行任何操作。

pip install azure-cognitiveservices-speech

安裝語音 SDK 后，將其導入到 Python 項目中。

import azure.cognitiveservices.speech as speechsdk

創建語音識別配置

若要使用語音 SDK 調用語音服務，需要創建 SpeechConfig。此類包含有關訂閱的信息，例如密鑰和關聯的位置/區域、終結點、主機或授權令牌。

import azure.cognitiveservices.speech as speechsdk# 需要填入自己的密鑰和位置 speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion" speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region,speech_recognition_language="zh-cn")# Creates a recognizer with the given settings speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

從麥克風識別

若要使用設備麥克風識別語音，只需創建 SpeechRecognizer（無需傳遞 AudioConfig），并傳遞 speech_config。

import azure.cognitiveservices.speech as speechsdkdef from_mic():speech_config = speechsdk.SpeechConfig(subscription="<paste-your-speech-key-here>", region="<paste-your-speech-location/region-here>")speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)print("Speak into your microphone.")result = speech_recognizer.recognize_once_async().get()print(result.text)from_mic()

錯誤處理

前面的示例只從 result.text 獲取已識別的文本，但要處理錯誤和其他響應，需要編寫一些代碼來處理結果。以下代碼評估 result.reason 屬性并：

輸出識別結果：speechsdk.ResultReason.RecognizedSpeech
如果沒有識別匹配項，通知用戶：speechsdk.ResultReason.NoMatch
如果遇到錯誤，則輸出錯誤消息：speechsdk.ResultReason.Canceled

if result.reason == speechsdk.ResultReason.RecognizedSpeech:print("Recognized: {}".format(result.text)) elif result.reason == speechsdk.ResultReason.NoMatch:print("No speech could be recognized: {}".format(result.no_match_details)) elif result.reason == speechsdk.ResultReason.Canceled:cancellation_details = result.cancellation_detailsprint("Speech Recognition canceled: {}".format(cancellation_details.reason))if cancellation_details.reason == speechsdk.CancellationReason.Error:print("Error details: {}".format(cancellation_details.error_details))

語音合成

同樣，我們可以搭建一個語音合成的功能。

def speak(text_words):speech_config.speech_synthesis_language = "zh-cn"audio_config = AudioOutputConfig(use_default_speaker=True)speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)result = speech_synthesizer.speak_text_async(text_words).get()# Checks result.if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:print("Speech synthesized to speaker for text [{}]".format(text_words))elif result.reason == speechsdk.ResultReason.Canceled:cancellation_details = result.cancellation_detailsprint("Speech synthesis canceled: {}".format(cancellation_details.reason))if cancellation_details.reason == speechsdk.CancellationReason.Error:if cancellation_details.error_details:print("Error details: {}".format(cancellation_details.error_details))print("Did you update the subscription info?")

圖靈機器人服務構建聊天功能

對話式機器人現在已經成為相對成熟的AI解決方案，機器人以聊天的方式與用戶交互，比如edge自帶的小冰虛擬女友：

為了實現這種對話式的聊天功能，我們需要借助圖靈機器人開放平臺，通過圖靈機器人開放平臺，軟硬件產品開發者可快速為自己的產品接入一款具備個性化身份屬性特征、滿足不同場景多輪對話及上下文對話的人工智能機器人，實現產品的對話式交互。

創建機器人

進入圖靈機器人管理控制臺后。點擊“創建機器人”，按要求填寫機器人名稱及應用信息，即可完成創建，當前每個賬號只能創建5個機器人。

然后在機器人設置中可以定義機器人的一些屬性。

API接入

目前圖靈機器人API接口可調用聊天對話、語料庫、技能三大模塊的語料：

聊天對話是指平臺免費提供的近10億條公有對話語料，滿足用戶對話娛樂需求；
語料庫是指用戶在平臺上傳的私有語料，僅供個人查看使用，幫助用戶最便捷的搭建專業領域次的語料。
技能服務是指平臺打包的26種實用服務技能。涵蓋生活、出行、購物等多個領域，一站式滿足用戶需求。

編碼方式
UTF-8（調用圖靈API的各個環節的編碼方式均為UTF-8）

接口地址
http://openapi.turingapi.com/openapi/api/v2

請求方式
HTTP POST

請求參數
請求參數格式為 json

請求示例：

{"reqType":0,"perception": {"inputText": {"text": "附近的酒店"},"inputImage": {"url": "imageUrl"},"selfInfo": {"location": {"city": "北京","province": "北京","street": "信息路"}}},"userInfo": {"apiKey": "","userId": ""} }

參數說明

參數類型是否必須取值范圍說明

reqType	int	N	-	輸入類型:0-文本(默認)、1-圖片、2-音頻
perception	-	Y	-	輸入信息
userInfo	-	Y	-	用戶參數

perception

參數類型是否必須取值范圍說明

inputText	-	N	-	文本信息
inputImage	-	N	-	圖片信息
inputMedia	-	N	-	音頻信息
selfInfo	-	N	-	客戶端屬性

注意：輸入參數必須包含inputText或inputImage或inputMedia！
inputText

參數類型是否必須取值范圍說明

text

String

1-128字符

直接輸入文本

inputImage

參數類型是否必須取值范圍說明

url

String

圖片地址

inputMedia

參數類型是否必須取值范圍說明

url

String

音頻地址

selfInfo

參數類型是否必須取值范圍說明

location

地理位置信息

location

參數類型是否必須取值范圍說明

city	String	Y	-	所在城市
province	String	N	-	省份
street	String	N	-	街道

userInfo

參數類型是否必須取值范圍說明

apiKey	String	Y	32位	機器人標識
userId	String	Y	長度小于等于32位	用戶唯一標識
groupId	String	N	長度小于等于64位	群聊唯一標識
userIdName	String	N	長度小于等于64位	群內用戶昵稱

輸出示例：

{"intent": {"code": 10005,"intentName": "","actionName": "","parameters": {"nearby_place": "酒店"}},"results": [{"groupType": 1,"resultType": "url","values": {"url": "http://m.elong.com/hotel/0101/nlist/#indate=2016-12-10&outdate=2016-12-11&keywords=%E4%BF%A1%E6%81%AF%E8%B7%AF"}},{"groupType": 1,"resultType": "text","values": {"text": "親，已幫你找到相關酒店信息"}}] }

參數說明

參數類型是否必須取值范圍說明

intent	-	Y	-	請求意圖
results	-	N	-	輸出結果集

intent

參數類型是否包含取值范圍說明

code	int	Y	-	輸出功能code
intentName	String	N	-	意圖名稱
actionName	String	N	-	意圖動作名稱
parameters	Map	N	-	功能相關參數

results

參數類型是否包含取值范圍說明

resultType	String	Y	文本(text);連接(url);音頻(voice);視頻(video);圖片(image);圖文(news)	輸出類型
values	-	Y	-	輸出值
groupType	int	Y	-	‘組’編號:0為獨立輸出，大于0時可能包含同組相關內容 (如：音頻與文本為一組時說明內容一致)

Code

def chat(text_words=""):api_key = "1dde879fa**********1ee88269f3852"api_url = "http://openapi.tuling123.com/openapi/api/v2"headers = {"Content-Type": "application/json;charset=UTF-8"}req = {"reqType": 0,"perception": {"inputText": {"text": text_words},"selfInfo": {"location": {"city": "天津","province": "天津","street": "天津科技大學"}}},"userInfo": {"apiKey": api_key,"userId": "Alex"}}req["perception"]["inputText"]["text"] = text_wordsresponse = requests.request("post", api_url, json=req, headers=headers)response_dict = json.loads(response.text)result = response_dict["results"][0]["values"]["text"]print("AI Robot said: " + result)return result

匯總

最后，把語音識別、語音合成的代碼放在一起，再加上一個簡單的流程控制，所有的代碼加起來不超過一百行，一個能跟你聊天的機器人就完成了。

import jsonimport requests import azure.cognitiveservices.speech as speechsdkfrom azure.cognitiveservices.speech.audio import AudioOutputConfigspeech_key, service_region = "6d5afa148**********5d8801db8e408", "chinanorth2" speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region,speech_recognition_language="zh-cn")def record():# Creates a recognizer with the given settingsspeech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)print("Say something...")result = speech_recognizer.recognize_once()# Checks result.if result.reason == speechsdk.ResultReason.RecognizedSpeech:print("Recognized: {}".format(result.text))elif result.reason == speechsdk.ResultReason.NoMatch:print("No speech could be recognized: {}".format(result.no_match_details))elif result.reason == speechsdk.ResultReason.Canceled:cancellation_details = result.cancellation_detailsprint("Speech Recognition canceled: {}".format(cancellation_details.reason))if cancellation_details.reason == speechsdk.CancellationReason.Error:print("Error details: {}".format(cancellation_details.error_details))return result.textdef chat(text_words=""):api_key = "1dde879f**********f1ee88269f3852"api_url = "http://openapi.tuling123.com/openapi/api/v2"headers = {"Content-Type": "application/json;charset=UTF-8"}req = {"reqType": 0,"perception": {"inputText": {"text": text_words},"selfInfo": {"location": {"city": "天津","province": "天津","street": "天津科技大學"}}},"userInfo": {"apiKey": api_key,"userId": "Alex"}}req["perception"]["inputText"]["text"] = text_wordsresponse = requests.request("post", api_url, json=req, headers=headers)response_dict = json.loads(response.text)result = response_dict["results"][0]["values"]["text"]print("AI Robot said: " + result)return resultdef speak(text_words):speech_config.speech_synthesis_language = "zh-cn"audio_config = AudioOutputConfig(use_default_speaker=True)speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)result = speech_synthesizer.speak_text_async(text_words).get()# Checks result.if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:print("Speech synthesized to speaker for text [{}]".format(text_words))elif result.reason == speechsdk.ResultReason.Canceled:cancellation_details = result.cancellation_detailsprint("Speech synthesis canceled: {}".format(cancellation_details.reason))if cancellation_details.reason == speechsdk.CancellationReason.Error:if cancellation_details.error_details:print("Error details: {}".format(cancellation_details.error_details))print("Did you update the subscription info?")if __name__ == '__main__':while True:text = record()if "退出" in text:breakres = chat(text)speak(res)

代碼地址：https://github.com/Matrix-King-Studio/IntelligentVoiceRobot
演示視頻：https://www.bilibili.com/video/BV1BU4y1F7s7/

總結

以上是生活随笔為你收集整理的Hi Azure. 从零开始打造一个语音机器人，跟你的电脑聊聊天。的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：如何让Edge浏览器整体作为一个窗口，而
下一篇： unable to access ‘ht