Hey Siri: How Effective Are Common Voice Recognition Systems at Recognizing Dysphonic Voices
Volume 5, Issue 1
Keywords:
Dysphonia, voice recognition, hoarseness, mobile phone, technology
Authors:
Matthew L. Rohlfing, MD; Daniel P. Buckley, MS, CCC-SLP; Jacquelyn Piraquive, MD; Cara E. Stepp, PhD; Lauren F. Tracy, MD
Translator:
Dr. 王仲祺, Taichung Veterans General Hospital
Abstract:
Objectives/Hypothesis: Interaction with voice recognition systems, such as Siri™ and Alexa™, is an increasingly important part of everyday life. Patients with voice disorders may have difficulty with this technology, leading to frustration and reduction in quality of life. This study evaluates the ability of common voice recognition systems to transcribe dysphonic voices.
Study Design: Retrospective evaluation of "Rainbow Passage" voice samples from patients with and without voice disorders.
Methods: Participants with (n = 30) and without (n = 23) voice disorders were recorded reading the “Rainbow Passage”. Recordings were played at a standardized intensity and distance to dictation programs on Apple iPhone 6S™, Apple iPhone 11 Pro™, and Google Voice™. Word recognition scores were calculated as the proportion of correctly transcribed words. Word recognition scores were compared to auditory–perceptual and acoustic measures.
Results: Mean word recognition scores for participants with and without voice disorders were, respectively, 68.6% and 91.9% for Apple iPhone 6S™ (P < .001), 71.2% and 93.7% for Apple iPhone 11 Pro™ (P < .001), and 68.7% and 93.8% for Google Voice™ (P < .001). There were strong, approximately linear associations between CAPE-V ratings of overall severity of dysphonia and word recognition score, with correlation coefficients (R²) of 0.609 (iPhone 6S™), 0.670 (iPhone 11 Pro™), and 0.619 (Google Voice™). These relationships persisted when controlling for diagnosis, age, gender, fundamental frequency, and speech rate (P < .001 for all systems).
Conclusion: Common voice recognition systems function well with nondysphonic voices but are poor at accurately transcribing dysphonic voices. There was a strong negative correlation with word recognition scores and perceptual voice evaluation. As our society increasingly interfaces with automated voice recognition technology, the needs of patients with voice disorders should be considered.
Expert commentary:
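The prevalence of voice disorders has been rising in recent years; the 2012 US National Health Interview Survey found that 1 in 13 adults had a voice disorder. Voice disorders impair interpersonal communication and have been shown to negatively affect the functional, social, emotional, and physical dimensions of health. Benninger et al. also found that people with voice disorders had poorer social functioning than patients with other chronic conditions such as sciatica, back pain, and angina. A likely reason is that reduced vocal intensity and poor glottal closure make airflow turbulent and indistinct when dysphonic patients produce high-frequency sounds or consonants, which lowers speech intelligibility. Meanwhile, over the past decade, computer technologies such as cloud computing and machine learning have advanced steadily, computer speech recognition has grown increasingly capable, and the smartphone has become the main computing device in most people's daily lives. Smart speakers and voice assistants such as Apple's Siri™, Google Assistant™, and Amazon's Alexa™ have further broadened the range of voice-driven experiences. According to a Google report, about 29% of internet users worldwide use voice to search online on mobile devices. The primary aim of this study was to evaluate how accurately common voice recognition systems recognize dysphonic voices. Secondarily, the authors sought to determine which auditory-perceptual dimensions or acoustic parameters most affect the accuracy of these systems.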
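The authors recorded 30 patients with voice disorders and 23 participants with normal voices reading the first 98 words of the English "Rainbow Passage". The recordings were then played through a Bose speaker at 65 to 70 dB, 12 inches from each device: an Apple iPhone 6S™, an Apple iPhone 11 Pro™, and Google Voice™ running on an Apple computer. Recognition accuracy was expressed as the word recognition score, calculated as the proportion of read words that were correctly transcribed. In addition to laboratory acoustic measures such as fundamental frequency and its standard deviation, the recordings were rated by two laryngologists in training using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) for overall severity, roughness, breathiness, strain, pitch, and loudness. Mean word recognition scores for participants with and without voice disorders were, respectively, 68.6% and 91.9% on the Apple iPhone 6S™ (P < .001), 71.2% and 93.7% on the Apple iPhone 11 Pro™ (P < .001), and 68.7% and 93.8% on Google Voice™ (P < .001). Word recognition score showed a strong, approximately linear association with the CAPE-V rating of overall severity of dysphonia, with correlation coefficients (R²) of 0.609 for the iPhone 6S™, 0.670 for the iPhone 11 Pro™, and 0.619 for Google Voice™. The authors conclude that common voice recognition systems perform well with nondysphonic voices but poorly for patients with voice disorders, and that perceptual voice ratings are significantly associated with word recognition score. As society relies on voice recognition technology more and more in daily life, the needs of patients with voice disorders in this area deserve further attention.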
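To make the word recognition score concrete, the following is a minimal sketch of one way to compute "the proportion of correctly transcribed words". The study does not describe its scoring procedure in this summary, so the normalization (lowercasing, dropping punctuation), the in-order alignment using Python's `difflib`, and the function names `tokenize` and `word_recognition_score` are all assumptions made for illustration. The transcript in the example is hypothetical, not study data.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_recognition_score(reference: str, transcript: str) -> float:
    """Share of reference words that difflib matches, in order, in the transcript.

    Assumption: a word counts as "correctly transcribed" when it falls inside
    a matching block found by difflib.SequenceMatcher. The study's own scoring
    rules may differ.
    """
    ref_words = tokenize(reference)
    hyp_words = tokenize(transcript)
    if not ref_words:
        raise ValueError("reference passage contains no words")
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    # The final matching block is a zero-length sentinel, so it adds nothing.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Opening words of the Rainbow Passage against a hypothetical dictation result.
reference = "When the sunlight strikes raindrops in the air"
transcript = "When the sun light strikes rain drops in the air"
print(word_recognition_score(reference, transcript))  # 0.75
```

In this example, six of the eight reference words are matched ("sunlight" and "raindrops" were split by the hypothetical transcript), giving a score of 0.75. Dividing by the reference length means that extra words inserted by the dictation system do not raise the score.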
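This article offers a thorough analysis of how recorded voices perform in speech recognition. Because of the COVID-19 pandemic, however, participants with differing vocal loudness could not be tested in person, so the results cannot show the effect of loudness on word recognition score. Recordings from patients with articulation problems and from non-native English speakers were also excluded from the study sample, so the effects of these two conditions cannot be assessed here either. Just as people with physical disabilities need accessible facilities to support their mobility, society should in future consider improving the performance of voice recognition systems for patients with voice disorders, to help remove the barriers they may encounter in daily life.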