End-to-End ASR-Enhanced Neural Network for Alzheimer's Disease Diagnosis
Alzheimer's disease (AD) is a chronic neurodegenerative disorder, and accurate diagnosis is crucial. This research introduces an end-to-end ASR-enhanced neural network that leverages acoustic and text-based features for AD classification. By incorporating ASR technology, the system aims to automatically learn discriminative acoustic representations from speech cues without relying on precise transcripts. Experimental results demonstrate the effectiveness of this approach on a dataset of Chinese speech samples. The study offers insights into the potential of ASR in enhancing Alzheimer's disease diagnosis methods.
Presentation Transcript
END-TO-END ASR-ENHANCED NEURAL NETWORK FOR ALZHEIMER'S DISEASE DIAGNOSIS
Jiancheng Gui, Yikai Li, Kai Chen, Joanna Siebert, Qingcai Chen
Harbin Institute of Technology (Shenzhen), ICASSP 2022
Introduction
Background
- Alzheimer's disease (AD) is an irreversible chronic neurodegenerative disorder.
- Alzheimer's disease (AD), mild cognitive impairment (MCI), and healthy controls (HC).
Biomarkers
- Fluid and imaging examinations, such as CSF, PET, MRI, EEG, and MEG.
Cues of speech in Alzheimer's disease
- Acoustic-based features: pitch, articulation rate, duration of breaks, syllable duration, etc.
- Text-based features: lexical and grammar deficits, lexical complexity, etc.
Related Works
Linguistic methods
- Based on pretrained language models: GPT, BERT, ERNIE, etc.
- Extract hand-crafted features: sentence structure, repetitiveness, lexical complexity, etc.
Phonetic methods
- End-to-end: DNN, CNN-LSTM, ResNet18, etc.
- Extract hand-crafted features: LLDs (Low-Level Descriptors), eGeMAPS, BoAW, etc.
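Hand-crafted acoustic features such as LLDs and eGeMAPS are typically extracted with openSMILE. Below is a minimal sketch using the opensmile Python package; the eGeMAPSv02 feature set and the file name are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch: extracting hand-crafted acoustic functionals with the
# opensmile Python package. The eGeMAPSv02 set and the file name are
# illustrative; the paper's exact configuration is not specified here.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # 88 eGeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech_sample.wav")    # returns a one-row pandas DataFrame
print(features.shape)
```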
Motivation
Shortcomings of linguistic methods
- Manual transcripts are rarely available.
- Unreliable transcribed text leads to poor performance.
- There is no ASR system tailored to Alzheimer's speech.
Proposal
- Use ASR to learn the semantic cues of Alzheimer's speech.
- An end-to-end structure automatically learns discriminative acoustic representations.
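A hedged sketch of this idea in PyTorch: a pretrained ASR encoder supplies frame-level acoustic representations, which are pooled over time and classified into AD / MCI / HC. The slides do not specify the paper's exact architecture, so the encoder below is a dummy stand-in and the layer sizes are assumptions; in the paper the encoder comes from a Wenet model pretrained on multi_cn data.

```python
# Sketch only, not the paper's exact architecture: a pretrained ASR encoder
# produces bottleneck features that a small head classifies into AD/MCI/HC.
import torch
import torch.nn as nn

class ASREnhancedClassifier(nn.Module):
    def __init__(self, asr_encoder: nn.Module, enc_dim: int = 256, n_classes: int = 3):
        super().__init__()
        self.encoder = asr_encoder              # pretrained ASR encoder (bottleneck features)
        self.head = nn.Sequential(
            nn.Linear(enc_dim, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, n_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) filterbank features of a 6-s utterance
        hidden = self.encoder(feats)            # (batch, time, enc_dim) bottleneck features
        pooled = hidden.mean(dim=1)             # average pooling over time
        return self.head(pooled)                # (batch, n_classes) logits

# Example with a dummy encoder standing in for the pretrained ASR model.
dummy_encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())
model = ASREnhancedClassifier(dummy_encoder)
logits = model(torch.randn(4, 600, 80))        # 4 utterances, 600 frames, 80-dim fbank
```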
Experiments
Dataset
- Train set: 280 Chinese speech samples collected from picture description, fluency test, and free conversation tasks; lengths range from 30 to 60 s.
- Test set: 119 long (60-s) and 1153 short (6-s) utterances.
Details
- Pretrained ASR model: Wenet/multi_cn examples.
- Transcripts: iFLYTEK ASR API.
- Training and testing are performed on 6-s utterances.
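Since training and testing operate on 6-s utterances, longer recordings have to be cut into fixed-length segments. The sketch below assumes simple non-overlapping windows and an illustrative file name; the paper's exact windowing scheme is not described in the slides.

```python
# Sketch of 6-second segmentation (assumed non-overlapping chunks).
import soundfile as sf

def split_into_segments(path: str, seg_seconds: float = 6.0):
    audio, sr = sf.read(path)                       # waveform and sample rate
    seg_len = int(seg_seconds * sr)
    segments = [audio[i:i + seg_len] for i in range(0, len(audio), seg_len)]
    # Drop a trailing fragment that is much shorter than the target length.
    return [s for s in segments if len(s) >= seg_len // 2]

chunks = split_into_segments("long_recording_60s.wav")   # illustrative file name
print(len(chunks), "segments of ~6 s each")
```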
Results (legend)
- BN: bottleneck features from the ASR encoder.
- Manual: 1582-dimensional acoustic features extracted with openSMILE.
Conclusion
Contributions
- Incorporates ASR into the AD classification task.
- Does not rely on reliable transcripts.
Limitations
- Small dataset size.
- No other datasets were evaluated.
Future work
- Try a dataset in another language.
- Investigate why the ASR-enhanced model can tolerate articulatory errors.
- Design a more elaborate classification network.