Comparison of voice activity detection algorithms for Parkinson's speech
Human conversation contains pauses between sentences and words. Voice activity detection (VAD) is a technique for detecting these gaps in human speech. It is widely used in speech processing to avoid unnecessary coding and transmission of non-speech signals. A few VAD algorithms are available in the public domain. This project aims to compare the performance of these VAD algorithms for use in objective speech analysis for Parkinson’s disease (PD).
Speech data recorded from 5 PD participants as part of a previous study were used.
Speech files were imported into Audacity and silence periods were manually labelled. Three VAD algorithms were implemented in MATLAB and their outputs were compared with the manual labels, which served as the reference standard. Performance was quantified by counting false positives and false negatives. The influence of three different kinds of background noise was also investigated. Majority rules were then applied in an attempt to improve VAD performance.
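The comparison against the manual labels can be illustrated with a minimal Python sketch (the study itself used MATLAB). It assumes that both the labels and the VAD output are expressed as per-frame decisions (1 = speech, 0 = silence) and that a false positive is a silence frame marked as speech. The function name `vad_error_rates` and the example frames are hypothetical, not taken from the study.

```python
def vad_error_rates(reference, detected):
    """Return (false positive rate, false negative rate) at frame level.

    reference: manual labels per frame, 1 = speech, 0 = silence
    detected:  VAD decisions per frame, same length and coding
    """
    if len(reference) != len(detected):
        raise ValueError("reference and detected must have the same length")

    # False positive: VAD reports speech where the label says silence.
    false_pos = sum(1 for r, d in zip(reference, detected) if d and not r)
    # False negative: VAD reports silence where the label says speech.
    false_neg = sum(1 for r, d in zip(reference, detected) if r and not d)

    n_silence = sum(1 for r in reference if not r)
    n_speech = len(reference) - n_silence

    # Guard against recordings with no silence or no speech frames.
    fp_rate = false_pos / n_silence if n_silence else 0.0
    fn_rate = false_neg / n_speech if n_speech else 0.0
    return fp_rate, fn_rate


# Illustrative frames only (not study data).
reference = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
detected = [0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
fp_rate, fn_rate = vad_error_rates(reference, detected)
print(f"false positive rate: {fp_rate:.2f}, false negative rate: {fn_rate:.2f}")
```

Normalising the counts by the number of silence and speech frames keeps the two error types comparable across recordings of different lengths; raw counts could be reported instead if that matches the original analysis.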
- The algorithm specified in the “G.729” standard does not perform well in the presence of background noise.
- Filtering can improve performance under white noise.
- A majority rule can improve the result when the background noise is continuous.
- Speech from late-stage Parkinson’s disease increases detection error.
The speech recording environment is important in determining the type of analysis to be used. The current majority rules can provide up to 95% accuracy when compared with the manually labelled reference.
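A majority rule of the kind mentioned above can be sketched in Python as follows. The abstract does not say how the rule was applied, so this sketch assumes a per-frame vote across the outputs of the three VAD algorithms: a frame is marked as speech when more than half of the detectors agree. The function name `majority_vote` and the example decisions are hypothetical.

```python
def majority_vote(*decisions):
    """Combine per-frame VAD decisions (1 = speech, 0 = silence) by majority.

    Assumption: each argument is the output of one VAD algorithm on the
    same frames. A frame is speech if more than half of them say so.
    """
    if not decisions:
        raise ValueError("at least one decision sequence is required")
    n_frames = len(decisions[0])
    if any(len(d) != n_frames for d in decisions):
        raise ValueError("all decision sequences must have the same length")

    threshold = len(decisions) / 2
    # zip(*decisions) yields, for each frame, one vote per algorithm.
    return [1 if sum(votes) > threshold else 0 for votes in zip(*decisions)]


# Illustrative outputs of three detectors (not study data).
vad_a = [0, 1, 1, 1, 0, 0]
vad_b = [0, 0, 1, 1, 1, 0]
vad_c = [1, 0, 1, 0, 0, 0]
combined = majority_vote(vad_a, vad_b, vad_c)
print(combined)  # [0, 0, 1, 1, 0, 0]
```

With an odd number of detectors there are no ties, which is one reason a three-algorithm vote is convenient; isolated errors made by a single detector are outvoted by the other two.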