
Modal and nonmodal voice quality classification using acoustic and electroglottographic features

  • Michal Borsky
  • Daryush D. Mehta
  • Jarrad H. Van Stan
  • Jon Gudnason

Research output: Contribution to journal › Article › peer-review

Abstract

The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set, which included glottal source features, frequency-warped cepstrum, and harmonic model features, against the mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, the acoustic-based glottal inverse filtered (GIF) waveform, and the electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from any of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and nonmodal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. Classification was performed with support vector machines, random forests, deep neural networks, and Gaussian mixture model classifiers, all built to be speaker independent using a leave-one-speaker-out strategy. The best classification accuracy, 79.97%, was achieved with the full COVAREP set. The harmonic model features were the best-performing subset, with 78.47% accuracy, and the static+dynamic MFCCs scored 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms. The EGG waveform yielded reduced classification performance.
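The leave-one-speaker-out strategy mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic two-dimensional features, the four hypothetical speakers, the helper names, and the nearest-centroid classifier, which stands in for the support vector machines, random forests, deep neural networks, and Gaussian mixture models that the study actually used. The point is only that every recording of the held-out speaker is excluded from training in each fold.

```python
import random
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


def fit_centroids(features, labels):
    """Stand-in classifier (an assumption): store one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in total] for y, total in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


# Synthetic data (hypothetical): 4 speakers, 2 classes, 5 samples per class each.
rng = random.Random(0)
features, labels, speakers = [], [], []
for speaker in range(4):
    for label, centre in ((0, -2.0), (1, 2.0)):
        for _ in range(5):
            features.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            labels.append(label)
            speakers.append(speaker)

# Train on all speakers but one, test on the held-out speaker, pool the results.
folds = list(leave_one_speaker_out(speakers))
correct = 0
for held_out, train, test in folds:
    model = fit_centroids([features[i] for i in train], [labels[i] for i in train])
    correct += sum(predict(model, features[i]) == labels[i] for i in test)

accuracy = correct / len(features)
print(f"pooled accuracy over {len(folds)} folds: {accuracy:.2%}")
```

Pooling predictions over all folds gives a single accuracy figure in which no test recording comes from a speaker seen during training, which is what makes the evaluation speaker independent.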

Original language: English
Article number: 8114356
Pages (from-to): 2281-2291
Number of pages: 11
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 25
Issue number: 12
Publication status: Published - Dec 2017

Note

Publisher Copyright: © 2017 IEEE.
