Audio networks for speech enhancement and indexing

Kühnapfel, Thorsten

dc.contributor.author	Kühnapfel, Thorsten
dc.contributor.supervisor	Prof. Svetha Venkatesh
dc.contributor.supervisor	Assoc. Prof. Tele Tan
dc.date.accessioned	2017-01-30T09:47:12Z
dc.date.available	2017-01-30T09:47:12Z
dc.date.created	2010-05-27T03:25:23Z
dc.date.issued	2009
dc.identifier.uri	http://hdl.handle.net/20.500.11937/206
dc.description.abstract	For humans, hearing is the second most important sense, after sight. Therefore, acoustic information greatly contributes to observing and analysing an area of interest. For this reason combining audio and video cues for surveillance enhances scene understanding and the observed events. However, when combining different sensors their measurements need to be correlated, which is done by either knowing the exact relative sensor alignment or learning a mapping function. Most deployed systems assume a known relative sensor alignment, making them susceptible to sensor drifts. Additionally, audio recordings are generally a mixture of several source signals and therefore need to be processed to extract a desired sound source, such as speech of a target person. In this thesis a generic framework is described that captures, indexes and extracts surveillance events from coordinated audio and video cues. It presents a dynamic joint-sensor calibration approach that uses audio-visual sensor measurements to dynamically and incrementally learn the calibration function, making the sensor calibration resilient to independent drifts in the sensor suite. Experiments demonstrate the use of such a framework for enhancing surveillance. Furthermore, a speech enhancement approach is presented based on a distributed network of microphones, increasing the effectiveness for acoustic surveillance of large areas. This approach is able to detect and enhance speech in the presence of rapidly changing environmental noise. Spectral subtraction, a single channel speech enhancement approach, is modified to adapt quickly to rapid noise changes of two common noise sources by incorporating multiple noise models. The result of the cross correlation based noise classification approach is also utilised to improve the voice activity detection by minimising false detection based on rapid noise changes.
Experiments with real world noise consisting of scooter and café noise have proven the advantage of multiple noise models especially when the noise changes during speech. The modified spectral subtraction approach is then extended to real world scenarios by introducing more and highly non-stationary noise types. Thus, the focus is directed to implement a more sophisticated noise classification approach by extracting a variety of acoustic features and applying PCA transformation to compute the Mahalanobis distance to each noise class. This distance measurement is also included in the voice activity detection algorithm to reduce false detection for highly non-stationary noise types. However, using spectral subtraction in non-stationary noise environments, such as street noise, reduces the performance of the speech enhancement. For that reason the speech enhancement approach is further improved by using the sound information of the entire network to update the noise model of the detected noise type during speech. This adjustment considerably improved the speech enhancement performance in non-stationary noise environments. Experiments conducted under diverse real world conditions including rapid noise changes and non-stationary noise sources demonstrate the effectiveness of the presented method.
dc.language	en
dc.publisher	Curtin University
dc.subject	sensor drifts
dc.subject	acoustic information
dc.subject	relative sensor alignment
dc.subject	audio and video cues
dc.subject	hearing
dc.subject	scene understanding
dc.subject	observed events
dc.subject	surveillance
dc.subject	joint-sensor calibration
dc.subject	audio-visual sensor measurements
dc.title	Audio networks for speech enhancement and indexing
dc.type	Thesis
dcterms.educationLevel	PhD
curtin.department	Department of Computing
curtin.accessStatus	Open access
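
Illustrative sketch (not taken from the thesis): the abstract describes spectral subtraction with multiple noise models, where a cross correlation based classifier picks the noise type. The record gives no implementation details, so the following is only a minimal Python sketch under stated assumptions. The function names, the `alpha` and `floor` parameters, and the toy "scooter" and "cafe" spectra are all hypothetical; the classifier here simply picks the stored noise spectrum with the highest normalised cross correlation to the current frame.

```python
import numpy as np


def normalised_correlation(a, b):
    """Zero-mean normalised cross correlation of two magnitude spectra."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def select_noise_model(frame_mag, noise_models):
    """Return the name of the stored noise spectrum that best matches the frame.

    Assumption: each noise model is a single average magnitude spectrum.
    """
    return max(
        noise_models,
        key=lambda name: normalised_correlation(frame_mag, noise_models[name]),
    )


def spectral_subtract(frame_mag, noise_mag, alpha=1.0, floor=0.05):
    """Magnitude spectral subtraction with a spectral floor.

    alpha scales the noise estimate; floor keeps a small fraction of the
    noisy magnitude so that no bin becomes negative or exactly zero.
    """
    cleaned = frame_mag - alpha * noise_mag
    return np.maximum(cleaned, floor * frame_mag)


# Toy 8-bin spectra (hypothetical values, for illustration only).
noise_models = {
    "scooter": np.array([4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]),
    "cafe": np.array([1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0]),
}
speech = np.array([0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0])
frame = noise_models["scooter"] + speech

detected = select_noise_model(frame, noise_models)
enhanced = spectral_subtract(frame, noise_models[detected])
```

Keeping one spectrum per noise type lets the subtraction switch models as soon as the classifier output changes, instead of waiting for a single running noise estimate to re-adapt. That is the motivation the abstract gives for using multiple noise models.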

Files in this item

Name: 138116_Kuhnapfel full.pdf
Size: 7.778Mb
Format: PDF

This item appears in the following Collection(s)

Curtin Theses
