Audio networks for speech enhancement and indexing
Date
2009
Abstract
For humans, hearing is the second most important sense after sight, so acoustic information contributes greatly to observing and analysing an area of interest. Combining audio and video cues for surveillance therefore improves understanding of the scene and of the observed events. However, when different sensors are combined, their measurements need to be correlated, which is done either by knowing the exact relative sensor alignment or by learning a mapping function. Most deployed systems assume a known relative sensor alignment, making them susceptible to sensor drift. Additionally, audio recordings are generally a mixture of several source signals and therefore need to be processed to extract a desired sound source, such as the speech of a target person.

In this thesis a generic framework is described that captures, indexes and extracts surveillance events from coordinated audio and video cues. It presents a dynamic joint-sensor calibration approach that uses audio-visual sensor measurements to learn the calibration function dynamically and incrementally, making the sensor calibration resilient to independent drifts in the sensor suite. Experiments demonstrate the use of such a framework for enhancing surveillance.

Furthermore, a speech enhancement approach is presented based on a distributed network of microphones, increasing the effectiveness of acoustic surveillance of large areas. This approach is able to detect and enhance speech in the presence of rapidly changing environmental noise. Spectral subtraction, a single-channel speech enhancement approach, is modified to adapt quickly to rapid noise changes involving two common noise sources by incorporating multiple noise models. The result of the cross-correlation-based noise classification approach is also used to improve the voice activity detection by minimising false detections caused by rapid noise changes.
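The idea of spectral subtraction with several noise models can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names `select_noise_model` and `spectral_subtract`, the `floor` parameter, and the choice to correlate magnitude spectra are all hypothetical, since the abstract does not specify how the cross-correlation classifier or the subtraction rule is defined.

```python
import numpy as np


def select_noise_model(frame_mag, noise_models):
    """Pick the stored noise spectrum that correlates best with the frame.

    noise_models maps a label (e.g. "scooter") to a magnitude spectrum.
    Using normalised correlation of magnitude spectra is an assumption.
    """
    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    return max(noise_models, key=lambda name: corr(frame_mag, noise_models[name]))


def spectral_subtract(frame, noise_mag, floor=0.02):
    """Basic magnitude spectral subtraction on one windowed frame.

    The spectral floor (hypothetical default 0.02) keeps magnitudes
    from becoming negative after subtraction.
    """
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    mag = np.abs(spectrum)
    phase = np.angle(spectrum)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Recombine the cleaned magnitude with the noisy phase.
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))


def enhance_frame(frame, noise_models):
    """Classify the noise in a frame, then subtract the matching model."""
    frame_mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    label = select_noise_model(frame_mag, noise_models)
    return label, spectral_subtract(frame, noise_models[label])
```

Keeping one spectrum per noise type means that a sudden switch between sources only requires selecting a different stored model, instead of waiting for a single running noise estimate to re-converge.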
Experiments with real-world noise consisting of scooter and café noise demonstrate the advantage of multiple noise models, especially when the noise changes during speech.

The modified spectral subtraction approach is then extended to real-world scenarios by introducing more, and highly non-stationary, noise types. The focus is therefore on a more sophisticated noise classification approach that extracts a variety of acoustic features and applies a PCA transformation to compute the Mahalanobis distance to each noise class. This distance measurement is also included in the voice activity detection algorithm to reduce false detections for highly non-stationary noise types. However, using spectral subtraction in non-stationary noise environments, such as street noise, reduces the performance of the speech enhancement. For that reason the speech enhancement approach is further improved by using the sound information of the entire network to update the noise model of the detected noise type during speech. This adjustment considerably improved the speech enhancement performance in non-stationary noise environments. Experiments conducted under diverse real-world conditions, including rapid noise changes and non-stationary noise sources, demonstrate the effectiveness of the presented method.
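The noise classification step described in the abstract (acoustic features, PCA transformation, Mahalanobis distance to each noise class) might look roughly like the sketch below. It is an illustration only: the feature vectors are assumed to be precomputed, and the helper names `fit_pca`, `project`, `fit_classes`, `mahalanobis_distances` and `classify`, as well as the small regularisation term, are hypothetical and not taken from the thesis.

```python
import numpy as np


def fit_pca(features, n_components):
    """Return the feature mean and the top principal directions (via SVD)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, components):
    """Map feature vectors into the PCA subspace."""
    return (features - mean) @ components.T


def fit_classes(projected, labels, eps=1e-6):
    """Estimate a mean and inverse covariance per noise class.

    eps is an assumed regularisation term that keeps the covariance invertible.
    """
    stats = {}
    for label in np.unique(labels):
        rows = projected[labels == label]
        cov = np.atleast_2d(np.cov(rows, rowvar=False))
        cov = cov + eps * np.eye(cov.shape[0])
        stats[str(label)] = (rows.mean(axis=0), np.linalg.inv(cov))
    return stats


def mahalanobis_distances(x, stats):
    """Mahalanobis distance from one projected vector to every class."""
    distances = {}
    for label, (mu, inv_cov) in stats.items():
        d = x - mu
        distances[label] = float(np.sqrt(d @ inv_cov @ d))
    return distances


def classify(x, stats):
    """Return the label of the nearest noise class."""
    distances = mahalanobis_distances(x, stats)
    return min(distances, key=distances.get)
```

The same per-class distances could also feed a voice activity detector, for example by treating frames that lie close to a known noise class as non-speech; how the thesis combines them is not stated in the abstract.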
Related items
Showing items related by title, author, creator and subject.
-
Kühnapfel, Thorsten; Tan, Tele; Venkatesh, Svetha; Igel, B.; Nordholm, Sven (2008) We present a new approach for speech enhancement in the presence of non-stationary and rapidly changing background noise. A distributed microphone system is used to capture the acoustic characteristics of the environment. ...
-
Tun, Min Han (2007) With the advancement of computer technology, demand for more accurate and intelligent monitoring systems has also risen. The use of computer vision and video analysis ranges from industrial inspection to surveillance. ...
-
The measurement of underwater acoustic noise radiated by a vessel using the vessel's own towed array
Duncan, Alexander John (2003) The work described in this thesis tested the feasibility of using a towed array of hydrophones to: 1. localise sources of underwater acoustic noise radiated by the tow vessel, 2. determine the absolute amplitudes of these ...