    espace - Curtin’s institutional repository

    Audio networks for speech enhancement and indexing

    138116_Kuhnapfel full.pdf (7.778 MB)
    Access Status: Open access
    Authors: Kühnapfel, Thorsten
    Date: 2009
    Supervisor: Prof. Svetha Venkatesh; Assoc. Prof. Tele Tan
    Type: Thesis
    Award: PhD
    School: Department of Computing
    URI: http://hdl.handle.net/20.500.11937/206
    Collection: Curtin Theses
    Abstract

    For humans, hearing is the second most important sense after sight, so acoustic information contributes greatly to observing and analysing an area of interest. Combining audio and video cues for surveillance therefore improves understanding of the scene and of the observed events. However, when different sensors are combined their measurements must be correlated, either by knowing the exact relative sensor alignment or by learning a mapping function. Most deployed systems assume a known relative sensor alignment, which makes them susceptible to sensor drift. In addition, audio recordings are generally a mixture of several source signals and therefore need to be processed to extract a desired sound source, such as the speech of a target person.

    This thesis describes a generic framework that captures, indexes and extracts surveillance events from coordinated audio and video cues. It presents a dynamic joint-sensor calibration approach that uses audio-visual sensor measurements to learn the calibration function dynamically and incrementally, making the sensor calibration resilient to independent drifts in the sensor suite. Experiments demonstrate the use of such a framework for enhancing surveillance.

    Furthermore, a speech enhancement approach based on a distributed network of microphones is presented, increasing the effectiveness of acoustic surveillance over large areas. This approach detects and enhances speech in the presence of rapidly changing environmental noise. Spectral subtraction, a single-channel speech enhancement technique, is modified to adapt quickly to rapid changes between two common noise sources by incorporating multiple noise models. The result of the cross-correlation based noise classification is also used to improve voice activity detection, minimising false detections caused by rapid noise changes. Experiments with real-world noise consisting of scooter and café noise demonstrate the advantage of multiple noise models, especially when the noise changes during speech.

    The modified spectral subtraction approach is then extended to real-world scenarios by introducing additional, highly non-stationary noise types. The focus therefore shifts to a more sophisticated noise classification approach that extracts a variety of acoustic features, applies a PCA transformation and computes the Mahalanobis distance to each noise class. This distance measure is also incorporated into the voice activity detection algorithm to reduce false detections for highly non-stationary noise types. However, using spectral subtraction in non-stationary noise environments, such as street noise, degrades the speech enhancement performance. For that reason the approach is further improved by using sound information from the entire microphone network to update the noise model of the detected noise type during speech. This adjustment considerably improves speech enhancement performance in non-stationary noise environments. Experiments conducted under diverse real-world conditions, including rapid noise changes and non-stationary noise sources, demonstrate the effectiveness of the presented method.
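
    The thesis text is not reproduced here, but the multi-model spectral subtraction idea summarised above can be illustrated with a short sketch. The following Python/NumPy fragment is a minimal, assumed illustration only: the frame length, over-subtraction factor, spectral floor and function names are not taken from the thesis, and a simple Euclidean spectral distance stands in for the cross-correlation based noise classification described in the abstract.

        # Minimal sketch: spectral subtraction that keeps one noise spectrum per
        # noise class and subtracts whichever model best matches the current frame.
        # All constants and names are illustrative assumptions, not the thesis code.
        import numpy as np

        FRAME = 512    # samples per analysis frame (assumed)
        ALPHA = 2.0    # over-subtraction factor (assumed)
        FLOOR = 0.01   # spectral floor, limits musical noise (assumed)

        def classify_noise(frame_mag, noise_models):
            # Pick the noise class whose stored magnitude spectrum is closest to
            # the current frame (Euclidean distance as a stand-in classifier).
            labels = list(noise_models)
            dists = [np.linalg.norm(frame_mag - noise_models[c]) for c in labels]
            return labels[int(np.argmin(dists))]

        def enhance_frame(frame, noise_models):
            # noise_models: dict mapping class name -> magnitude spectrum with the
            # same length as np.fft.rfft(frame), e.g. {"scooter": ..., "cafe": ...}.
            spec = np.fft.rfft(frame * np.hanning(len(frame)))
            mag, phase = np.abs(spec), np.angle(spec)
            label = classify_noise(mag, noise_models)
            clean = np.maximum(mag - ALPHA * noise_models[label], FLOOR * mag)
            return np.fft.irfft(clean * np.exp(1j * phase), n=len(frame)), label

    Keeping a separate model per noise source is what lets the subtraction switch immediately when, for example, café noise gives way to scooter noise, instead of waiting for a single running noise estimate to adapt.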
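
    The abstract also describes classifying noise types by extracting acoustic features, projecting them with PCA and computing the Mahalanobis distance to each noise class, with the distance also feeding the voice activity detector. A self-contained sketch of that classification step follows; the feature vectors (e.g. per-frame spectral statistics), the number of PCA components and the class structure are assumptions for illustration, not the thesis implementation.

        # Sketch of feature-based noise classification: PCA projection followed by
        # a per-class Mahalanobis distance. Dimensions and features are assumed.
        import numpy as np

        class NoiseClassifier:
            def __init__(self, n_components=8):
                self.n_components = n_components

            def fit(self, features, labels):
                # features: (n_frames, n_features) training matrix; labels: one
                # noise-class name per frame. Assumes many frames per class.
                features = np.asarray(features, dtype=float)
                labels = np.asarray(labels)
                self.mean = features.mean(axis=0)
                centred = features - self.mean
                _, _, vt = np.linalg.svd(centred, full_matrices=False)
                k = min(self.n_components, vt.shape[0])
                self.proj = vt[:k].T                      # PCA basis (n_features, k)
                z = centred @ self.proj
                self.classes = {}
                for c in np.unique(labels):
                    zc = z[labels == c]
                    cov = np.cov(zc, rowvar=False) + 1e-6 * np.eye(k)  # regularised
                    self.classes[c] = (zc.mean(axis=0), np.linalg.inv(cov))
                return self

            def classify(self, feature_vec):
                # Return the closest noise class and its Mahalanobis distance.
                z = (np.asarray(feature_vec, dtype=float) - self.mean) @ self.proj
                def mdist(stats):
                    mu, inv_cov = stats
                    d = z - mu
                    return float(d @ inv_cov @ d)
                label = min(self.classes, key=lambda c: mdist(self.classes[c]))
                return label, mdist(self.classes[label])

    In the same spirit as the abstract's description, the returned distance could gate the voice activity detector: a frame that lies far from every learned noise class is more plausibly speech, so false detections triggered by a sudden change to a known noise type are reduced.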

    Related items

    Showing items related by title, author, creator and subject.

    • Adaptive speech enhancement with varying noise backgrounds
      Kühnapfel, Thorsten; Tan, Tele; Venkatesh, Svetha; Igel, B.; Nordholm, Sven (2008)
      We present a new approach for speech enhancement in the presence of non-stationary and rapidly changing background noise. A distributed microphone system is used to capture the acoustic characteristics of the environment. ...
    • Virtual image sensors to track human activity in a smart house
      Tun, Min Han (2007)
      With the advancement of computer technology, demand for more accurate and intelligent monitoring systems has also risen. The use of computer vision and video analysis range from industrial inspection to surveillance. ...
    • The measurement of underwater acoustic noise radiated by a vessel using the vessel's own towed array
      Duncan, Alexander John (2003)
      The work described in this thesis tested the feasibility of using a towed array of hydrophones to: 1. localise sources of underwater acoustic noise radiated by the tow vessel, 2. determine the absolute amplitudes of these ...