
    espace - Curtin’s institutional repository


    Automated framework for robust content-based verification of print-scan degraded text documents

    192116_Shulman2013.pdf (2.964Mb)
    Access Status
    Open access
    Authors
    Shulman, Yaniv
    Date
    2012
    Supervisor
    Prof. Tele Tan
    Dr Patrick Peursum
    Type
    Thesis
    Award
    PhD
    
    Metadata
    School
    School of Electrical Engineering and Computing, Department of Computing
    URI
    http://hdl.handle.net/20.500.11937/897
    Collection
    • Curtin Theses
    Abstract

    Fraudulent documents frequently cause severe financial damage and create security breaches for civil and government organizations. The rapid advances in technology and the widespread availability of personal computers have not reduced the use of printed documents. While digital documents can be verified by many robust and secure methods, such as digital signatures and digital watermarks, verification of printed documents still relies on manual inspection of embedded physical security mechanisms.

    The objective of this thesis is to propose an efficient automated framework for robust content-based verification of printed documents. The principal issue is to achieve robustness with respect to the degradations and increased levels of noise that result from multiple cycles of printing and scanning. It is shown that classic OCR systems fail under such conditions. Moreover, OCR systems typically rely heavily on high-level linguistic structures to improve recognition rates. However, inferring knowledge about the contents of the document image from a priori statistics is contrary to the nature of document verification. Instead, a system is proposed that utilizes specific knowledge of the document to perform highly accurate content verification based on a Print-Scan degradation model and character shape recognition. Relying on such specific knowledge is reasonable in the verification domain, since the document contents must already be known in order to verify them.

    The system analyses digital multi-font PDF documents to generate a descriptive summary of the document, referred to as a "Document Description Map" (DDM). The DDM is later used for verifying the content of printed and scanned copies of the original documents. The system utilizes 2-D Discrete Cosine Transform based features and an adaptive hierarchical classifier trained with synthetic data generated by a Print-Scan degradation model.

    The system is tested with varying degrees of Print-Scan Channel corruption on a variety of documents, with the corruption produced by repeatedly printing and scanning the test documents. Results show the approach achieves excellent accuracy and robustness despite the high level of noise.
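
The abstract names 2-D Discrete Cosine Transform features but does not describe how they are computed or classified. The following is a minimal illustrative sketch, not the thesis implementation: it computes an orthonormal 2-D DCT-II of a small character bitmap, keeps a low-frequency block as the feature vector, and picks the nearest template. The function names, the 8x8 glyph bitmaps, the 4x4 feature block, and the nearest-template comparison (a simple stand-in for the adaptive hierarchical classifier) are all assumptions made for illustration.

```python
import math


def dct_basis(size):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    basis = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * size))
                      for i in range(size)])
    return basis


def dct_2d(image):
    """Separable 2-D DCT-II of a rectangular list-of-lists image."""
    rows, cols = len(image), len(image[0])
    row_basis, col_basis = dct_basis(rows), dct_basis(cols)
    # First transform each row along the horizontal axis ...
    temp = [[sum(col_basis[v][x] * image[y][x] for x in range(cols))
             for v in range(cols)] for y in range(rows)]
    # ... then transform the result along the vertical axis.
    return [[sum(row_basis[u][y] * temp[y][v] for y in range(rows))
             for v in range(cols)] for u in range(rows)]


def low_frequency_features(image, keep=4):
    """Flatten the top-left keep x keep block of DCT coefficients (assumed feature choice)."""
    coeffs = dct_2d(image)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def feature_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_template(observed, templates, keep=4):
    """Return the label of the template whose DCT features are nearest to the observed glyph."""
    observed_features = low_frequency_features(observed, keep)
    return min(templates,
               key=lambda label: feature_distance(
                   observed_features, low_frequency_features(templates[label], keep)))


# Hypothetical 8x8 glyph bitmaps (1.0 = ink), invented for this example.
GLYPH_I = [[1.0 if x in (3, 4) else 0.0 for x in range(8)] for y in range(8)]
GLYPH_L = [[1.0 if x in (1, 2) or (y >= 6 and 1 <= x <= 6) else 0.0
            for x in range(8)] for y in range(8)]

# Crude stand-in for print-scan noise: one stray ink speck added to the "I".
DEGRADED_I = [row[:] for row in GLYPH_I]
DEGRADED_I[0][0] = 1.0

if __name__ == "__main__":
    print(closest_template(DEGRADED_I, {"I": GLYPH_I, "L": GLYPH_L}))
```

Keeping only low-frequency coefficients discards fine detail, which is where isolated specks concentrate much of their energy, so the coarse character shape dominates the comparison. A realistic system would need a far richer noise model than a single flipped pixel.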

    Related items

    Showing items related by title, author, creator and subject.

    • Diverging trends for acute lower respiratory infections in non-Aboriginal and Aboriginal children
      Moore, H.; Burgner, D.; Carville, K.; Jacoby, P.; Richmond, P.; Lehmann, Deborah (2007)
    • Assessing the digital divide in a Jordanian academic library
      Obeidat, O.; Genoni, Paul (2010)
      The research reported attempts to assess the extent and nature of the digital divide as it applies in a developing Arab country. The method used is an innovative form of document availability test developed to measure the ...
    • Reading, democracy and discipline: Premises for reading activities in Swedish primary schools from 1967 to 1969
      Dolatkhah, M.; Lundh, A.H. (2014)
      In Sweden, as well as in many other countries, children’s literacy is a much debated topic. In the public discourse, politicians, researchers, and other groups are discussing the reading abilities, reading habits, and ...