
    espace - Curtin’s institutional repository


    Mathematical Information Retrieval (MIR) from scanned PDF and MathML conversion

    225790_146136_MIR.pdf (1.359 MB)
    Access Status
    Open access
    Authors
    Nazemi, Azadeh
    Murray, Iain
    McMeekin, David
    Date
    2014
    Type
    Journal Article
    
    Citation
    Nazemi, A. and Murray, I. and McMeekin, D. 2014. Mathematical Information Retrieval (MIR) from scanned PDF and MathML conversion. IPSJ Transactions on Computer Vision and Applications. 6: pp. 132-142.
    Source Title
    IPSJ Transactions on Computer Vision and Applications
    DOI
    10.2197/ipsjtcva.6.132
    ISSN
    1882-6695
    School
    Department of Electrical and Computer Engineering
    URI
    http://hdl.handle.net/20.500.11937/36574
    Collection
    • Curtin Research Publications
    Abstract

    This paper describes part of an ongoing comprehensive research project that aims to generate MathML from images of mathematical expressions extracted from scanned PDF documents. A MathML representation of a scanned PDF document reduces the document's storage size and encodes both the notation and the meaning of its mathematical content. The MathML representation is then suitable for vocalization and accessible through assistive technologies. Accurate layout analysis of a scanned PDF document requires that all textual and non-textual components be recognised, identified and tagged. These components may be text, mathematical expressions, or graphics in the form of images, figures, tables and/or diagrams. Mathematical expressions are among the most significant components of scanned scientific and engineering PDF documents and must be machine-readable for use with assistive technologies. This work in progress comprises several modules: detecting and extracting mathematical expressions, recursive primitive component extraction, non-alphanumeric symbol recognition, structural semantic analysis, and merging primitive components to generate the MathML of the scanned PDF document. An optional module converts the MathML to audio using a text-to-speech (TTS) engine, making the document accessible to vision-impaired users.
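    The final module, merging recognised primitive components into MathML, can be illustrated with a minimal sketch. It assumes that symbol recognition and structural analysis have already produced a small tree in which each leaf carries its MathML role (identifier, number or operator) and each inner node names a layout relation (row, fraction, superscript or subscript). The `Token` and `Node` types and the `to_mathml_string` function are hypothetical names chosen for illustration, not the authors' implementation, and the sketch emits presentation MathML only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Union

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical output of symbol recognition: a leaf token with its MathML role.
@dataclass
class Token:
    kind: str  # "mi" (identifier), "mn" (number) or "mo" (operator)
    text: str

# Hypothetical output of structural analysis: a layout relation between parts.
@dataclass
class Node:
    kind: str  # "mrow", "mfrac", "msup" or "msub"
    children: List["Expr"] = field(default_factory=list)

Expr = Union[Token, Node]

# Presentation elements that take exactly two children (e.g. base and exponent).
BINARY = {"mfrac", "msup", "msub"}

def to_element(expr: Expr) -> ET.Element:
    """Recursively merge recognised components into MathML elements."""
    el = ET.Element(expr.kind)
    if isinstance(expr, Token):
        el.text = expr.text
        return el
    if expr.kind in BINARY and len(expr.children) != 2:
        raise ValueError(
            f"<{expr.kind}> needs exactly 2 children, got {len(expr.children)}"
        )
    for child in expr.children:
        el.append(to_element(child))
    return el

def to_mathml_string(expr: Expr) -> str:
    """Wrap the tree in a <math> root and serialise it without an XML declaration."""
    root = ET.Element("math", xmlns=MATHML_NS)
    root.append(to_element(expr))
    return ET.tostring(root, encoding="unicode")

# Example: (x^2 + 1) / 2, as a recogniser might report it.
expr = Node("mfrac", [
    Node("mrow", [
        Node("msup", [Token("mi", "x"), Token("mn", "2")]),
        Token("mo", "+"),
        Token("mn", "1"),
    ]),
    Token("mn", "2"),
])
print(to_mathml_string(expr))
# prints:
# <math xmlns="http://www.w3.org/1998/Math/MathML"><mfrac><mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow><mn>2</mn></mfrac></math>
```

    The hard decisions (assigning each token its role and grouping operands into rows, fractions and scripts) belong to the structural semantic analysis step and are taken as given here; once that tree exists, generating MathML is a straightforward recursive walk.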

    Related items

    Showing items related by title, author, creator and subject.

    • Magnetic core–shell CuFe2O4@C3N4 hybrids for visible light photocatalysis of Orange II
      Yao, Y.; Lu, F.; Zhu, Y.; Wei, F.; Liu, X.; Lian, C.; Wang, Shaobin (2015)
      Novel CuFe2O4@C3N4 core–shell photocatalysts were fabricated through a self-assembly method and characterized by X-ray diffraction, Fourier transform infrared spectroscopy, thermogravimetric analysis, X-ray photoelectron ...
    • Zircon U–Pb–Lu–Hf–O isotopic evidence for ≥3.5 Ga crustal growth, reworking and differentiation in the northern Tarim Craton
      Ge, Rongfeng; Zhu, W.; Wilde, Simon; He, J. (2014)
      Continental crust was largely generated before 2.5 Ga through mafic–ultramafic and TTG (tonalite-trondhjemite-granodiorite) magmatism, but it is contentious when such primitive crust evolved into mature granodioritic ...
    • Layout Analysis for Scanned PDF and Transformation to the Structured PDF Suitable for Vocalization and Navigation
      Nazemi, Azadeh; Murray, Iain; McMeekin, David (2014)
      Information can include text, pictures and signatures that can be scanned into a document format, such as the Portable Document Format (PDF), and easily emailed to recipients around the world. Upon the document’s arrival, ...
    CRICOS Provider Code: 00301J | ABN: 99 143 842 569 | TEQSA: PRV12158


    Curtin would like to pay respect to the Aboriginal and Torres Strait Islander members of our community by acknowledging the traditional owners of the land on which the Perth campus is located, the Whadjuk people of the Nyungar Nation; and on our Kalgoorlie campus, the Wongutha people of the North-Eastern Goldfields.