    espace - Curtin’s institutional repository

    Semi-Automatic Information Extraction from Discussion Boards with Applications for Anti-Spam Technology

    Access Status
    Fulltext not available
    Authors
    Sarencheh, S.
    Potdar, Vidyasagar
    Yeganeh, E.
    Firoozeh, N.
    Date
    2010
    Type
    Book Chapter
    Citation
    Sarencheh, Saeed and Potdar, Vidyasagar and Yeganeh, Elham and Firoozeh, Nazanin. 2010. Semi-Automatic Information Extraction from Discussion Boards with Applications for Anti-Spam Technology, in Taniar, D. and Gervasi, O. and Murgante, B. and Pardede, E. and Apduhan, B.O. (ed), Lecture Notes in Computer Science, Volume 6017: Computational science and its applications - ICCSA 2010, pp. 370-382. Germany: Springer.
    Source Title
    Lecture notes in computer science, volume 6017: computational science and its applications - ICCSA 2010
    ISBN
    9783642121647
    School
    Centre for Extended Enterprises and Business Intelligence
    URI
    http://hdl.handle.net/20.500.11937/11240
    Collection
    • Curtin Research Publications
    Abstract

    Forums (or discussion boards) are large information collections organized into boards, threads and posts. The basic information entity of a forum is the post, which carries the author, the date and time of posting, the actual content, and so on. This information is valuable for applications such as gathering market intelligence and analyzing customer perceptions. However, automatically extracting it from a forum is an extremely challenging task. Several customized parsers exist for extracting information from a particular forum platform with a specific template (e.g. SMF or phpBB), but they depend on both the platform and the template used, which makes them impractical in real-world use. Hence, in this paper we propose a semi-automatic, rule-based solution that extracts forum post information and inserts it into a database for analysis. The key challenge is identifying the extraction rules, which are normally specific to a forum platform and template. We therefore analyzed 100 forums to derive these rules and to test the performance of the algorithm. The results indicate that we were able to extract all the required information from the SMF and phpBB forum platforms, which represent the majority of forums on the web.
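
The abstract does not give the extraction rules themselves, so the following is only a minimal sketch of what a rule-based post extractor feeding a database might look like. Everything in it is an assumption made for illustration: the rule table `RULES`, the template name `example_template`, the CSS class names, the sample HTML, and the `posts` table schema are hypothetical and are not taken from the paper or from real SMF or phpBB markup.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical rule table: for each template, the CSS class that marks a post
# container and the classes that mark each field inside it.
RULES = {
    "example_template": {
        "post": "post",
        "author": "username",
        "date": "postdate",
        "content": "content",
    },
}

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

SAMPLE_HTML = """
<div class="post"><span class="username">alice</span>
<span class="postdate">2010-03-01</span>
<div class="content">Hello <b>world</b></div></div>
<div class="post"><span class="username">bob</span>
<span class="postdate">2010-03-02</span>
<div class="content">Buy now!</div></div>
"""


class PostExtractor(HTMLParser):
    """Collects author, date and content for every element matching the post rule."""

    def __init__(self, rule):
        super().__init__()
        self.post_class = rule["post"]
        self.class_to_field = {c: f for f, c in rule.items() if f != "post"}
        self.posts = []
        self.current = None  # post being built, or None when outside a post
        self.stack = []      # one marker per open tag: "post", a field name, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if self.post_class in classes:
            self.current = {"author": "", "date": "", "content": ""}
            marker = "post"
        elif self.current is not None:
            marker = next((self.class_to_field[c] for c in classes
                           if c in self.class_to_field), None)
        self.stack.append(marker)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        marker = self.stack.pop()
        if marker == "post" and self.current is not None:
            self.posts.append({k: v.strip() for k, v in self.current.items()})
            self.current = None

    def handle_data(self, data):
        if self.current is None:
            return
        # Text belongs to the innermost enclosing element that matched a rule.
        field = next((m for m in reversed(self.stack) if m), None)
        if field and field != "post":
            self.current[field] += data


def extract_posts(html, template):
    """Apply the rule set for `template` to `html` and return a list of post dicts."""
    parser = PostExtractor(RULES[template])
    parser.feed(html)
    parser.close()
    return parser.posts


def store_posts(conn, posts):
    """Insert the extracted posts into a (hypothetical) `posts` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (author TEXT, date TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES (:author, :date, :content)", posts
    )
    conn.commit()
```

In this sketch, supporting another platform or template would mean adding an entry to `RULES` rather than writing a new parser, which is one plausible reading of the "semi-automatic" aspect: a person supplies the rules, and the extraction and database insertion run unattended.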

    Related items

    Showing items related by title, author, creator and subject.

    • Increases in synthetic cannabinoids-related harms: Results from a longitudinal web-based content analysis
      Lamy, F.; Daniulaityte, R.; Nahhas, R.; Barratt, Monica; Smith, A.; Sheth, A.; Martins, S.; Boyer, E.; Carlson, R. (2017)
      © 2017 Elsevier B.V. Background: Synthetic Cannabinoid Receptor Agonists (SCRA), also known as “K2” or “Spice,” have drawn considerable attention due to their potential of abuse and harmful consequences. More ...
    • How New and Expecting Fathers Engage With an App-Based Online Forum: Qualitative Analysis.
      White, B.; Giglia, R.; Scott, Jane; Burns, S. (2018)
      BACKGROUND: Breastfeeding is important for infants, and fathers are influential in supporting their partner in their decision to breastfeed and how long they breastfeed for. Fathers can feel excluded from traditional ...
    • Enhancing students’ Learning Experiences Outside School (LEOS) using digital technologies
      Coll, Sandhya Devi (2015)
      This thesis reports on an inquiry on enhancing students’ learning experiences outside school (LEOS) using digital technologies. The inquiry took the nature of an ethnographic case study which was conducted over a year. ...
