An evaluation study on text categorization using automatically generated labeled dataset
Date
2017
Abstract
Naïve Bayes, k-nearest neighbors, AdaBoost, support vector machines, and neural networks are five of the commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, inconsistency of human labeling and the high dimensionality of feature spaces remain two issues to be addressed in text categorization. This paper evaluates the five classifiers on an automatically generated text document collection, which is labeled by a group of experts to alleviate the subjectivity of human category assignments, and at the same time examines the influence of the number of features on the performance of the algorithms.