Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study

Zhu, Dengya; Wong, K.

doi:10.1007/978-3-319-12637-1_60

Access Status

Fulltext not available

Date

2014

Type

Conference Paper

Citation

Zhu, D. and Wong, K. 2014. Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study, in Loo, C.K. and Yap, K.S. and Wong, K.W. and Teoh, A. and Huang, K. (ed), Proceedings of 21st International Conference on Neural Information Processing: The Next Renaissance of the Neural Information Processing (Part 1), Nov 3-6 2014, pp. 479-486. Sarawak, Malaysia: University of Malaya.

Source Title

Neural Information Processing

Source Conference

ICONIP 2014

ISBN

9783319126364

School

School of Information Systems

URI

http://hdl.handle.net/20.500.11937/26799

Collection

Curtin Research Publications

Abstract

Naïve Bayes (NB), kNN and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on particular corpora; however, labelling bias and corpus bias are two concerns in text categorization. This paper evaluates the three commonly used text classifiers on an automatically generated text document set that is labelled by a group of experts to alleviate the subjectivity of labelling. At the same time, it examines how the performance of the algorithms is influenced by the feature selection algorithms and the number of features selected.
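The abstract does not say which feature selection algorithms were compared. As a minimal sketch, assuming chi-square term scoring and a multinomial Naïve Bayes classifier with add-one smoothing (common choices, not necessarily the authors' setup), the Python below shows where the number of selected features k enters such an evaluation. The toy corpus and every name in it (`chi_square_scores`, `select_top_k`, `MultinomialNB`, `TRAIN`, `TEST`) are hypothetical, and kNN and AdaBoost are not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT the paper's dataset); documents are whitespace-tokenised.
TRAIN = [
    ("goal match team score", "sport"),
    ("team win league match", "sport"),
    ("player score goal win", "sport"),
    ("stock market price share", "finance"),
    ("bank interest rate market", "finance"),
    ("share price profit bank", "finance"),
]
TEST = [("team goal win", "sport"), ("market share profit", "finance")]

train_docs = [text.split() for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]


def chi_square_scores(docs, labels):
    """Assumed feature scorer: each term gets its largest chi-square value over
    all classes, from a 2x2 table of term presence versus class membership."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            pairs = list(zip(doc_sets, labels))
            a = sum(term in s and y == cls for s, y in pairs)      # present, in class
            b = sum(term in s and y != cls for s, y in pairs)      # present, other class
            c = sum(term not in s and y == cls for s, y in pairs)  # absent, in class
            d = n - a - b - c                                      # absent, other class
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_cond = {}, {}
        for cls in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(class_docs) / len(docs))
            # Count only terms that survived feature selection.
            counts = Counter(t for d in class_docs for t in d if t in self.vocab)
            total = sum(counts.values())
            self.log_cond[cls] = {
                t: math.log((counts[t] + 1) / (total + len(self.vocab))) for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Terms outside the selected vocabulary are ignored.
        def log_posterior(cls):
            return self.log_prior[cls] + sum(
                self.log_cond[cls][t] for t in doc if t in self.vocab
            )
        return max(self.classes, key=log_posterior)


# Vary the number of selected features, as the study does, and report accuracy.
scores = chi_square_scores(train_docs, train_labels)
for k in (2, 4, 8):
    model = MultinomialNB().fit(train_docs, train_labels, select_top_k(scores, k))
    correct = sum(model.predict(text.split()) == label for text, label in TEST)
    print(f"k={k}: accuracy={correct / len(TEST):.2f}")
```

In a fuller experiment, the same loop over k could wrap kNN and AdaBoost classifiers and a different scorer (for example, information gain or document frequency), so that the effect of the feature selection algorithm and the feature count can be compared for each classifier.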