Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study
Naïve Bayes (NB), kNN and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, labelling and corpus bias are two concerns in text categorization. This paper evaluates the three classifiers on an automatically generated text document set that is labelled by a group of experts, to alleviate the subjectiveness of labelling, and at the same time examines how the performance of the algorithms is influenced by the feature selection algorithm and the number of features selected.
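As a rough illustration of the kind of experiment the abstract describes, the sketch below varies the number of selected features and measures the accuracy of a multinomial Naïve Bayes classifier. Several things here are assumptions and not taken from the paper: the chi-square scoring (the abstract does not name its feature selection algorithms), the tiny spam/ham corpus, and the helper names `chi2_scores`, `train_nb`, `predict` and `accuracy_at_k`. Only NB is shown; kNN and AdaBoost are omitted for brevity.

```python
import math
from collections import Counter, defaultdict

# Toy labelled corpus (hypothetical; not the dataset used in the paper).
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("limited offer buy pills", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
    ("monday project agenda attached", "ham"),
]
TEST = [
    ("buy cheap pills", "spam"),
    ("agenda for project meeting", "ham"),
]


def chi2_scores(docs):
    """Chi-square score per term (max over classes), from document counts."""
    n = len(docs)
    class_count = Counter(label for _, label in docs)
    df = Counter()                    # number of documents containing the term
    df_class = defaultdict(Counter)   # same, broken down by class
    for text, label in docs:
        for term in set(text.split()):
            df[term] += 1
            df_class[term][label] += 1
    scores = {}
    for term in df:
        best = 0.0
        for cls, n_cls in class_count.items():
            a = df_class[term][cls]   # term present, document in cls
            b = df[term] - a          # term present, document not in cls
            c = n_cls - a             # term absent, document in cls
            d = n - n_cls - b         # term absent, document not in cls
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def train_nb(docs, vocab):
    """Multinomial Naive Bayes with add-one smoothing, restricted to vocab."""
    class_count = Counter(label for _, label in docs)
    term_count = {cls: Counter() for cls in class_count}
    for text, label in docs:
        term_count[label].update(t for t in text.split() if t in vocab)
    model = {}
    for cls in class_count:
        total = sum(term_count[cls].values()) + len(vocab)
        log_lik = {t: math.log((term_count[cls][t] + 1) / total) for t in vocab}
        model[cls] = (math.log(class_count[cls] / len(docs)), log_lik)
    return model


def predict(model, text):
    """Return the class with the highest log posterior; ties break by name."""
    def score(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(log_lik[t] for t in text.split() if t in log_lik)
    return max(sorted(model), key=score)


def accuracy_at_k(k):
    """Keep the k highest-scoring terms, train NB, and score it on TEST."""
    scores = chi2_scores(TRAIN)
    vocab = set(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    model = train_nb(TRAIN, vocab)
    return sum(predict(model, x) == y for x, y in TEST) / len(TEST)


for k in (1, 4, 14):
    print(k, accuracy_at_k(k))
```

Swapping `chi2_scores` for another scoring function (for example information gain) while keeping `accuracy_at_k` unchanged is one simple way to compare feature selection algorithms at a fixed feature count.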
Showing items related by title, author, creator and subject.
Improving the relevance of web search results by combining web snippet categorization, clustering and personalization. Zhu, Dengya (2010). Web search results are far from perfect due to the polysemous and synonymous characteristics of natural languages, information overload as the result of the information explosion on the Web, and the flat list, “one size fits ...
Zhu, Dengya; Wong, K. (2017). Naïve Bayes, k-nearest neighbors, AdaBoost, support vector machines and neural networks are five commonly used text classifiers, among others. Evaluation of these classifiers involves a variety of factors to be considered ...
Machine learning and natural language processing to identify falls in electronic patient care records from ambulance attendances. Tohira, Hideo; Finn, Judith; Ball, Stephen; Brink, D.; Buzzacott, Peter (2021). We derived machine learning models utilizing features generated by natural language processing (NLP) of free-text data from an ambulance services provider to identify fall cases. The data comprised samples of electronic ...