R-tfidf, A variety of TF-IDF term weighting strategy in document categorization
|dc.identifier.citation||Zhu, D. and Xiao, J. 2011. R-tfidf, A variety of TF-IDF term weighting strategy in document categorization, pp. 83-90.|
Term weighting strategy plays an essential role in text processing tasks such as text categorization and information retrieval. In such systems, term frequency, inverse document frequency, and document length normalization are important factors to consider when developing a term weighting strategy. Document length normalization is intended to give lengthy and shorter documents equal opportunity to be retrieved. However, terms in very short documents, which may be useless to users, especially in Web information retrieval, can be assigned very high weights, so that shorter documents are ranked higher than lengthy documents that are more relevant to users' information needs. In this research, a new R-tfidf term weighting strategy is proposed to alleviate the side effects of document length normalization. Experimental results demonstrate that the proposed approach can, to some extent, improve the performance of text categorization. © 2011 IEEE.
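The side effect of length normalization described in the abstract can be illustrated with a minimal sketch. This is not the R-tfidf formula, which this record does not give; it is ordinary tf-idf with cosine (L2) normalization. The function name `tfidf_cosine`, the smoothed idf variant, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_cosine(docs):
    """Return one {term: weight} dict per document.

    Illustrative baseline only: tf * smoothed idf, then L2 normalization.
    This is NOT the R-tfidf scheme from the paper.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Adding 1 to the idf keeps terms found in every document nonzero.
        raw = {t: tf[t] * (math.log(n_docs / df[t]) + 1.0) for t in tf}
        # Cosine normalization: scale the document vector to unit length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        weighted.append({t: w / norm for t, w in raw.items()})
    return weighted


# Hypothetical toy corpus: one near-empty document and two longer ones.
docs = [
    "tfidf",
    "tfidf weighting assigns each term a score based on frequency "
    "and tfidf variants adjust it",
    "document length normalization affects ranking",
]
weights = tfidf_cosine(docs)
short_w = weights[0]["tfidf"]  # the only term in a one-word document
long_w = weights[1]["tfidf"]   # same term, occurring twice in a longer document
print(short_w, long_w)
```

In this sketch the one-word document gives its only term the whole unit-length vector, so that term outweighs the same term in the longer document even though the longer document mentions it twice. This is the kind of bias toward very short documents that the abstract says R-tfidf aims to reduce.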
|dc.title||R-tfidf, A variety of TF-IDF term weighting strategy in document categorization|
|dcterms.source.title||Proceedings - 7th International Conference on Semantics, Knowledge, and Grids, SKG 2011|
|dcterms.source.series||Proceedings - 7th International Conference on Semantics, Knowledge, and Grids, SKG 2011|
|curtin.department||School of Information Systems|
|curtin.accessStatus||Fulltext not available|