R-tfidf, A variety of TF-IDF term weighting strategy in document categorization
Date
2011
Abstract
Term weighting plays an essential role in text processing tasks such as text categorization and information retrieval. In such systems, term frequency, inverse document frequency, and document length normalization are the main factors to consider when developing a term weighting strategy. Document length normalization is intended to give lengthy and shorter documents an equal chance of being retrieved. However, terms in very short documents, which may be useless to users, especially in Web information retrieval, can be assigned very high weights. As a result, shorter documents may be ranked above lengthy documents that are more relevant to users' information needs. In this research, a new term weighting strategy, R-tfidf, is proposed to alleviate the side effects of document length normalization. Experimental results demonstrate that the proposed approach can, to some extent, improve the performance of text categorization. © 2011 IEEE.
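The side effect described in the abstract can be illustrated with a minimal sketch. The abstract does not give the R-tfidf formula, so the code below is not the proposed method; it only shows conventional tf-idf with cosine length normalization, which is assumed here as a representative normalization scheme. The function name `tfidf_weights` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs, normalize=True):
    """Return one {term: weight} dict per tokenized document.

    Conventional tf-idf (raw tf times log idf), optionally followed by
    cosine length normalization. This is NOT R-tfidf; it is an assumed
    baseline used only to show the short-document effect.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        w = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        if normalize:
            # Divide by the Euclidean length of the weight vector.
            norm = math.sqrt(sum(v * v for v in w.values()))
            if norm > 0:
                w = {t: v / norm for t, v in w.items()}
        weighted.append(w)
    return weighted


# Hypothetical toy corpus: one very short and one longer document
# that both mention "flights" exactly once.
docs = [
    "cheap flights".split(),
    ("cheap flights to perth with a guide to booking hotels "
     "and comparing airline fares").split(),
    "hotels in perth".split(),
]

raw = tfidf_weights(docs, normalize=False)
normed = tfidf_weights(docs, normalize=True)

# Before normalization both documents weight "flights" identically;
# after normalization the two-word document weights it far higher.
print(raw[0]["flights"], raw[1]["flights"])
print(normed[0]["flights"], normed[1]["flights"])
```

In this toy corpus the term "flights" has the same raw weight in both documents, but after normalization the two-word document assigns it a much larger weight than the longer, more informative one. This is the kind of bias toward very short documents that the abstract says R-tfidf is meant to alleviate.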