Using the Symmetrical Tau criterion for feature selection in decision tree and neural network learning
Date
2006
Abstract
The data collected for various domain purposes usually contains some features irrelevant to the concept being learned. The presence of these features interferes with the learning mechanism, and as a result the predicted models tend to be more complex and less accurate. It is important to employ an effective feature selection strategy so that only the necessary and significant features are used to learn the concept at hand. The Symmetrical Tau (τ) [13] is a statistical-heuristic measure of the capability of an attribute in predicting the class of another attribute, and it has successfully been used as a feature selection criterion during decision tree construction. In this paper we aim to demonstrate some other ways of effectively using the τ criterion to filter out the irrelevant features prior to learning (pre-pruning) and after the learning process (post-pruning). For the pre-pruning approach we perform two experiments: one where the irrelevant features are filtered out according to their τ value, and one where we calculate the τ criterion for Boolean combinations of features and use the highest τ-valued combination. In the post-pruning approach we use the τ criterion to prune a trained neural network and thereby obtain a simpler and more accurate rule set. The experiments are performed on data characterized by continuous and categorical attributes, and the effectiveness of the proposed techniques is demonstrated by comparing the derived knowledge models in terms of complexity and accuracy.
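The abstract does not state the formula for τ, so the following is only a minimal sketch of the pre-pruning idea. It assumes the definition of Symmetrical Tau commonly given in the literature, computed over the contingency table of a categorical feature and the class: with joint proportions P(ij) and marginals P(i+) and P(+j), τ = [Σ P(ij)²/P(+j) + Σ P(ij)²/P(i+) − Σ P(i+)² − Σ P(+j)²] / [2 − Σ P(i+)² − Σ P(+j)²]. The function names `symmetrical_tau` and `rank_features` are hypothetical and are not taken from the paper, and the paper's actual cut-off for discarding features is not given here.

```python
from collections import Counter


def symmetrical_tau(feature, target):
    """Symmetrical Tau between two equal-length categorical sequences.

    Assumed definition (not stated in the abstract): the symmetric
    combination of the two Goodman-Kruskal tau measures.
    """
    n = len(feature)
    if n == 0 or n != len(target):
        raise ValueError("feature and target must be non-empty and equal in length")

    joint = Counter(zip(feature, target))  # cell counts of the contingency table
    row = Counter(feature)                 # marginal counts of the feature
    col = Counter(target)                  # marginal counts of the class

    sum_row_sq = sum((c / n) ** 2 for c in row.values())
    sum_col_sq = sum((c / n) ** 2 for c in col.values())
    denom = 2.0 - sum_row_sq - sum_col_sq
    if denom == 0.0:
        # Both attributes are constant, so there is no association to measure.
        return 0.0

    # Sum of P(ij)^2 / P(+j) and of P(ij)^2 / P(i+) over non-empty cells.
    given_col = sum((c / n) ** 2 / (col[b] / n) for (a, b), c in joint.items())
    given_row = sum((c / n) ** 2 / (row[a] / n) for (a, b), c in joint.items())
    return (given_col + given_row - sum_row_sq - sum_col_sq) / denom


def rank_features(columns, target):
    """Return (name, tau) pairs sorted from most to least predictive."""
    scores = {name: symmetrical_tau(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    data = {"relevant": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}
    for name, tau in rank_features(data, labels):
        print(name, round(tau, 3))
```

Under this definition a feature that determines the class exactly scores 1 and a feature that is statistically independent of the class scores 0, so low-scoring features are the candidates for removal before learning. Continuous attributes would have to be discretized first, and a Boolean combination of features can be scored by passing the combined column as `feature`.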
Related items
Showing items related by title, author, creator and subject.
- Wong, Wei; Ali, C.; Ing, W.; Haw, L.; Lee, V. (2016) Most advances on the Evolutionary Algorithm optimisation of Neural Network are on recurrent neural network using the NEAT optimisation method. For feed forward network, most of the optimisation are merely on the Weights ...
- Hadzic, Fedja; Dillon, Tharam S. (2005) Nowadays, lots of data is being collected for different industrial and commercial purposes, where the aim is to discover useful patterns from data which leads to discovery of valuable domain knowledge. Unsupervised learning ...
- Rapley, Pat; Nathan, Pauline; Davidson, Laura (2006) The context for this study is a conversion program for enrolled nurses (ENs) or division 2 level nurses who want to further their career as a registered nurse (RN) or division 1 nurse. While the conversion program is ...