Quality and interestingness of association rules derived from data mining of relational and semi-structured data

Mohd Shaharanee, Izwan Nizal

dc.contributor.author	Mohd Shaharanee, Izwan Nizal
dc.contributor.supervisor	Dr Fedja Hadzic
dc.date.accessioned	2017-01-30T10:10:44Z
dc.date.available	2017-01-30T10:10:44Z
dc.date.created	2012-08-01T07:07:19Z
dc.date.issued	2012
dc.identifier.uri	http://hdl.handle.net/20.500.11937/1643
dc.description.abstract	Deriving useful and interesting rules from a data mining system is an essential task. Common problems include the discovery of random and coincidental patterns, or patterns with no significant value, and the generation of a large volume of rules from a database. Work on sustaining the interestingness of rules generated by data mining algorithms is under active development. Because data mining techniques are data-driven, it is beneficial to affirm the rules using a statistical approach. It is important to establish how the existing statistical measures and constraint parameters can be used effectively, and in what sequence. In this thesis, a systematic way is presented to evaluate the association rules discovered by frequent, closed and maximal itemset mining algorithms, and by a frequent subtree mining algorithm, including rules based on induced, embedded and disconnected subtrees. For frequent subtree mining, a new direction is also explored that uses the DSM approach, which is capable of preserving all information from a tree-structured database in a flat data format and thereby enables the direct application of a wider range of data mining techniques to tree-structured data. The implications of this approach were investigated, and it was found that basing rules on disconnected subtrees can be useful for increasing the accuracy and the coverage rate of the rule set. A strategy is developed that combines data mining and statistical measurement techniques, such as sampling, redundancy and contradictive checks, and correlation and regression analysis, to evaluate the rules. This framework is then applied to real-world datasets that represent diverse characteristics of data/items.
Empirical results show that, with a proper combination of data mining and statistical analysis, the proposed framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable, high-accuracy rules. Moreover, the results reveal the important characteristics of, and differences between, mining frequent, closed or maximal itemsets and mining frequent subtrees, including rules based on induced, embedded and disconnected subtrees, as well as the impact of the confidence measure on prediction and classification tasks.
dc.language	en
dc.publisher	Curtin University
dc.subject	data mining
dc.subject	relational data
dc.subject	semi-structured data
dc.subject	interestingness
dc.subject	quality
dc.subject	association rules
dc.title	Quality and interestingness of association rules derived from data mining of relational and semi-structured data
dc.type	Thesis
dcterms.educationLevel	PhD
curtin.department	Digital Ecosystems and Business Intelligence Institute, Curtin Business School
curtin.accessStatus	Open access

Files in this item

Name: 186675_MohdShaharanee2012.pdf
Size: 1.786Mb
Format: PDF

This item appears in the following Collection(s)

Curtin Theses
