Ascertaining data mining rules using statistical approaches

Mohd Shaharanee, I.; Dillon, Tharam S; Hadzic, Fedja

Access Status

Fulltext not available

Authors

Mohd Shaharanee, I.

Dillon, Tharam S

Hadzic, Fedja

Date

2009

Type

Conference Paper

Citation

Mohd Shaharanee, Izwan Nizal and Dillon, Tharam S. and Hadzic, Fedja. 2009. Ascertaining data mining rules using statistical approaches, in Parvinder S. Sandhu (ed), International Symposium on Computing, Communication and Control (ISCCC 2009), Oct 9 2009, pp. 180-188. Singapore: International Association of Computer Science and Information Technology (IACSIT).

Source Title

Proceedings of the international symposium on computing, communication and control (ISCCC 2009)

Source Conference

International Symposium on Computing, Communication and Control (ISCCC 2009)

ISBN

9789810838157

Faculty

Curtin Business School

The Digital Ecosystems and Business Intelligence Institute (DEBII)

School

Digital Ecosystems and Business Intelligence Institute (DEBII)

URI

http://hdl.handle.net/20.500.11937/43700

Collection

Curtin Research Publications

Abstract

Knowledge acquisition techniques have been well researched in the data mining community. Such techniques, especially when used for unsupervised learning, often generate a large quantity of rules and patterns. While many of the generated rules are useful and interesting, some are not, such as already known patterns, coincidental patterns and patterns with no significant value for real-world applications. Sustaining the interestingness of rules generated by data mining algorithms is an active and important area of data mining research. Different methods for discovering interestingness in rules have been proposed and well examined. These measures often reflect interestingness only with respect to the database being observed, so the rules satisfy the constraints with respect to the sample data only, not with respect to the whole data distribution. Therefore, one can still question the usefulness of the rules and patterns in practical problems. As data mining techniques are naturally data driven, it would be beneficial to confirm the generated hypotheses with a statistical methodology. In our research, we investigate how to combine data mining and statistical measurement techniques to arrive at a more reliable and interesting set of rules. Such a combination is essential for coping with data overload in practical problems. A real-world data set is used to explore the ways in which one can measure and verify the usefulness of rules from data mining techniques using statistical analysis.