An efficient sampling scheme for approximate processing of decision support queries

Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha

dc.contributor.author	Rudra, Amit
dc.contributor.author	Gopalan, Raj
dc.contributor.author	Achuthan, Narasimaha
dc.contributor.editor	José Cordeiro
dc.contributor.editor	Leszek Maciaszek
dc.contributor.editor	Alfredo Cuzzocrea
dc.date.accessioned	2017-01-30T13:06:25Z
dc.date.available	2017-01-30T13:06:25Z
dc.date.created	2014-02-03T20:02:09Z
dc.date.issued	2012
dc.identifier.citation	Rudra, Amit and Gopalan, Raj P. and Achuthan, N.R. 2012. An efficient sampling scheme for approximate processing of decision support queries, in Cordeiro, J., Maciaszek, L., Cuzzocrea, A. (ed), 14th International Conference on Enterprise Information Systems, Jun 28 2012, pp. 16-26. Wroclaw, Poland: INSTICC.
dc.identifier.uri	http://hdl.handle.net/20.500.11937/28648
dc.description.abstract	Decision support queries usually involve accessing enormous amounts of data, requiring significant retrieval time. Faster retrieval of query results can often save precious time for the decision maker. Pre-computation of materialised views and sampling are two ways of achieving significant speed-up. However, drawing random samples for queries on range-restricted attributes has two problems: small random samples may miss relevant records, and drawing larger samples from disk can be inefficient due to the large number of disk accesses required. In this paper, we propose an efficient indexing scheme for quickly drawing relevant samples for data warehouse queries, and we introduce the concepts of database and sample relevancy ratios. We describe a method for estimating query results for range-restricted queries using this index and experimentally evaluate the scheme using a relatively large real dataset. Further, we compute the confidence intervals for the estimates to investigate whether the results can be guaranteed to be within the desired level of confidence. Our experiments on data from a retail data warehouse show promising results. We also report the levels of accuracy achieved for various types of aggregate queries and relate them to the database relevancy ratios of the queries.
dc.publisher	INSTICC
dc.subject	Data Warehousing
dc.subject	Approximate Query Processing
dc.subject	Sampling
dc.title	An efficient sampling scheme for approximate processing of decision support queries
dc.type	Conference Paper
dcterms.source.startPage	16
dcterms.source.endPage	26
dcterms.source.title	Proceedings of ICEIS
dcterms.source.series	Proceedings of ICEIS
dcterms.source.conference	14th International Conference on Enterprise Information Systems
dcterms.source.conference-start-date	Jun 28 2012
dcterms.source.conferencelocation	Wroclaw, Poland
dcterms.source.place	Portugal
curtin.note	Publisher: SciTePress. (2012). ISBN: 9789898565105
curtin.department
curtin.accessStatus	Open access

Files in this item

Name: 194734_101202_ICEIS_2012_144.pdf
Size: 1.011Mb
Format: PDF

This item appears in the following Collection(s)

Curtin Research Publications
