Picking adequate samples for approximate decision support queries using inverse SRSWOR
A simple random sample of records from a large data warehouse may not contain a sufficient number of records that satisfy highly selective queries. Efficient sampling schemes for such queries rely on techniques that can access the records relevant to a specific query. When drawing the sample, it is advantageous to know what sample size would be adequate for a given query. This paper proposes methods for picking adequate samples that ensure approximate query results with a desired level of accuracy. A special index based on a structure known as the k-MDI Tree is used to draw samples. An unbiased estimator, inverse simple random sampling without replacement (inverse SRSWOR), is adapted to estimate adequate sample sizes for queries. The methods are evaluated experimentally on a large real-life data set. The results of the evaluation show that adequate sample sizes can be determined with errors in the outputs of most queries within the acceptable limit of 5%.
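To make the idea of inverse sampling concrete, the following is a minimal sketch, not the paper's implementation. It assumes a plain in-memory list of records rather than the k-MDI Tree index, and the names `inverse_srswor`, `records`, `predicate`, and `k` are hypothetical. Records are drawn without replacement until `k` of them satisfy the query predicate; the number of draws `n` then indicates how large a sample was needed, and `(k - 1) / (n - 1)` is the standard unbiased estimator of the query's selectivity under inverse sampling (it requires `k >= 2`).

```python
import random


def inverse_srswor(records, predicate, k, seed=None):
    """Draw records without replacement until k of them satisfy predicate.

    Returns (n, p_hat): the number of draws needed and the estimate
    (k - 1) / (n - 1) of the fraction of records satisfying predicate.
    Illustrative only; the abstract does not specify the paper's
    exact procedure.
    """
    if k < 2:
        raise ValueError("k must be at least 2 for the (k-1)/(n-1) estimator")
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)  # a random permutation gives draws without replacement
    hits = 0
    for n, idx in enumerate(order, start=1):
        if predicate(records[idx]):
            hits += 1
            if hits == k:
                return n, (k - 1) / (n - 1)
    raise ValueError("fewer than k records satisfy the predicate")
```

For example, `inverse_srswor(list(range(1000)), lambda x: x < 100, k=20, seed=1)` draws from 1000 records of which 10% match and stops as soon as 20 matching records have been seen. A more selective query needs more draws to reach the same `k`, which is why the stopping point can serve as a guide to an adequate sample size.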
Showing items related by title, author, creator and subject.
Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha (2013) For highly selective queries, a simple random sample of records drawn from a large data warehouse may not contain a sufficient number of records that satisfy the query conditions. Efficient sampling schemes for such queries ...
Approximate Query Processing on High Dimensionality Database Tables Using Multidimensional Cluster Sampling
Inoue, T.; Krishna, Aneesh; Gopalan, Raj (2016) Approximate query processing based on random sampling is one of the most useful methods for the efficient computation of large quantities of data kept in databases. However, small samples obtained through random sampling ...
Rudra, Amit; Gopalan, Raj; Achuthan, Narasimaha (2012) Decision support queries usually involve accessing enormous amounts of data, requiring significant retrieval time. Faster retrieval of query results can often save precious time for the decision maker. Pre-computation of ...