
dc.contributor.author: Totterdell, J.A.
dc.contributor.author: Nur, Darfiana
dc.contributor.author: Mengersen, K.L.
dc.date.accessioned: 2020-06-12T04:36:29Z
dc.date.available: 2020-06-12T04:36:29Z
dc.date.issued: 2017
dc.identifier.citation: Totterdell, J.A. and Nur, D. and Mengersen, K.L. 2017. Bayesian hidden Markov models in DNA sequence segmentation using R: the case of Simian Vacuolating virus (SV40). Journal of Statistical Computation and Simulation. 87 (14): pp. 2799-2827.
dc.identifier.uri: http://hdl.handle.net/20.500.11937/79609
dc.identifier.doi: 10.1080/00949655.2017.1344666
dc.description.abstract:

Segmentation models aim to partition compositionally heterogeneous domains into homogeneous segments that may reflect biological function. Because the segments are latent, a natural approach to segmentation that has recently gained favour uses Bayesian hidden Markov models (HMMs). Concomitantly, over the last few decades the free R programming language has become a dominant tool for computational statistics, visualization and data science. This paper therefore aims to fully exploit R to fit a Bayesian HMM for DNA segmentation. The joint posterior distribution of the model parameters is derived, followed by the algorithms that can be used for estimation. Functions implementing these algorithms (Gibbs Sampling, Data Augmentation and Label Switching) are then fully implemented in R. The methodology is assessed through extensive simulation studies and then applied to analyse Simian Vacuolating virus (SV40). It is concluded that: (1) the algorithms and functions in R can correctly estimate sequence segmentation if the HMM structure is assumed; (2) the performance of the model improves with sequence length; (3) R is reasonably fast for short to medium sequence lengths and numbers of segments; and (4) the segmentation of SV40 appears to correspond with the two major transcripts, early and late, that regulate the expression of SV40 genes.

dc.language: English
dc.publisher: TAYLOR & FRANCIS LTD
dc.subject: Science & Technology
dc.subject: Technology
dc.subject: Physical Sciences
dc.subject: Computer Science, Interdisciplinary Applications
dc.subject: Statistics & Probability
dc.subject: Computer Science
dc.subject: Mathematics
dc.subject: Bayesian modelling
dc.subject: DNA sequence
dc.subject: data augmentation
dc.subject: Gibbs sampler algorithm
dc.subject: hidden Markov models
dc.subject: label switching algorithm
dc.subject: R statistical software
dc.subject: segmentation modelling
dc.subject: Simian Vacuolating virus (SV40)
dc.subject: PROBABILISTIC FUNCTIONS
dc.subject: STATISTICAL-ANALYSIS
dc.subject: GENOME
dc.subject: CHAINS
dc.title: Bayesian hidden Markov models in DNA sequence segmentation using R: the case of Simian Vacuolating virus (SV40)
dc.type: Journal Article
dcterms.source.volume: 87
dcterms.source.number: 14
dcterms.source.startPage: 2799
dcterms.source.endPage: 2827
dcterms.source.issn: 0094-9655
dcterms.source.title: Journal of Statistical Computation and Simulation
dc.date.updated: 2020-06-12T04:36:28Z
curtin.department: School of Elec Eng, Comp and Math Sci (EECMS)
curtin.accessStatus: Fulltext not available
curtin.faculty: Faculty of Science and Engineering
curtin.contributor.orcid: Nur, Darfiana [0000-0002-7690-1097]
dcterms.source.eissn: 1563-5163

