Statistical comparison of methods to estimate the error probability in short-read Illumina sequencing.

Abnizova, Irina; Skelly, Tom; Naumenko, Fedor; Whiteford, Nava; Brown, Clive; Cox, Tony.

J Bioinform Comput Biol ; 8(3): 579-91, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20556863

ABSTRACT

As was the case in the beginning of the sequencing era, the new generation of short-read sequencing technologies still requires both accuracy of data processing methods and reliable measures of that accuracy. Inspired by the classic of the genre, the Phred method, we generalized those findings in the area of base quality value calibration. We introduce a simple, straightforward, statistically established way to measure the performance of a calibrator, and to find an optimal way to assess its reliability. We illustrate the method by assessing the performance of several calibrators/predictors for Illumina Genome Analyser 2 (GA2) data. The choice of the best predictor is based on optimization of validity, discriminative ability and discrimination power for several candidate predictors. We applied the method to data from one experimental run on the genome of the phage φX, and found the best predictor out of ten candidates to be 'Purity', a statistic derived from corrected cluster intensities. The source code for the comparison of the predictors is available from the authors by request.
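As background to the calibration problem described above, the following is a minimal sketch of the standard Phred relationship between a quality value Q and an error probability p (Q = -10 log10 p), together with a simple check that compares predicted quality against the observed error rate per quality bin. It is not the authors' method: the abstract does not define their validity or discrimination measures, and the function names and toy data here are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred quality value Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability p (0 < p <= 1) to a Phred quality value."""
    return -10 * math.log10(p)


def empirical_error_by_quality(predicted_q, is_error):
    """Group base calls by predicted quality and report the observed error rate.

    predicted_q: iterable of integer quality values, one per base call.
    is_error:    iterable of booleans, True where the call disagrees with
                 the reference (hypothetical input; how mismatches are
                 obtained is outside this sketch).
    Returns {quality: (observed_error_rate, number_of_calls)}.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [errors, total]
    for q, err in zip(predicted_q, is_error):
        counts[q][0] += int(err)
        counts[q][1] += 1
    return {q: (errors / total, total)
            for q, (errors, total) in sorted(counts.items())}


if __name__ == "__main__":
    # Toy data, invented for illustration.
    quals = [20, 20, 20, 20, 30, 30]
    errors = [True, False, False, False, False, False]
    for q, (rate, n) in empirical_error_by_quality(quals, errors).items():
        print(q, phred_to_error_prob(q), rate, n)
```

A well-calibrated predictor would show an observed error rate in each bin close to the probability implied by that bin's quality value; with only a handful of calls per bin, as in the toy data, the comparison is not meaningful.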

Subject(s)

Algorithms , Artifacts , Chromosome Mapping/methods , Data Interpretation, Statistical , Sequence Analysis, DNA/methods , Software , Base Sequence , Molecular Sequence Data

Swift: primary data analysis for the Illumina Solexa sequencing platform.

Whiteford, Nava; Skelly, Tom; Curtis, Christina; Ritchie, Matt E; Löhr, Andrea; Zaranek, Alexander Wait; Abnizova, Irina; Brown, Clive.

Bioinformatics ; 25(17): 2194-9, 2009 Sep 01.

Article in English | MEDLINE | ID: mdl-19549630

ABSTRACT

MOTIVATION: Primary data analysis methods are of critical importance in second generation DNA sequencing. Improved methods have the potential to increase yield and reduce the error rates. Openly documented analysis tools enable users to understand the primary data, which is important for the optimization and validity of their scientific work. RESULTS: In this article, we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendor's own software, which completes the full analysis process, from raw images through to base calls. As such it provides an alternative to, and independent validation of, the vendor-supplied tool. Our results show that Swift is able to increase yield by 13.8%, at comparable error rate.

Subject(s)

Sequence Analysis, DNA/methods , Software , Base Sequence , Computational Biology , Molecular Sequence Data

Validation of all-atom phosphatidylcholine lipid force fields in the tensionless NPT ensemble.

Taylor, Justine; Whiteford, Nava E; Bradley, Geoff; Watson, Graeme W.

Biochim Biophys Acta ; 1788(3): 638-49, 2009 Mar.

Article in English | MEDLINE | ID: mdl-19014902

ABSTRACT

A recently defined charge set, to be used in conjunction with the all-atom CHARMM27r force field, has been validated for a series of phosphatidylcholine lipids. The work of Sonne et al. successfully replicated experimental bulk membrane behaviour for dipalmitoylphosphatidylcholine (DPPC) under the isothermal-isobaric (NPT) ensemble. Previous studies using the defined CHARMM27r charge set have resulted in lateral membrane contraction when used in the tensionless NPT ensemble, forcing the lipids to adopt a more ordered conformation than predicted experimentally. The current study has extended the newly defined charge set to 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphatidylcholine (POPC) and 1-palmitoyl-2-docosahexaenoyl-sn-glycero-3-phosphatidylcholine (PDPC). Molecular dynamics simulations were run for each of the lipids (including DPPC) using both the CHARMM27r charge set and the newly defined modified charge set. In all three cases a significant improvement was seen in both bulk membrane properties and individual atomistic effects. Membrane width, area per lipid and the depth of water penetration were all seen to converge to experimental values. Deuterium order parameters generated with the new charge set showed increased disorder across the width of the bilayer and reflected both results from experiment and similar simulations run with united atom models. These newly validated models can now find use in mixed biological simulations under the tensionless ensemble without concern for lateral contraction.

Subject(s)

Phosphatidylcholines/chemistry , 1,2-Dipalmitoylphosphatidylcholine/chemistry , Computer Simulation , Membranes, Artificial , Models, Molecular

Optimal probe length varies for targets with high sequence variation: implications for probe library design for resequencing highly variable genes.

Haslam, Niall J; Whiteford, Nava E; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W; Neylon, Cameron.

PLoS One ; 3(6): e2500, 2008 Jun 18.

Article in English | MEDLINE | ID: mdl-18563203

ABSTRACT

BACKGROUND: Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-repeated sequences but lead to an increase in the number of probes required, whereas short probes may not provide unique sequence information due to repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences. RESULTS: We designed overlapping probe libraries for a range of highly variable drug target genes based on known sequence information and developed a formal terminology to describe probe library design. We find that for some targets these libraries can provide good coverage of a previously unseen target whereas for others the coverage is less than 30%. The optimal probe length varies from as short as 12 nt to as long as 19 nt and depends on the sequence, its variability, and the stringency of thermal filtering. It cannot be determined from inspection of an example gene sequence. CONCLUSIONS: Optimal probe length and the optimal number of reference sequences used to design a probe library are highly target specific for highly variable sequencing targets. The optimum design cannot be determined simply by inspection of input sequences or of alignments but only by detailed analysis of each specific target. For highly variable sequences, shorter probes can in some cases provide better information than longer probes. Probe library design would benefit from a general purpose tool for analysing these issues. The formal terminology developed here and the analysis approaches it is used to describe will contribute to the development of such tools.
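The trade-off between probe length and coverage of an unseen target can be illustrated with a minimal sketch. It assumes a probe library built from every overlapping window of length k in a set of reference sequences, and defines coverage as the fraction of target positions lying under at least one exactly matching probe. Thermal filtering is omitted, and the function names, coverage definition, and toy sequences are assumptions for illustration, not the authors' formal terminology or tool.

```python
def build_probe_library(references, k):
    """Collect every overlapping k-mer from the reference sequences."""
    probes = set()
    for seq in references:
        for i in range(len(seq) - k + 1):
            probes.add(seq[i:i + k])
    return probes


def coverage(target, probes, k):
    """Fraction of target positions covered by at least one exact probe match."""
    if not target:
        return 0.0
    covered = [False] * len(target)
    for i in range(len(target) - k + 1):
        if target[i:i + k] in probes:
            # Mark every position spanned by the matching probe.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(target)


if __name__ == "__main__":
    # Toy sequences, invented for illustration.
    references = ["ACGTACGT"]
    target = "ACGTTTTT"
    for k in (3, 4, 5):
        library = build_probe_library(references, k)
        print(k, len(library), coverage(target, library, k))
```

Scanning k over a range such as 12 to 19 nt on real reference sets would show how library size and coverage change together, which is the kind of target-specific analysis the abstract argues is necessary.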

Subject(s)

Molecular Probes , HIV/genetics , Hepacivirus/genetics , Orthomyxoviridae/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA

An analysis of the feasibility of short read sequencing.

Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W; Roach, Peter L; Bradley, Mark; Neylon, Cameron.

Nucleic Acids Res ; 33(19): e171, 2005 Nov 07.

Article in English | MEDLINE | ID: mdl-16275781

ABSTRACT

Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20-30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.
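One simple way to see why read length matters is to count how many reads of a given length occur exactly once in a sequence, since repeated reads cannot be placed unambiguously. The sketch below is only a simplified illustration under stated assumptions: it considers a single strand, ignores reverse complements and sequencing errors, and does not attempt contig reconstruction, so it is not the analysis reported in the paper. The function name and toy sequence are hypothetical.

```python
from collections import Counter


def unique_read_fraction(genome, read_length):
    """Fraction of start positions whose read occurs exactly once in the genome."""
    n = len(genome) - read_length + 1
    if n <= 0:
        return 0.0
    counts = Counter(genome[i:i + read_length] for i in range(n))
    # Each read seen exactly once corresponds to exactly one start position.
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / n


if __name__ == "__main__":
    # Toy sequence, invented for illustration.
    genome = "ACGTACGA"
    for k in range(2, 6):
        print(k, unique_read_fraction(genome, k))
```

On a real genome, plotting this fraction against read length gives a rough picture of how quickly repeats stop being a limiting factor as reads get longer.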

Subject(s)

Genomics/methods , Sequence Analysis, DNA/methods , Chromosomes, Human, Pair 1 , Feasibility Studies , Genome, Bacterial , Genome, Human , Genome, Viral , Humans
