Search | VHL Regional Portal

1.

Analysis of context-dependent errors for Illumina sequencing.

Abnizova, Irina; Leonard, Steven; Skelly, Tom; Brown, Andy; Jackson, David; Gourtovaia, Marina; Qi, Guoying; Te Boekhorst, Rene; Faruque, Nadeem; Lewis, Kevin; Cox, Tony.

J Bioinform Comput Biol ; 10(2): 1241005, 2012 Apr.

Article in English | MEDLINE | ID: mdl-22809341

ABSTRACT

The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. However, in the particular case of SNP calling, a great number of false-positive SNPs may be obtained. One needs to distinguish putative SNPs from sequencing or other errors. We found that not only the probability of sequencing errors (i.e. the quality value) is important to distinguish an FP-SNP but also the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequence errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggested a simple method to correct the majority of mismatches, based on conditional probability of their "second" best intensity call. We attach a corresponding second call confidence (quality value) of being corrected to each mismatch.
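The two quantities this abstract contrasts can be sketched briefly. The Phred relation between a quality value Q and an error probability (p = 10^(-Q/10)) is standard; the second-call estimate below is only an illustrative assumption, since the abstract does not give the authors' formula. The function names, the intensity dictionary, and the way the conditional probability is computed from intensities are all hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality value Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def second_call(intensities):
    """Return (best_base, second_base, p_second_given_error).

    `intensities` maps each base (A, C, G, T) to a signal intensity.
    p_second_given_error is a hypothetical estimate, not the paper's method:
    the share of the non-best intensity that belongs to the second-best base.
    """
    # Rank bases from strongest to weakest signal.
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    best, second = ranked[0], ranked[1]
    # Total intensity of every base except the first call.
    rest = sum(value for _, value in ranked[1:])
    p_second = second[1] / rest if rest > 0 else 0.0
    return best[0], second[0], p_second


if __name__ == "__main__":
    print(phred_to_error_prob(20))
    print(second_call({"A": 100.0, "C": 60.0, "G": 30.0, "T": 10.0}))
```

Under these assumptions, a mismatch whose first call has a low quality value and whose second call carries most of the remaining intensity would be a candidate for "correction" rather than a candidate SNP.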

Subject(s)

Sequence Analysis, DNA/methods , Algorithms , Nucleotide Motifs , Polymorphism, Single Nucleotide , Research Design

2.

Major submissions tool developments at the European Nucleotide Archive.

Amid, Clara; Birney, Ewan; Bower, Lawrence; Cerdeño-Tárraga, Ana; Cheng, Ying; Cleland, Iain; Faruque, Nadeem; Gibson, Richard; Goodgame, Neil; Hunter, Christopher; Jang, Mikyung; Leinonen, Rasko; Liu, Xin; Oisel, Arnaud; Pakseresht, Nima; Plaister, Sheila; Radhakrishnan, Rajesh; Reddy, Kethi; Rivière, Stephane; Rossello, Marc; Senf, Alexander; Smirnov, Dimitriy; Ten Hoopen, Petra; Vaughan, Daniel; Vaughan, Robert; Zalunin, Vadim; Cochrane, Guy.

Nucleic Acids Res ; 40(Database issue): D43-7, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22080548

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis, DNA , Sequence Analysis, RNA , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation , Software , User-Computer Interface

3.

The European Nucleotide Archive.

Leinonen, Rasko; Akhtar, Ruth; Birney, Ewan; Bower, Lawrence; Cerdeño-Tárraga, Ana; Cheng, Ying; Cleland, Iain; Faruque, Nadeem; Goodgame, Neil; Gibson, Richard; Hoad, Gemma; Jang, Mikyung; Pakseresht, Nima; Plaister, Sheila; Radhakrishnan, Rajesh; Reddy, Kethi; Sobhany, Siamak; Ten Hoopen, Petra; Vaughan, Robert; Zalunin, Vadim; Cochrane, Guy.

Nucleic Acids Res ; 39(Database issue): D28-31, 2011 Jan.

Article in English | MEDLINE | ID: mdl-20972220

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.

Subject(s)

Base Sequence , Databases, Nucleic Acid , Europe , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation

4.

Improvements to services at the European Nucleotide Archive.

Leinonen, Rasko; Akhtar, Ruth; Birney, Ewan; Bonfield, James; Bower, Lawrence; Corbett, Matt; Cheng, Ying; Demiralp, Fehmi; Faruque, Nadeem; Goodgame, Neil; Gibson, Richard; Hoad, Gemma; Hunter, Christopher; Jang, Mikyung; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Maguire, Michael; McWilliam, Hamish; Plaister, Sheila; Radhakrishnan, Rajesh; Sobhany, Siamak; Slater, Guy; Ten Hoopen, Petra; Valentin, Franck; Vaughan, Robert; Zalunin, Vadim; Zerbino, Daniel; Cochrane, Guy.

Nucleic Acids Res ; 38(Database issue): D39-45, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19906712

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL-EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.

Subject(s)

Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Algorithms , Animals , Computational Biology/trends , DNA/genetics , Europe , Humans , Information Storage and Retrieval/methods , Internet , Software

5.

Petabyte-scale innovations at the European Nucleotide Archive.

Cochrane, Guy; Akhtar, Ruth; Bonfield, James; Bower, Lawrence; Demiralp, Fehmi; Faruque, Nadeem; Gibson, Richard; Hoad, Gemma; Hubbard, Tim; Hunter, Christopher; Jang, Mikyung; Juhos, Szilveszter; Leinonen, Rasko; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Plaister, Sheila; Radhakrishnan, Rajesh; Robinson, Stephen; Sobhany, Siamak; Hoopen, Petra Ten; Vaughan, Robert; Zalunin, Vadim; Birney, Ewan.

Nucleic Acids Res ; 37(Database issue): D19-25, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18978013

ABSTRACT

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis/trends , Internet , Systems Integration

6.

The minimum information about a genome sequence (MIGS) specification.

Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; DePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer.

Nat Biotechnol ; 26(5): 541-7, 2008 May.

Article in English | MEDLINE | ID: mdl-18464787

ABSTRACT

With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

Subject(s)

Chromosome Mapping/methods , Chromosome Mapping/standards , Databases, Factual/standards , Information Dissemination/methods , Information Storage and Retrieval/standards , Information Theory , Internationality

7.

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database.

Cochrane, Guy; Akhtar, Ruth; Aldebert, Philippe; Althorpe, Nicola; Baldwin, Alastair; Bates, Kirsty; Bhattacharyya, Sumit; Bonfield, James; Bower, Lawrence; Browne, Paul; Castro, Matias; Cox, Tony; Demiralp, Fehmi; Eberhardt, Ruth; Faruque, Nadeem; Hoad, Gemma; Jang, Mikyung; Kulikova, Tamara; Labarga, Alberto; Leinonen, Rasko; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Nardone, Francesco; Plaister, Sheila; Robinson, Stephen; Sobhany, Siamak; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf; Hubbard, Tim; Birney, Ewan.

Nucleic Acids Res ; 36(Database issue): D5-12, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18039715

ABSTRACT

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis, DNA , Animals , Archives , Genomics , Internet

8.

EMBL Nucleotide Sequence Database in 2006.

Kulikova, Tamara; Akhtar, Ruth; Aldebert, Philippe; Althorpe, Nicola; Andersson, Mikael; Baldwin, Alastair; Bates, Kirsty; Bhattacharyya, Sumit; Bower, Lawrence; Browne, Paul; Castro, Matias; Cochrane, Guy; Duggan, Karyn; Eberhardt, Ruth; Faruque, Nadeem; Hoad, Gemma; Kanz, Carola; Lee, Charles; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Nardone, Francesco; Pastor, Maria Pilar Garcia; Plaister, Sheila; Sobhany, Siamak; Stoehr, Peter; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf.

Nucleic Acids Res ; 35(Database issue): D16-20, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17148479

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data has continued to grow exponentially. Access to the data is provided via SRS, ftp and a variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk/. Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.

Subject(s)

Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface

9.

Concept of sample in OMICS technology.

Morrison, Norman; Cochrane, Guy; Faruque, Nadeem; Tatusova, Tatiana; Tateno, Yoshio; Hancock, David; Field, Dawn.

OMICS ; 10(2): 127-37, 2006.

Article in English | MEDLINE | ID: mdl-16901217

ABSTRACT

Fundamental biological processes can now be studied by applying the full range of OMICS technologies (genomics, transcriptomics, proteomics, metabolomics, and beyond) to the same biological sample. Clearly, it would be desirable if the concept of sample were shared among these technologies, especially as up until the time a biological sample is prepared for use in a specific OMICS assay, its description is inherently technology independent. Sharing a common informatic representation would encourage data sharing (rather than data replication), thereby reducing redundant data capture and the potential for error. This would result in a significant degree of harmonization across different OMICS data standardization activities, a task that is critical if we are to integrate data from these different data sources. Here, we review the current concept of sample in OMICS technologies as it is being dealt with by different OMICS standardization initiatives and discuss the special role that the newly formed Genomic Standards Consortium (GSC) might have to play in this domain.

Subject(s)

Databases, Nucleic Acid/standards , Genome, Human , Genome , Genomics/standards , Proteome/genetics , Proteomics/standards , Animals , Humans , Oligonucleotide Array Sequence Analysis/standards

10.

EMBL Nucleotide Sequence Database: developments in 2005.

Cochrane, Guy; Aldebert, Philippe; Althorpe, Nicola; Andersson, Mikael; Baker, Wendy; Baldwin, Alastair; Bates, Kirsty; Bhattacharyya, Sumit; Browne, Paul; van den Broek, Alexandra; Castro, Matias; Duggan, Karyn; Eberhardt, Ruth; Faruque, Nadeem; Gamble, John; Kanz, Carola; Kulikova, Tamara; Lee, Charles; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; McHale, Michelle; McWilliam, Hamish; Mukherjee, Gaurab; Nardone, Francesco; Pastor, Maria Pilar Garcia; Sobhany, Siamak; Stoehr, Peter; Tzouvara, Katerina; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf.

Nucleic Acids Res ; 34(Database issue): D10-5, 2006 Jan 01.

Article in English | MEDLINE | ID: mdl-16381823

ABSTRACT

The EMBL Nucleotide Sequence Database (www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a comprehensive set of publicly available nucleotide sequence and annotation, freely accessible to all. Maintained in collaboration with partners DDBJ and GenBank, coverage includes whole genome sequencing project data, directly submitted sequence, sequence recorded in support of patent applications and much more. The database continues to offer submission tools, data retrieval facilities and user support. In 2005, the volume of data offered has continued to grow exponentially. In addition to the newly presented data, the database encompasses a range of new data types generated by novel technologies, offers enhanced presentation and searchability of the data and has greater integration with other data resources offered at the EBI and elsewhere. In stride with these developing data types, the database has continued to develop submission and retrieval tools to maximise the information content of submitted data and to offer the simplest possible submission routes for data producers. New developments, the submission process, data retrieval and access to support are presented in this paper, along with links to sources of further information.

Subject(s)

Databases, Nucleic Acid , Animals , Base Sequence , Genomics , Internet , Software , User-Computer Interface

11.

The EMBL Nucleotide Sequence Database.

Kanz, Carola; Aldebert, Philippe; Althorpe, Nicola; Baker, Wendy; Baldwin, Alastair; Bates, Kirsty; Browne, Paul; van den Broek, Alexandra; Castro, Matias; Cochrane, Guy; Duggan, Karyn; Eberhardt, Ruth; Faruque, Nadeem; Gamble, John; Diez, Federico Garcia; Harte, Nicola; Kulikova, Tamara; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Mancuso, Renato; McHale, Michelle; Nardone, Francesco; Silventoinen, Ville; Sobhany, Siamak; Stoehr, Peter; Tuli, Mary Ann; Tzouvara, Katerina; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf.

Nucleic Acids Res ; 33(Database issue): D29-33, 2005 Jan 01.

Article in English | MEDLINE | ID: mdl-15608199

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.

Subject(s)

Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface

12.

Integr8 and Genome Reviews: integrated views of complete genomes and proteomes.

Kersey, Paul; Bower, Lawrence; Morris, Lorna; Horne, Alan; Petryszak, Robert; Kanz, Carola; Kanapin, Alexander; Das, Ujjwal; Michoud, Karine; Phan, Isabelle; Gattiker, Alexandre; Kulikova, Tamara; Faruque, Nadeem; Duggan, Karyn; Mclaren, Peter; Reimholz, Britt; Duret, Laurent; Penel, Simon; Reuter, Ingmar; Apweiler, Rolf.

Nucleic Acids Res ; 33(Database issue): D297-302, 2005 Jan 01.

Article in English | MEDLINE | ID: mdl-15608201

ABSTRACT

Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.

Subject(s)

Databases, Genetic , Genomics , Proteomics , DNA, Archaeal/chemistry , DNA, Bacterial/chemistry , Internet , Systems Integration , User-Computer Interface

13.

The EMBL Nucleotide Sequence Database.

Kulikova, Tamara; Aldebert, Philippe; Althorpe, Nicola; Baker, Wendy; Bates, Kirsty; Browne, Paul; van den Broek, Alexandra; Cochrane, Guy; Duggan, Karyn; Eberhardt, Ruth; Faruque, Nadeem; Garcia-Pastor, Maria; Harte, Nicola; Kanz, Carola; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Mancuso, Renato; McHale, Michelle; Nardone, Francesco; Silventoinen, Ville; Stoehr, Peter; Stoesser, Guenter; Tuli, Mary Ann; Tzouvara, Katerina; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf.

Nucleic Acids Res ; 32(Database issue): D27-30, 2004 Jan 01.

Article in English | MEDLINE | ID: mdl-14681351

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.

Subject(s)

Databases, Nucleic Acid , Animals , Europe , Genomics , Humans , Information Storage and Retrieval , Internet
