Search | VHL Regional Portal

De novo clustering of long reads by gene from transcriptomics data.

Marchet, Camille; Lecompte, Lolita; Silva, Corinne Da; Cruaud, Corinne; Aury, Jean-Marc; Nicolas, Jacques; Peterlongo, Pierre.

Nucleic Acids Res ; 47(1): e2, 2019 Jan 10.

Article in English | MEDLINE | ID: mdl-30260405

ABSTRACT

Long-read sequencing currently provides sequences of several thousand base pairs. It is therefore possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome. However, the literature lacks tools for de novo clustering of such data, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole-transcriptome sequencing data accurately and without a reference genome in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is twofold: a new algorithm adapted to clustering reads by gene, and a practical, freely available tool that scales to the complete processing of eukaryotic transcriptomes. We sequenced a mouse RNA sample using the MinION device. This dataset is used to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it is the best approach for transcriptomics long reads. When a reference is available to enable mapping, we show that it stands as an alternative method that predicts complementary clusters.

Subject(s)

Gene Expression Profiling/methods , Genomics , High-Throughput Nucleotide Sequencing/methods , Transcriptome/genetics , Animals , Genome/genetics , Mice , RNA/genetics , Sequence Analysis, DNA

Expression sequence tag library derived from peripheral blood mononuclear cells of the Chlorocebus sabaeus.

Tchitchek, Nicolas; Jacquelin, Béatrice; Wincker, Patrick; Dossat, Carole; Silva, Corinne Da; Weissenbach, Jean; Blancher, Antoine; Müller-Trutwin, Michaela; Benecke, Arndt.

BMC Genomics ; 13: 279, 2012 Jun 22.

Article in English | MEDLINE | ID: mdl-22726727

ABSTRACT

BACKGROUND: African Green Monkeys (AGM) are amongst the most frequently used nonhuman primate models in clinical and biomedical research; nevertheless, only a few genomic resources exist for this species. Such information would be essential for the development of dedicated new generation technologies in fundamental and pre-clinical research using this model, and would deliver new insights into primate evolution. RESULTS: We have exhaustively sequenced an Expression Sequence Tag (EST) library made from a pool of Peripheral Blood Mononuclear Cells from sixteen Chlorocebus sabaeus monkeys. Twelve of them were infected with the Simian Immunodeficiency Virus. The mononuclear cells were either left unstimulated or stimulated in vitro with Concanavalin A, with lipopolysaccharides, or through mixed lymphocyte reaction in order to generate a representative and broad library of expressed sequences in immune cells. We report here 37,787 sequences, which were assembled into 14,410 contigs representing an estimated 12% of the C. sabaeus transcriptome. Using data from primate genome databases, 9,029 assembled sequences from C. sabaeus could be annotated. Sequences have been systematically aligned with ten cDNA references of primate species including Homo sapiens, Pan troglodytes, and Macaca mulatta to identify orthologous transcripts. For 506 transcripts, sequences were quasi-complete. In addition, 6,576 transcript fragments are potentially specific to C. sabaeus or correspond to not-yet-described primate genes. CONCLUSIONS: The EST library we provide here will prove useful in gene annotation efforts for future sequencing of the African Green Monkey genomes. Furthermore, this library, which represents immunological and hematological gene expression particularly well, will be an important resource for the comparative analysis of gene expression in clinically relevant nonhuman primate and human research.

Subject(s)

Cercopithecinae/genetics , Expressed Sequence Tags , Gene Library , Leukocytes, Mononuclear/chemistry , Animals , Base Sequence , Cluster Analysis , Molecular Sequence Data , Phylogeny , Sequence Alignment , Sequence Analysis, DNA , Species Specificity
