Search | BVS Regional Portal

1.

Establishing a national research software award.

Blanc Catala, Isabelle; Di Cosmo, Roberto; Giraud, Mathieu; Le Berre, Daniel; Louvet, Violaine; Renaudin, Sophie.

Open Res Eur ; 3: 185, 2023.

Article in English | MEDLINE | ID: mdl-38009089

ABSTRACT

Software development has become an integral part of the scholarly ecosystem, spanning all fields and disciplines. To support the sharing and creation of knowledge in line with open science principles, and particularly to enable the reproducibility of research results, it is crucial to make the source code of research software available, allowing for modification, reuse, and distribution. Recognizing the significance of open-source software contributions in academia, the second French Plan for Open Science, announced by the Minister of Higher Education and Research in 2021, introduced a National Award to promote open-source research software. This award serves multiple objectives: firstly, to highlight the software projects and teams that have devoted time and effort to develop outstanding research software, sometimes for decades, and often with little recognition; secondly, to draw attention to the importance of software as a valuable research output and to inspire new generations of researchers to follow and learn from these examples. We present here an in-depth analysis of the design and implementation of this unique initiative. As a national award established explicitly to foster Open Science practices by the French Minister of Research, it faced the intricate challenge of fairly evaluating open research software across all fields, striving for inclusivity across domains, applications, and participants. We provide a comprehensive report on the results of the first edition, which received 129 high-quality submissions. Additionally, we emphasize the impact of this initiative on the open science landscape, promoting software as a valuable research outcome, on par with publications.

Software is crucial for modern research. For the goals of open science, reproducibility, and wider reuse, sharing software source code and acknowledging software development are essential. In France, in 2021, the Minister of Higher Education and Research introduced the National Plan for Open Science. The plan highlights the role of open-source software in academia and aims to give software the same recognition as publications and data. A part of the plan is the introduction of a National Award to recognize open-source research software contributions. This award acknowledges software projects and their teams, which have often worked without much recognition. It also emphasizes the importance of software as a research output, hoping to inspire future researchers. This article examines the award's design and implementation. It addresses the challenges of assessing open research software from different research fields. In the first edition of the award, there were 129 high-quality submissions, indicating the award's potential to shift perspectives on software's role in open science, aligning it with the importance of academic publications. Through a detailed account of our experiences and the insights gained, we aim to provide a reference for other countries or institutions considering establishing similar recognitions.

2.

One-Step Next-Generation Sequencing of Immunoglobulin and T-Cell Receptor Gene Recombinations for MRD Marker Identification in Acute Lymphoblastic Leukemia.

Villarese, Patrick; Abdo, Chrystelle; Bertrand, Matthieu; Thonier, Florian; Giraud, Mathieu; Salson, Mikaël; Macintyre, Elizabeth.

Methods Mol Biol ; 2453: 43-59, 2022.

Article in English | MEDLINE | ID: mdl-35622319

ABSTRACT

Within the EuroClonality-NGS group, immune repertoire analysis for target identification in lymphoid malignancies was initially developed using two-stage amplicon approaches, essentially as a progressive modification of preceding methods developed for Sanger sequencing. This approach has, however, limitations with respect to sample handling, adaptation to automation, and risk of contamination by amplicon products. We therefore developed one-step PCR amplicon methods with individual barcoding for batched analysis for IGH, IGK, TRD, TRG, and TRB rearrangements, followed by Vidjil-based data analysis.

Subjects

Genes, T-Cell Receptor , High-Throughput Nucleotide Sequencing , Immunoglobulins , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Recombination, Genetic , Genes, T-Cell Receptor/genetics , Genes, T-Cell Receptor/immunology , High-Throughput Nucleotide Sequencing/methods , Humans , Immunoglobulins/genetics , Immunoglobulins/immunology , Neoplasm, Residual/diagnosis , Neoplasm, Residual/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/immunology , Recombination, Genetic/genetics , Recombination, Genetic/immunology

3.

Immunoglobulin Gene Mutational Status Assessment by Next Generation Sequencing in Chronic Lymphocytic Leukemia.

Langlois de Septenville, Anne; Boudjoghra, Myriam; Bravetti, Clotilde; Armand, Marine; Salson, Mikaël; Giraud, Mathieu; Davi, Frederic.

Methods Mol Biol ; 2453: 153-167, 2022.

Article in English | MEDLINE | ID: mdl-35622326

ABSTRACT

B cell receptor (BcR) immunoglobulins (IG) display a tremendous diversity due to complex DNA rearrangements, the V(D)J recombination, further enhanced by the somatic hypermutation process. In chronic lymphocytic leukemia (CLL), the mutational load of the clonal BcR IG expressed by the leukemic cells constitutes an important prognostic and predictive biomarker. Here, we provide a reliable methodology capable of determining the mutational status of IG genes in CLL using high-throughput sequencing, starting from leukemic cell DNA or RNA.

Subjects

Leukemia, Lymphocytic, Chronic, B-Cell , Genes, Immunoglobulin , High-Throughput Nucleotide Sequencing , Humans , Immunoglobulins/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Receptors, Antigen, B-Cell/genetics

4.

Standardized next-generation sequencing of immunoglobulin and T-cell receptor gene recombinations for MRD marker identification in acute lymphoblastic leukaemia; a EuroClonality-NGS validation study.

Brüggemann, Monika; Kotrová, Michaela; Knecht, Henrik; Bartram, Jack; Boudjogrha, Myriam; Bystry, Vojtech; Fazio, Grazia; Fronková, Eva; Giraud, Mathieu; Grioni, Andrea; Hancock, Jeremy; Herrmann, Dietrich; Jiménez, Cristina; Krejci, Adam; Moppett, John; Reigl, Tomas; Salson, Mikael; Scheijen, Blanca; Schwarz, Martin; Songia, Simona; Svaton, Michael; van Dongen, Jacques J M; Villarese, Patrick; Wakeman, Stephanie; Wright, Gary; Cazzaniga, Giovanni; Davi, Frédéric; García-Sanz, Ramón; Gonzalez, David; Groenen, Patricia J T A; Hummel, Michael; Macintyre, Elizabeth A; Stamatopoulos, Kostas; Pott, Christiane; Trka, Jan; Darzentas, Nikos; Langerak, Anton W.

Leukemia ; 33(9): 2241-2253, 2019 09.

Article in English | MEDLINE | ID: mdl-31243313

ABSTRACT

Amplicon-based next-generation sequencing (NGS) of immunoglobulin (IG) and T-cell receptor (TR) gene rearrangements for clonality assessment, marker identification and quantification of minimal residual disease (MRD) in lymphoid neoplasms has been the focus of intense research, development and application. However, standardization and validation in a scientifically controlled multicentre setting is still lacking. Therefore, IG/TR assay development and design, including bioinformatics, was performed within the EuroClonality-NGS working group and validated for MRD marker identification in acute lymphoblastic leukaemia (ALL). Five EuroMRD ALL reference laboratories performed IG/TR NGS in 50 diagnostic ALL samples, and compared results with those generated through routine IG/TR Sanger sequencing. A central polytarget quality control (cPT-QC) was used to monitor primer performance, and a central in-tube quality control (cIT-QC) was spiked into each sample as a library-specific quality control and calibrator. NGS identified 259 (average 5.2/sample, range 0-14) clonal sequences vs. Sanger-sequencing 248 (average 5.0/sample, range 0-14). NGS primers covered possible IG/TR rearrangement types more completely compared with local multiplex PCR sets and enabled sequencing of bi-allelic rearrangements and weak PCR products. The cPT-QC showed high reproducibility across all laboratories. These validated and reproducible quality-controlled EuroClonality-NGS assays can be used for standardized NGS-based identification of IG/TR markers in lymphoid malignancies.

Subjects

Gene Rearrangement, T-Lymphocyte/genetics , Genes, T-Cell Receptor/genetics , Genetic Markers/genetics , Immunoglobulins/genetics , Neoplasm, Residual/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Computational Biology/methods , Genes, Immunoglobulin/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Receptors, Antigen, T-Cell/genetics , Recombination, Genetic/genetics , Reference Standards , Reproducibility of Results

5.

Indexing labeled sequences.

Rocher, Tatiana; Giraud, Mathieu; Salson, Mikaël.

PeerJ Comput Sci ; 4: e148, 2018.

Article in English | MEDLINE | ID: mdl-33816803

ABSTRACT

BACKGROUND: Labels are a way to add information to a text, such as functional annotations (for example, genes) on a DNA sequence. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and brings insight into the lymphocyte repertoire for onco-hematology or immunology studies. METHODS: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows-Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is taken in the order of the text (TL-index) or in the order of the BWT (TLBW-index). Both indexes need space related to the entropy of the labeled text. RESULTS: These indexes allow efficient text-label queries to count and find labeled patterns. The TLBW-index has an overhead on simple label queries but is very efficient on combined pattern-label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts. DISCUSSION: New indexes such as the ones we propose improve the way we index and query labeled texts such as, for instance, lymphocyte repertoires for hematological and immunological studies.
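
To make the notion of a combined pattern-label query concrete, here is a minimal Python sketch. It is not the TL-index or TLBW-index from the paper: it swaps in a plain sorted suffix array for the BWT and an uncompressed Python list for the Wavelet Tree, so it has none of their space guarantees. The function names (`build_index`, `find`, `count_labeled`) and the toy text are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right

def build_index(text, labels):
    """Naive index: labels[i] is the label of text[i] (non-overlapping labels)."""
    assert len(text) == len(labels)
    # Suffix array built by direct sorting; fine for toy inputs only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    return text, labels, suffix_array

def find(index, pattern):
    """Return the sorted start positions of pattern in the text."""
    text, _, suffix_array = index
    m = len(pattern)
    # Truncated suffixes stay sorted, so binary search locates the match range.
    prefixes = [text[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])

def count_labeled(index, pattern, label):
    """Count occurrences of pattern whose first letter carries the given label."""
    _, labels, _ = index
    return sum(1 for pos in find(index, pattern) if labels[pos] == label)

# Toy labeled text: the first half is labeled "V", the second half "J".
index = build_index("ACGTACGT", list("VVVVJJJJ"))
print(find(index, "ACG"))                # start positions of "ACG"
print(count_labeled(index, "ACG", "V"))  # occurrences starting in a "V" region
```

The label test is applied after the pattern search here; the indexes described in the abstract are designed to answer such combined queries without scanning every occurrence.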

6.

High-Throughput Immunogenetics for Clinical and Research Applications in Immunohematology: Potential and Challenges.

Langerak, Anton W; Brüggemann, Monika; Davi, Frédéric; Darzentas, Nikos; van Dongen, Jacques J M; Gonzalez, David; Cazzaniga, Gianni; Giudicelli, Véronique; Lefranc, Marie-Paule; Giraud, Mathieu; Macintyre, Elizabeth A; Hummel, Michael; Pott, Christiane; Groenen, Patricia J T A; Stamatopoulos, Kostas.

J Immunol ; 198(10): 3765-3774, 2017 05 15.

Article in English | MEDLINE | ID: mdl-28416603

ABSTRACT

Analysis and interpretation of Ig and TCR gene rearrangements in the conventional, low-throughput way have their limitations in terms of resolution, coverage, and biases. With the advent of high-throughput, next-generation sequencing (NGS) technologies, a deeper analysis of Ig and/or TCR (IG/TR) gene rearrangements is now within reach, which impacts on all main applications of IG/TR immunogenetic analysis. To bridge the generation gap from low- to high-throughput analysis, the EuroClonality-NGS Consortium has been formed, with the main objectives to develop, standardize, and validate the entire workflow of IG/TR NGS assays for 1) clonality assessment, 2) minimal residual disease detection, and 3) repertoire analysis. This concerns the preanalytical (sample preparation, target choice), analytical (amplification, NGS), and postanalytical (immunoinformatics) phases. Here we critically discuss pitfalls and challenges of IG/TR NGS methodology and its applications in hemato-oncology and immunology.

Subjects

Hematology/methods , High-Throughput Nucleotide Sequencing , Immunogenetics/methods , Immunologic Techniques , Alleles , Computational Biology/methods , Gene Rearrangement , Genes, Immunoglobulin , Genes, T-Cell Receptor/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Immunogenetics/standards

7.

Correction: Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian.

PLoS One ; 12(2): e0172249, 2017.

Article in English | MEDLINE | ID: mdl-28182777

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0166126.].

8.

High-throughput sequencing in acute lymphoblastic leukemia: Follow-up of minimal residual disease and emergence of new clones.

Salson, Mikaël; Giraud, Mathieu; Caillault, Aurélie; Grardel, Nathalie; Duployez, Nicolas; Ferret, Yann; Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Sebda, Shéhérazade; Quief, Sabine; Villenet, Céline; Figeac, Martin; Preudhomme, Claude.

Leuk Res ; 53: 1-7, 2017 02.

Article in English | MEDLINE | ID: mdl-27930944

ABSTRACT

Minimal residual disease (MRD) is known to be an independent prognostic factor in patients with acute lymphoblastic leukemia (ALL). High-throughput sequencing (HTS) is currently used in routine practice for the diagnosis and follow-up of patients with hematological neoplasms. In this retrospective study, we examined the role of immunoglobulin/T-cell receptor-based MRD in patients with ALL by HTS analysis of immunoglobulin H and/or T-cell receptor gamma chain loci in bone marrow samples from 11 patients with ALL, at diagnosis and during follow-up. We assessed the clinical feasibility of using combined HTS and bioinformatics analysis with interactive visualization using Vidjil software. We discuss the advantages and drawbacks of HTS for monitoring MRD. HTS gives a more complete insight into the leukemic population than conventional real-time quantitative PCR (qPCR), and allows identification of new emerging clones at each time point of the monitoring. Thus, HTS monitoring of Ig/TR-based MRD is expected to improve the management of patients with ALL.

Subjects

High-Throughput Nucleotide Sequencing/methods , Neoplasm, Residual/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Bone Marrow , Clone Cells/pathology , Follow-Up Studies , Genes, T-Cell Receptor gamma , Humans , Immunoglobulin Heavy Chains/genetics , Monitoring, Immunologic , Neoplasm, Residual/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Retrospective Studies , Software

9.

Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian.

PLoS One ; 11(11): e0166126, 2016.

Article in English | MEDLINE | ID: mdl-27835690

ABSTRACT

BACKGROUND: B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. METHODS AND RESULTS: Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and JavaScript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.

Subjects

Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , V(D)J Recombination/genetics , Web Browser , Algorithms , Base Sequence , Humans , Internet , Lymphocytes/immunology , Lymphocytes/metabolism , Reproducibility of Results , Sequence Homology, Nucleic Acid

10.

Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis.

Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu.

Br J Haematol ; 173(3): 413-20, 2016 05.

Article in English | MEDLINE | ID: mdl-26898266

ABSTRACT

High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients.

Subjects

Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Adolescent , Adult , Child , Child, Preschool , Clone Cells , Diagnostic Errors/prevention & control , Gene Rearrangement, T-Lymphocyte , High-Throughput Nucleotide Sequencing/standards , Humans , Infant , Infant, Newborn , Neoplasm, Residual/diagnosis , Prospective Studies , Software , V(D)J Recombination/genetics , Young Adult

11.

The predictive strength of next-generation sequencing MRD detection for relapse compared with current methods in childhood ALL.

Kotrova, Michaela; Muzikova, Katerina; Mejstrikova, Ester; Novakova, Michaela; Bakardjieva-Mihaylova, Violeta; Fiser, Karel; Stuchly, Jan; Giraud, Mathieu; Salson, Mikaël; Pott, Christiane; Brüggemann, Monika; Füllgrabe, Marc; Stary, Jan; Trka, Jan; Fronkova, Eva.

Blood ; 126(8): 1045-7, 2015 Aug 20.

Article in English | MEDLINE | ID: mdl-26294720

Subjects

Neoplasm, Residual/diagnosis , Polymerase Chain Reaction/methods , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , High-Throughput Nucleotide Sequencing , Humans , Predictive Value of Tests , Recurrence

12.

Modeling alternate RNA structures in genomic sequences.

Saffarian, Azadeh; Giraud, Mathieu; Touzet, Hélène.

J Comput Biol ; 22(3): 190-204, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25768235

ABSTRACT

We introduce the concept of RNA multistructures, which is a formal grammar-based framework specifically designed to model a set of alternate RNA secondary structures. Such alternate structures can either be a set of suboptimal foldings, or distinct stable folding states, or variants within an RNA family. We provide several such examples and propose an efficient algorithm to search for RNA multistructures within a genomic sequence.

Subjects

RNA Folding , RNA, Transfer/chemistry , RNA/chemistry , Algorithms , Bacterial Proteins/chemistry , Genome , Humans , Inverted Repeat Sequences , Models, Molecular , RNA, Bacterial/chemistry , RNA, Mitochondrial , Ribonuclease P/chemistry , Riboswitch

13.

Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing.

Giraud, Mathieu; Salson, Mikaël; Duez, Marc; Villenet, Céline; Quief, Sabine; Caillault, Aurélie; Grardel, Nathalie; Roumier, Christophe; Preudhomme, Claude; Figeac, Martin.

BMC Genomics ; 15: 409, 2014 May 28.

Article in English | MEDLINE | ID: mdl-24885090

ABSTRACT

BACKGROUND: V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. RESULTS: We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TRγ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. CONCLUSIONS: The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.
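
The alignment-free, seed-based first phase described in this abstract can be illustrated with a toy Python sketch. It is a simplified reading of the idea and not Vidjil's actual code: the seed length `K`, the window length `W`, the toy germline strings, and all function names are hypothetical, and real data would need longer seeds and error handling. Each read is scanned for short seeds taken from V and J germline sequences, a fixed-length window is cut around the point where V seeds give way to J seeds, and reads sharing the same window are counted as one clone.

```python
from collections import Counter

K = 4  # seed length (hypothetical; chosen small for the toy data)
W = 6  # length of the window extracted around the junction (hypothetical)

def kmers(seq, k=K):
    """Set of all k-long substrings of seq, used as seeds."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def junction_window(read, v_seeds, j_seeds, k=K, w=W):
    """Return the w-long window centered between the last V seed and the
    first J seed, or None if the read lacks V or J seeds."""
    v_hits = [i for i in range(len(read) - k + 1) if read[i:i + k] in v_seeds]
    j_hits = [i for i in range(len(read) - k + 1) if read[i:i + k] in j_seeds]
    if not v_hits or not j_hits:
        return None
    v_end = max(v_hits) + k   # first position after the last V seed
    j_start = min(j_hits)     # first position of the first J seed
    center = (v_end + j_start) // 2
    start = center - w // 2
    if start < 0 or start + w > len(read):
        return None
    return read[start:start + w]

def cluster(reads, v_seeds, j_seeds):
    """Count reads per junction window; no alignment is performed."""
    windows = (junction_window(r, v_seeds, j_seeds) for r in reads)
    return Counter(win for win in windows if win is not None)

# Toy germline sequences and reads (invented for this sketch).
v_seeds = kmers("AAAACCCCGGGG")
j_seeds = kmers("TTTTGGTTCCAA")
reads = [
    "AACCCCGGGGACTTTTGGTTCC",   # junction "AC"
    "CCCCGGGGACTTTTGGTT",       # same junction, shorter read
    "AACCCCGGGGTGATTTTGGTTCC",  # different junction "TGA"
    "AACCCCGGGG",               # no J seed: left unclustered
]
print(cluster(reads, v_seeds, j_seeds))
```

In this toy run the first two reads fall into the same clone despite having different lengths, because only the window around the junction is compared.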

Subjects

Algorithms , High-Throughput Nucleotide Sequencing/methods , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Sequence Analysis, DNA/methods , V(D)J Recombination , Humans , Neoplasm, Residual/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Software

14.

Querying highly similar sequences.

Barton, Carl; Giraud, Mathieu; Iliopoulos, Costas S; Lecroq, Thierry; Mouchard, Laurent; Pissis, Solon P.

Int J Comput Biol Drug Des ; 6(1-2): 119-30, 2013.

Article in English | MEDLINE | ID: mdl-23428478

ABSTRACT

In this paper, we present a solution to the extreme similarity sequencing problem. The extreme similarity sequencing problem consists of finding occurrences of a pattern p in a set S(0), S(1), ..., S(k) of sequences of equal length, where S(i), for all 1≤i≤k, differs from S(0) by a constant number of errors - around 10 in practice. We present an asymptotically fast O(n + occ log occ) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k - the number of sequences.
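
The problem setting can be made concrete with a naive Python sketch. It does not implement either of the paper's algorithms and makes no claim to their time bounds. It only shows the underlying idea that each S(i) can be stored as a short list of differences from S(0), assumed here to be substitutions only, so that a search in S(0) is reused and only windows touching a difference are re-checked. All function names are hypothetical.

```python
def find_all(text, pattern):
    """All start positions of pattern in text (overlaps allowed)."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits

def diffs_from_reference(reference, variant):
    """Map each differing position to the variant's letter (substitutions only)."""
    assert len(reference) == len(variant)
    return {i: c for i, (r, c) in enumerate(zip(reference, variant)) if r != c}

def search_collection(reference, variant_diffs, pattern):
    """Return {sequence index: sorted start positions}; index 0 is S(0)."""
    m = len(pattern)
    ref_hits = find_all(reference, pattern)
    result = {0: ref_hits}
    for idx, diffs in enumerate(variant_diffs, start=1):
        # A reference hit survives if no differing position lies inside it.
        hits = {h for h in ref_hits if not any(h <= p < h + m for p in diffs)}
        # Windows overlapping a difference are re-checked with patched letters.
        for p in diffs:
            last_start = min(p, len(reference) - m)
            for start in range(max(0, p - m + 1), last_start + 1):
                window = "".join(diffs.get(j, reference[j])
                                 for j in range(start, start + m))
                if window == pattern:
                    hits.add(start)
        result[idx] = sorted(hits)
    return result

# Toy collection: two variants, each one substitution away from S(0).
s0 = "ACGTACGTAC"
variants = ["ACGTTCGTAC", "ACGAACGTAC"]
diffs = [diffs_from_reference(s0, v) for v in variants]
print(search_collection(s0, diffs, "ACG"))
```

The result can be cross-checked against a direct search of each variant, which is what the sketch avoids repeating in full.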

Subjects

Algorithms , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Computational Biology , High-Throughput Nucleotide Sequencing

15.

RNA locally optimal secondary structures.

Saffarian, Azadeh; Giraud, Mathieu; de Monte, Antoine; Touzet, Hélène.

J Comput Biol ; 19(10): 1120-33, 2012 Oct.

Article in English | MEDLINE | ID: mdl-23057822

ABSTRACT

RNA locally optimal secondary structures provide a concise and exhaustive description of all possible secondary structures of a given RNA sequence, and hence a very good representation of the RNA folding space. In this paper, we present an efficient algorithm that computes all locally optimal secondary structures for any folding model that takes into account the stability of helical regions. This algorithm is implemented in software called regliss, which runs on a publicly accessible web server.

Subjects

Algorithms , Internet , Nucleic Acid Conformation , RNA , Sequence Analysis, RNA/methods , Software , RNA/chemistry , RNA/genetics

16.

Optimal neighborhood indexing for protein similarity search.

Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Nguyen, Van Hoa; Kucherov, Gregory; Giraud, Mathieu.

BMC Bioinformatics ; 9: 534, 2008 Dec 16.

Article in English | MEDLINE | ID: mdl-19087280

ABSTRACT

BACKGROUND: Similarity inference, one of the main bioinformatics tasks, has to face the exponential growth of biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. RESULTS: The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a 35% reduction in the memory involved in the process, without sacrificing either the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. CONCLUSION: We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.
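
The trade-off between alphabet size and neighborhood length can be sketched in a few lines of Python. The grouping of the 20 amino acids into 8 classes below is a hypothetical one chosen for illustration, not the reduced alphabet used in the paper, and the helper names are assumptions. The numbers it prints are properties of this toy grouping and are unrelated to the 35% figure reported in the abstract.

```python
import math

# Hypothetical grouping of the 20 amino acids into 8 classes.
GROUPS = ["ILVM", "FWY", "ST", "NQ", "DE", "KRH", "AG", "CP"]

# Each amino acid maps to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq):
    """Rewrite a protein sequence over the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq)

def bits_per_neighborhood(alphabet_size, length):
    """Bits needed to store one neighborhood of the given length."""
    return math.ceil(length * math.log2(alphabet_size))

print(reduce_sequence("MKVLA"))
# A longer neighborhood over the reduced alphabet can still cost fewer bits.
print(bits_per_neighborhood(20, 4), bits_per_neighborhood(len(GROUPS), 5))
```

Because a reduced letter stands for several amino acids, scoring it against a full-alphabet letter needs a matrix with one dimension per alphabet, which is why the matrices described in the abstract are rectangular.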

Subjects

Abstracting and Indexing/methods , Algorithms , Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Databases, Protein , Information Storage and Retrieval , Proteins/chemistry

17.

Domain organization within repeated DNA sequences: application to the study of a family of transposable elements.

Tempel, Sébastien; Giraud, Mathieu; Lavenier, Dominique; Lerman, Israël-César; Valin, Anne-Sophie; Couée, Ivan; Amrani, Abdelhak El; Nicolas, Jacques.

Bioinformatics ; 22(16): 1948-54, 2006 Aug 15.

Article in English | MEDLINE | ID: mdl-16809391

ABSTRACT

MOTIVATION: The analysis of repeated elements in genomes is a fascinating domain of research that is lacking relevant tools for transposable elements (TEs), the most complex ones. The dynamics of TEs, which provides the main mechanism of mutation in some genomes, is an essential component of genome evolution. In this study we introduce a new concept of domain, a segmentation unit useful for describing the architecture of different copies of TEs. Our method extracts occurrences of a terminus-defined family of TEs, aligns the sequences, finds the domains in the alignment and searches the distribution of each domain in sequences. After a classification step relative to the presence or the absence of domains, the method results in a graphical view of sequences segmented into domains. RESULTS: Analysis of the new non-autonomous TE AtREP21 in the model plant Arabidopsis thaliana reveals copies of very different sizes and various combinations of domains which show the potential of our method. AVAILABILITY: DomainOrganizer web page is available at www.irisa.fr/symbiose/DomainOrganizer/.

Subjects

Computational Biology/methods , DNA Transposable Elements/genetics , Sequence Analysis, DNA/methods , Algorithms , Amino Acid Sequence , Arabidopsis/genetics , Genes, Plant , Markov Chains , Models, Biological , Models, Statistical , Molecular Sequence Data , Plant Proteins/chemistry , Protein Structure, Tertiary

18.

The dog and rat olfactory receptor repertoires.

Quignon, Pascale; Giraud, Mathieu; Rimbault, Maud; Lavigne, Patricia; Tacher, Sandrine; Morin, Emmanuelle; Retout, Elodie; Valin, Anne-Sophie; Lindblad-Toh, Kerstin; Nicolas, Jacques; Galibert, Francis.

Genome Biol ; 6(10): R83, 2005.

Article in English | MEDLINE | ID: mdl-16207354

ABSTRACT

BACKGROUND: Dogs and rats have a highly developed capability to detect and identify odorant molecules, even at minute concentrations. Previous analyses have shown that the olfactory receptors (ORs) that specifically bind odorant molecules are encoded by the largest gene family sequenced in mammals so far. RESULTS: We identified five amino acid patterns characteristic of ORs in the recently sequenced boxer dog and brown Norway rat genomes. Using these patterns, we retrieved 1,094 dog genes and 1,493 rat genes from these shotgun sequences. The retrieved sequences constitute the olfactory receptor repertoires of these two animals. Subsets of 20.3% (for the dog) and 19.5% (for the rat) of these genes were annotated as pseudogenes as they had one or several mutations interrupting their open reading frames. We performed phylogenetic studies and organized these two repertoires into classes, families and subfamilies. CONCLUSION: We have established a complete or almost complete list of OR genes in the dog and the rat and have compared the sequences of these genes within and between the two species. Our results provide insight into the evolutionary development of these genes and the local amplifications that have led to the specific amplification of many subfamilies. We have also compared the human and rat ORs with the human and mouse OR repertoires.
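
The abstract states that genes were annotated as pseudogenes when mutations interrupted their open reading frames, without giving the exact criteria. The Python sketch below shows one simplified way such a check could look, assuming a coding sequence read in frame from its first base. The rule used here (a stop codon before the last codon, or a length that is not a multiple of three) and the function names are assumptions for illustration, not the authors' annotation procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_interrupted_orf(cds):
    """True if the coding sequence has a premature in-frame stop codon
    or a length that is not a multiple of three (a likely frameshift)."""
    if len(cds) % 3 != 0:
        return True
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # The final codon may legitimately be a stop; earlier ones may not.
    return any(codon in STOP_CODONS for codon in codons[:-1])

def pseudogene_fraction(sequences):
    """Fraction of sequences flagged by the simplified rule above."""
    flagged = sum(1 for seq in sequences if has_interrupted_orf(seq))
    return flagged / len(sequences)

# Toy sequences (invented): intact, premature stop, frameshift, intact.
toy = ["ATGGCCTAA", "ATGTAAGCCTAA", "ATGGCCTA", "ATGAAACCCTGA"]
print(pseudogene_fraction(toy))
```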

Subjects

Receptors, Odorant/chemistry , Animals , Conserved Sequence , Dogs , Genome/genetics , Multigene Family , Phylogeny , Pseudogenes/genetics , Rats , Sequence Analysis, Protein , Sequence Homology, Amino Acid
