1.
Biophys Physicobiol ; 16: 444-451, 2019.
Article in English | MEDLINE | ID: mdl-31984196

ABSTRACT

This paper presents a preliminary work consisting of two contributions. The first one is the design of a very efficient algorithm based on an "Overlap-Layout-Consensus" (OLC) graph to assemble the long reads provided by 3rd generation technologies. The second concerns the analysis of this graph using algebraic topology concepts to determine, in advance, whether the assembly of the genome will be straightforward, i.e., whether it will lead to a pseudo-Hamiltonian path or cycle, or whether the results will need to be scrutinized. In the latter case, it will be necessary to look for "loops" in the OLC assembly graph caused by unresolved repeated genomic regions, and then try to untie the "knots" created by these regions.
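
The idea of inspecting an overlap graph for "loops" before assembly can be illustrated with a small sketch. It is not the authors' algorithm: the exact suffix-prefix overlap test, the `min_len` threshold and the use of the cycle rank (first Betti number, E - V + C, of the underlying undirected graph) as the loop indicator are all assumptions made for illustration, and real long reads would need error-tolerant overlaps.

```python
from itertools import permutations


def overlap_length(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0.

    Exact matching only; containments (full-length overlaps) are ignored.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Directed overlap graph as {(i, j): overlap length} over read indices."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        k = overlap_length(reads[i], reads[j], min_len)
        if k:
            edges[(i, j)] = k
    return edges


def cycle_rank(n_nodes, edges):
    """Number of independent loops, E - V + C, of the undirected graph."""
    undirected = {frozenset(e) for e in edges}
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_nodes
    for e in undirected:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(undirected) - n_nodes + components


linear_reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
graph = build_overlap_graph(linear_reads, min_len=4)
loops = cycle_rank(len(linear_reads), graph)
```

In this toy setting a cycle rank of 0 corresponds to a loop-free graph and a rank of 1 may simply reflect a circular molecule; larger values are only a crude hint that repeats have tangled the graph and that the layout deserves scrutiny.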

2.
BMC Bioinformatics ; 19(1): 226, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29902968

ABSTRACT

BACKGROUND: Third generation sequencing technologies generate long reads that exhibit high error rates, in particular for insertions and deletions, which are usually the most difficult errors to cope with. The only exact algorithm capable of aligning sequences with insertions and deletions is a dynamic programming algorithm. RESULTS: In this note, for the sake of efficiency, we consider dynamic programming in a band. We show how to choose the band width as a function of the long reads' error rates, thus obtaining an [Formula: see text] algorithm in space and time. We also propose a procedure to decide whether this algorithm, when applied to semi-global alignments, provides the optimal score. CONCLUSIONS: We suggest that dynamic programming in a band is well suited to the problem of aligning long reads against one another and can be used as a core component of methods for obtaining a consensus sequence from the long reads alone. The function implementing the dynamic programming algorithm in a band is available, as a standalone program, at: https://forgemia.inra.fr/jean-francois.gibrat/BAND_DYN_PROG.git.


Subject(s)
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Programming Languages , Sequence Analysis, DNA/methods , Software , Genome, Human , Humans
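
Dynamic programming in a band, as described in the abstract above, can be sketched as follows. This is an illustration under stated assumptions, not the BAND_DYN_PROG program: it uses unit-cost edit distance for a global alignment, and `band` is a free parameter here, whereas the paper derives the band width from the reads' error rates by a rule that is not reproduced.

```python
def banded_edit_distance(a, b, band):
    """Unit-cost edit distance restricted to cells with |i - j| <= band.

    Returns None when the end cell lies outside the band. Otherwise the
    result is an upper bound on the true distance, and it is exact whenever
    an optimal alignment path stays inside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of `a` with b[:j] costs j insertions.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        lo, hi = max(0, i - band), min(m, i + band)
        cur = {}
        for j in range(lo, hi + 1):
            if j == 0:
                best = i  # a[:i] against the empty prefix: i deletions
            else:
                substitution = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])
                insertion = cur.get(j - 1, inf) + 1
                best = min(substitution, insertion)
            deletion = prev.get(j, inf) + 1
            cur[j] = min(best, deletion)
        prev = cur
    return prev[m]


distance = banded_edit_distance("kitten", "sitting", band=3)
```

Only the cells inside the band are stored and visited, so the work of this sketch grows with the sequence length times the band width instead of the product of the two sequence lengths.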
3.
F1000Res ; 7, 2018.
Article in English | MEDLINE | ID: mdl-29568489

ABSTRACT

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).

4.
Bioinformatics ; 32(7): 1083-4, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26607491

ABSTRACT

MOTIVATION: High-throughput sequencing technologies provide access to an increasing number of bacterial genomes. Today, many analyses involve the comparison of biological properties among many strains of a given species, or among species of a particular genus. Tools that can help the microbiologist with these tasks become increasingly important. RESULTS: Insyght is a comparative visualization tool whose core features combine a synchronized navigation across genomic data of multiple organisms with a versatile interoperability between complementary views. In this work, we have greatly increased the scope of the Insyght public dataset by including 2688 complete bacterial genomes available in Ensembl thus vastly improving its phylogenetic coverage. We also report the development of a virtual machine that allows users to easily set up and customize their own local Insyght server. AVAILABILITY AND IMPLEMENTATION: http://genome.jouy.inra.fr/Insyght CONTACT: Thomas.Lacroix@jouy.inra.fr.


Subject(s)
Computer Graphics , Genome, Bacterial , Phylogeny , Genomics , High-Throughput Nucleotide Sequencing , Internet , Software
5.
Stud Health Technol Inform ; 216: 1005, 2015.
Article in English | MEDLINE | ID: mdl-26262306

ABSTRACT

In Europe, health and medical administrative data is increasingly accumulating on a national level. Looking further than re-use of this data on a national level, sharing health and medical administrative data would enable large-scale analyses and European-level public health projects. There is currently no research infrastructure for this type of sharing. The PHRIMA consortium proposes to realise the Public Health Research Infrastructure for Sharing of health and Medical Administrative data (PHRIMA) which will enable and facilitate the efficient and secure sharing of healthcare data.


Subject(s)
Electronic Health Records/organization & administration , Health Services Research/organization & administration , Hospital Information Systems/organization & administration , Medical Record Linkage/methods , Public Health Administration/methods , Public Health Informatics/organization & administration , Europe , Information Dissemination/methods , Models, Organizational , Public Health
6.
PLoS One ; 10(4): e0124360, 2015.
Article in English | MEDLINE | ID: mdl-25867897

ABSTRACT

Cheese ripening is a complex biochemical process driven by microbial communities composed of both eukaryotes and prokaryotes. Surface-ripened cheeses are widely consumed all over the world and are appreciated for their characteristic flavor. Microbial community composition has been studied for a long time on surface-ripened cheeses, but only limited knowledge has been acquired about its in situ metabolic activities. We applied metagenomic, metatranscriptomic and biochemical analyses to an experimental surface-ripened cheese composed of nine microbial species during four weeks of ripening. By combining all of the data, we were able to obtain an overview of the cheese maturation process and to better understand the metabolic activities of the different community members and their possible interactions. Furthermore, differential expression analysis was used to select a set of biomarker genes, providing a valuable tool that can be used to monitor the cheese-making process.


Subject(s)
Cheese , Microbiota , Metagenomics , Transcriptome
7.
Prog Mol Biol Transl Sci ; 130: 1-36, 2015.
Article in English | MEDLINE | ID: mdl-25623335

ABSTRACT

This chapter describes the main characteristics of olfactory receptor (OR) genes of vertebrates, including generation of this large multigenic family and pseudogenization. OR genes are compared in relation to evolution and among species. OR gene structure and selection of a given gene for expression in an olfactory sensory neuron (OSN) are tackled. The specificities of OR proteins, their expression, and their function are presented. The expression of OR proteins in locations other than the nasal cavity is regulated by different mechanisms, and ORs display various additional functions. A conventional olfactory signal transduction cascade is observed in OSNs, but individual ORs can also mediate different signaling pathways, through the involvement of other molecular partners and depending on the odorant ligand encountered. ORs are engaged in constitutive dimers. Ligand binding induces conformational changes in the ORs that regulate their level of activity depending on odorant dose. When present, odorant binding proteins induce an allosteric modulation of OR activity. Since no 3D structure of an OR has yet been resolved, modeling has to be performed using the closest G-protein-coupled receptor 3D structures available, to facilitate virtual ligand screening using the models. The study of odorant binding modes and affinities may infer best-bet OR ligands, to be subsequently checked experimentally. The relationships between the spatial and steric features of odorants and their activity, in terms of perceived odor quality, are also research areas that the development of computing tools may advance.


Subject(s)
Imaging, Three-Dimensional , Odorants/analysis , Olfactory Mucosa/physiology , Receptors, Odorant/chemistry , Receptors, Odorant/physiology , Animals , Humans , Structure-Activity Relationship
8.
Nucleic Acids Res ; 42(21), 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25249626

ABSTRACT

High-throughput techniques have considerably increased the potential of comparative genomics whilst simultaneously posing many new challenges. One of those challenges involves efficiently mining the large amount of data produced and exploring the landscape of both conserved and idiosyncratic genomic regions across multiple genomes. Domains of application of these analyses are diverse: identification of evolutionary events, inference of gene functions, detection of niche-specific genes or phylogenetic profiling. Insyght is a comparative genomic visualization tool that combines three complementary displays: (i) a table for thoroughly browsing amongst homologues, (ii) a comparator of orthologue functional annotations and (iii) a genomic organization view designed to improve the legibility of rearrangements and distinctive loci. The latter display combines symbolic and proportional graphical paradigms. Synchronized navigation across multiple species and interoperability between the views are core features of Insyght. A gene filter mechanism is provided that helps the user to build a biologically relevant gene set according to multiple criteria such as presence/absence of homologues and/or various annotations. We illustrate the use of Insyght with scenarios. Currently, only Bacteria and Archaea are supported. A public instance is available at http://genome.jouy.inra.fr/Insyght. The tool is freely downloadable for private data set analysis.


Subject(s)
Data Mining/methods , Genes, Bacterial , Genomics/methods , Molecular Sequence Annotation , Synteny , Computer Graphics , Genes, Archaeal , Sequence Homology, Nucleic Acid , Software
9.
Gut ; 63(10): 1566-77, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24436141

ABSTRACT

OBJECTIVE: No Crohn's disease (CD) molecular marker has advanced to clinical use, and independent lines of evidence support a central role of the gut microbial community in CD. Here we explore the feasibility of extracting bacterial protein signals relevant to CD, by interrogating myriads of intestinal bacterial proteomes from a small number of patients and healthy controls. DESIGN: We first developed and validated a workflow (including extraction of microbial communities, two-dimensional difference gel electrophoresis (2D-DIGE), and LC-MS/MS) to discover protein signals from CD-associated gut microbial communities. Then we used selected reaction monitoring (SRM) to confirm a set of candidates. In parallel, we used 16S rRNA gene sequencing for an integrated analysis of gut ecosystem structure and functions. RESULTS: Our 2D-DIGE-based discovery approach revealed an imbalance of intestinal bacterial functions in CD. Many proteins, largely derived from Bacteroides species, were over-represented, while under-represented proteins were mostly from Firmicutes and some Prevotella members. Most overabundant proteins could be confirmed using SRM. They correspond to functions allowing opportunistic pathogens to colonise the mucus layers, breach the host barriers and invade the mucosae, which could be further aggravated by decreased host-derived pancreatic zymogen granule membrane protein GP2 in CD patients. Moreover, although the abundance of most protein groups reflected that of related bacterial populations, we found a specific independent regulation of bacteria-derived cell envelope proteins. CONCLUSIONS: This study provides the first evidence that quantifiable bacterial protein signals are associated with CD, which can have a profound impact on future molecular diagnosis.


Subject(s)
Bacterial Proteins/metabolism , Biomarkers/metabolism , Crohn Disease/microbiology , Intestines/microbiology , Adult , Bacteria/genetics , Bacteria/isolation & purification , Chromatography, Liquid , Cross-Sectional Studies , Electrophoresis, Gel, Two-Dimensional , Female , Humans , Male , RNA, Ribosomal, 16S/genetics , Sequence Analysis, Protein , Tandem Mass Spectrometry
10.
BMC Evol Biol ; 13: 154, 2013 Jul 17.
Article in English | MEDLINE | ID: mdl-23865988

ABSTRACT

BACKGROUND: Birnaviruses form a distinct family of double-stranded RNA viruses infecting animals as different as vertebrates, mollusks, insects and rotifers. With such a wide host range, they constitute a good model for studying the adaptation to the host. Additionally, several lines of evidence link birnaviruses to positive strand RNA viruses and suggest that phylogenetic analyses may provide clues about transition. RESULTS: We characterized the genome of a birnavirus from the rotifer Brachionus plicatilis. We used X-ray structures of RNA-dependent RNA polymerases and capsid proteins to obtain multiple structure alignments that allowed us to obtain reliable multiple sequence alignments and we employed "advanced" phylogenetic methods to study the evolutionary relationships between some positive strand and double-stranded RNA viruses. We showed that the rotifer birnavirus genome exhibited an organization remarkably similar to other birnaviruses. As this host was phylogenetically very distant from the other known species targeted by birnaviruses, we revisited the evolutionary pathways within the Birnaviridae family using phylogenetic reconstruction methods. We also applied a number of phylogenetic approaches based on structurally conserved domains/regions of the capsid and RNA-dependent RNA polymerase proteins to study the evolutionary relationships between birnaviruses, other double-stranded RNA viruses and positive strand RNA viruses. CONCLUSIONS: We show that there is a good correlation between the phylogeny of the birnaviruses and that of their hosts at the phylum level using the RNA-dependent RNA polymerase (genomic segment B) on the one hand and a concatenation of the capsid protein, protease and ribonucleoprotein (genomic segment A) on the other hand. This correlation tends to vanish within phyla.
The use of advanced phylogenetic methods and robust structure-based multiple sequence alignments allowed us to obtain a more accurate picture (in terms of probability of the tree topologies) of the evolutionary affinities between double-stranded RNA and positive strand RNA viruses. In particular, we were able to show that there is good statistical support for the claims that dsRNA viruses are not monophyletic and that viruses with permuted RdRps belong to a common evolutionary lineage, as previously proposed by other groups. We also propose a tree topology with good statistical support describing the evolutionary relationships between the Picornaviridae, Caliciviridae, Flaviviridae families and a group including the Alphatetraviridae, Nodaviridae, Permutotetraviridae, Birnaviridae, and Cystoviridae families.


Subject(s)
Evolution, Molecular , RNA Viruses/genetics , Rotifera/virology , Amino Acid Sequence , Animals , Genome, Viral , Host Specificity , Phylogeny , RNA Viruses/classification , RNA Viruses/physiology , RNA Viruses/radiation effects , RNA, Double-Stranded/genetics , Rotifera/classification , Sequence Alignment , Viral Proteins/chemistry , Viral Proteins/genetics
11.
J Bacteriol ; 194(18): 5141-2, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22933766

ABSTRACT

Staphylococcus equorum subsp. equorum is a member of the coagulase-negative staphylococcus group and is frequently isolated from fermented food products and from food-processing environments. It contributes to the formation of aroma compounds during the ripening of fermented foods, especially cheeses and sausages. Here, we report the draft genome sequence of Staphylococcus equorum subsp. equorum Mu2 to provide insights into its physiology and compare it with other Staphylococcus species.


Subject(s)
DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genome, Bacterial , Sequence Analysis, DNA , Staphylococcus/genetics , Cheese/microbiology , Molecular Sequence Data , Staphylococcus/isolation & purification
12.
Protein Eng Des Sel ; 25(8): 377-86, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22691703

ABSTRACT

We present a procedure that (i) automates the homology modeling of mammalian olfactory receptors (ORs) based on the six three-dimensional (3D) structures of G protein-coupled receptors (GPCRs) available so far and (ii) performs the docking of odorants on these models, using the concept of colony energy to score the complexes. ORs exhibit low-sequence similarities with other GPCR and current alignment methods often fail to provide a reliable alignment. Here, we use a fold recognition technique to obtain a robust initial alignment. We then apply our procedure to a human OR that we have previously functionally characterized. The analysis of the resulting in silico complexes, supported by receptor mutagenesis and functional assays in a heterologous expression system, suggests that antagonists dock in the upper part of the binding pocket whereas agonists dock in the narrow lower part. We propose that the potency of agonists in activating receptors depends on their ability to establish tight interactions with the floor of the binding pocket. We developed a web site that allows the user to upload a GPCR sequence, choose a ligand in a library and obtain the 3D structure of the free receptor and ligand-receptor complex (http://genome.jouy.inra.fr/GPCRautomodel).


Subject(s)
Receptors, Odorant/chemistry , Receptors, Odorant/metabolism , Amino Acid Sequence , Computer Simulation , Databases, Protein , Humans , Ligands , Models, Molecular , Molecular Sequence Data , Odorants , Protein Binding , Protein Folding , Reproducibility of Results , Sequence Alignment , Sequence Homology, Amino Acid , Thermodynamics
13.
J Comput Biol ; 19(6): 796-813, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22506536

ABSTRACT

Mapping short reads against a reference genome is classically the first step of many next-generation sequencing data analyses, and it should be as accurate as possible. Because of the large number of reads to handle, numerous sophisticated algorithms have been developed in the last 3 years to tackle this problem. In this article, we first review the underlying algorithms used in most of the existing mapping tools, and then we compare the performance of nine of these tools on a well-controlled benchmark built for this purpose. We built a set of reads that exist in single or multiple copies in a reference genome and for which there is no mismatch, and a set of reads with three mismatches. We considered as reference genome both the human genome and a concatenation of all complete bacterial genomes. On each dataset, we quantified the capacity of the different tools to retrieve all the occurrences of the reads in the reference genome. Special attention was paid to reads uniquely reported and to reads with multiple hits.


Subject(s)
Algorithms , Bacteria/genetics , Chromosome Mapping/methods , Genome, Bacterial , Sequence Analysis, DNA/methods , Software , Base Sequence , Chromosome Mapping/statistics & numerical data , Genome, Human , Genomics , Humans , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA/statistics & numerical data
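
The quantity measured in the benchmark above, namely every occurrence of a read in the reference with at most a given number of mismatches, can be computed exhaustively for small inputs. The sketch below is a naive ground-truth scan, not one of the nine evaluated tools; the function name, the forward-strand-only search and the substitution-only mismatch model are assumptions for illustration.

```python
def find_occurrences(reference, read, max_mismatches=0):
    """Start positions where `read` matches `reference` with at most
    max_mismatches substitutions (no indels, forward strand only)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        window = reference[start:start + len(read)]
        for ref_base, read_base in zip(window, read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches: reject this position
        else:
            # Loop finished without break: position is within the budget.
            hits.append(start)
    return hits


exact_hits = find_occurrences("ACGTACGTAC", "ACGT")
fuzzy_hits = find_occurrences("ACGTACGTAC", "ACGA", max_mismatches=1)
```

A read with more than one hit is a multi-copy read in the sense of the abstract, and comparing a mapper's reported positions with such an exhaustive list gives the fraction of occurrences it retrieves.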
14.
J Bacteriol ; 194(9): 2385-6, 2012 May.
Article in English | MEDLINE | ID: mdl-22493197

ABSTRACT

Salmonella enterica subsp. enterica serotype Senftenberg is an emerging serotype in poultry production which has been found to persist in animals and the farm environment. We report the genome sequence and annotation of the SS209 strain of S. Senftenberg, isolated from a hatchery, which was identified as persistent in broiler chickens.


Subject(s)
Genome, Bacterial , Salmonella enterica/classification , Salmonella enterica/genetics , Chromosomes, Bacterial , DNA, Bacterial/genetics , Gene Expression Regulation, Bacterial , Molecular Sequence Data
15.
J Bacteriol ; 194(9): 2387-8, 2012 May.
Article in English | MEDLINE | ID: mdl-22493198

ABSTRACT

Salmonella enterica subsp. enterica serotype Enteritidis is one of the major causes of gastroenteritis in humans due to consumption of poultry derivatives. Here we report the whole-genome sequence and annotation, including the virulence plasmid, of S. Enteritidis LA5, which is a chicken isolate used by numerous laboratories in virulence studies.


Subject(s)
Genome, Bacterial , Salmonella enterica/classification , Salmonella enterica/genetics , Chromosomes, Bacterial , DNA, Bacterial/genetics , Gene Expression Regulation, Bacterial , Molecular Sequence Data
16.
Bioinformatics ; 28(7): 1040-1, 2012 Apr 01.
Article in English | MEDLINE | ID: mdl-22345617

ABSTRACT

SUMMARY: The DOMIRE web server implements a novel, automatic, protein structural domain assignment procedure based on 3D substructures of the query protein which are also found within structures of a non-redundant protein database. These common 3D substructures are transformed into a co-occurrence matrix that offers a global view of the protein domain organization. Three different algorithms are employed to define structural domain boundaries from this co-occurrence matrix. For each query, a list of structural neighbors and their alignments are provided. DOMIRE, by displaying the protein structural domain organization, can be a useful tool for defining protein common cores and for unravelling the evolutionary relationship between different proteins. AVAILABILITY: http://genome.jouy.inra.fr/domire CONTACT: jean.garnier@jouy.inra.fr.


Subject(s)
Internet , Protein Structure, Tertiary , Proteins/chemistry , Software , Algorithms , Databases, Protein , Sequence Alignment
17.
J Bacteriol ; 194(3): 738-9, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22247534

ABSTRACT

Corynebacterium casei is one of the most prevalent species present on the surfaces of smear-ripened cheeses, where it contributes to the production of the desired organoleptic properties. Here, we report the draft genome sequence of Corynebacterium casei UCMA 3821 to provide insights into its physiology.


Subject(s)
Cheese/microbiology , Corynebacterium/genetics , Genome, Bacterial , Base Sequence , Corynebacterium/isolation & purification , Molecular Sequence Data
18.
Biophys Rev ; 4(3): 255-269, 2012 Sep.
Article in English | MEDLINE | ID: mdl-28510073

ABSTRACT

Olfactory receptors (ORs) belong to the superfamily of G protein-coupled receptors (GPCRs), the second largest class of genes after those related to immunity, and account for about 3 % of mammalian genomes. ORs are present in all multicellular organisms and represent more than half the GPCRs in mammalian species (e.g., the mouse OR repertoire contains >1,000 functional genes). ORs are mainly expressed in the olfactory epithelium where they detect odorant molecules, but they are also expressed in a number of other cells, such as sperm cells, although their functions in these cells remain mostly unknown. It has recently been reported that ORs are present in tumoral tissues where they are expressed at different levels than in healthy tissues. A specific OR is over-expressed in prostate cancer cells, and activation of this OR has been shown to inhibit the proliferation of these cells. Odorant stimulation of some of these receptors results in inhibition of cell proliferation. Even though their biological role has not yet been elucidated, these receptors might constitute new targets for diagnosis and therapeutics. It is important to understand the activation mechanism of these receptors at the molecular level, in particular to be able to predict which ligands are likely to activate a particular receptor ('deorphanization') or to design antagonists for a given receptor. In this review, we describe the in silico methodologies used to model the three-dimensional (3D) structure of ORs (in the more general framework of GPCR modeling) and to dock ligands into these 3D structures.

19.
J Comput Biol ; 19(1): 13-29, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22149633

ABSTRACT

We present a general method for assessing threading score significance. The threading score of a protein sequence, threaded onto a given structure, should be compared with the threading score distribution of a random amino-acid sequence, of the same length, threaded onto the same structure; small p-values point to significantly high scores. We claim that, due to general protein contact map properties, this reference distribution is a Weibull extreme value distribution whose parameters depend on the threading method, the structure, the length of the query and the random sequence simulation model used. These parameters can be estimated off-line with simulated sequence samples, for different sequence lengths. They can further be interpolated at the exact length of a query, enabling the quick computation of the p-value.


Subject(s)
Models, Statistical , Sequence Alignment/methods , Sequence Analysis/methods , Statistical Distributions , Algorithms , Amino Acid Sequence , Computational Biology/methods , Computer Simulation , Markov Chains , Protein Conformation , Proteins/chemistry
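
The last two steps of the abstract above (interpolating pre-computed parameters at the query length, then evaluating a p-value) can be sketched as follows. Everything specific here is an assumption for illustration: the three-parameter Weibull form with survival function exp(-((x - loc)/scale)^shape), the linear interpolation between tabulated lengths, and the parameter values in the example table, none of which come from the paper.

```python
import math
from bisect import bisect_left


def weibull_p_value(score, shape, scale, loc=0.0):
    """P(X >= score) for a Weibull(shape, scale) variable shifted by loc."""
    if score <= loc:
        return 1.0
    return math.exp(-(((score - loc) / scale) ** shape))


def interpolate_params(table, length):
    """Linearly interpolate (shape, scale, loc) between tabulated lengths.

    `table` maps a sequence length to parameters estimated off-line;
    lengths outside the tabulated range are clamped to the nearest entry.
    """
    lengths = sorted(table)
    if length <= lengths[0]:
        return table[lengths[0]]
    if length >= lengths[-1]:
        return table[lengths[-1]]
    k = bisect_left(lengths, length)
    l0, l1 = lengths[k - 1], lengths[k]
    t = (length - l0) / (l1 - l0)
    return tuple(p0 + t * (p1 - p0) for p0, p1 in zip(table[l0], table[l1]))


# Hypothetical off-line estimates for one structure and one threading method.
params_by_length = {100: (2.0, 10.0, 0.0), 200: (2.0, 20.0, 0.0)}
shape, scale, loc = interpolate_params(params_by_length, 150)
p_value = weibull_p_value(15.0, shape, scale, loc)
```

Because the table is built once per structure, scoring a new query costs only a lookup, an interpolation and one exponential, which is what makes the p-value quick to compute.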
20.
J Bacteriol ; 193(19): 5581-2, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21914889

ABSTRACT

Streptococcus thermophilus is a dairy species commonly used in the manufacture of cheese and yogurt. Here, we report the complete sequence of S. thermophilus strain JIM8232, isolated from milk and which produces a yellow pigment, an atypical trait for this bacterium.


Subject(s)
Genome, Bacterial/genetics , Streptococcus thermophilus/genetics , Animals , Coloring Agents , Milk/microbiology , Molecular Sequence Data