Results 1 - 20 of 49
1.
J Mol Evol ; 90(1): 56-72, 2022 02.
Article in English | MEDLINE | ID: mdl-35089376

ABSTRACT

DNA methylation is a crucial, abundant mechanism of gene regulation in vertebrates. It is less prevalent in many other metazoan organisms and completely absent in some key model species, such as Drosophila melanogaster and Caenorhabditis elegans. We report here a comprehensive study of the presence and absence of DNA methyltransferases (DNMTs) in 138 Ecdysozoa, covering Arthropoda, Nematoda, Priapulida, Onychophora, and Tardigrada. Three of these phyla have not been investigated for the presence of DNA methylation before. We observe that the loss of individual DNMTs independently occurred multiple times across ecdysozoan phyla. We computationally predict the presence of DNA methylation based on CpG rates in coding sequences using an implementation of Gaussian Mixture Modeling, MethMod. Integrating both analyses, we predict two previously unknown losses of DNA methylation in Ecdysozoa, one within Chelicerata (Mesostigmata) and one in Tardigrada. In the early-branching ecdysozoan Priapulus caudatus, we predict the presence of a full set of DNMTs and the presence of DNA methylation. We thus show a very diverse and independent evolution of DNA methylation in different ecdysozoan phyla spanning a phylogenetic range of more than 700 million years.


Subject(s)
Arthropods , Nematoda , Tardigrada , Animals , Arthropods/genetics , Caenorhabditis elegans , DNA Methylation/genetics , Drosophila melanogaster , Nematoda/genetics , Phylogeny , Tardigrada/genetics
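
As a rough illustration of the CpG-based prediction mentioned in the abstract above, the following sketch computes an observed/expected CpG ratio for a single coding sequence. The normalization (CpG count times sequence length, divided by C count times G count) is a common convention and is an assumption here; the abstract does not state the exact statistic that MethMod uses, and the Gaussian mixture fitting step is not shown. The function name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a nucleotide sequence.

    Assumed normalization: (#CG * length) / (#C * #G).
    Returns 0.0 when the sequence contains no C or no G.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the exact number
    # of CpG dinucleotides.
    n_cg = s.count("CG")
    return n_cg * len(s) / (n_c * n_g)
```

Methylated cytosines tend to be lost through deamination over evolutionary time, so a subset of genes with clearly depleted ratios is usually read as a hint of methylation; a mixture model would then be fitted to the ratios of all coding sequences of a species.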
2.
J Exp Zool B Mol Dev Evol ; 330(1): 5-14, 2018 01.
Article in English | MEDLINE | ID: mdl-29356321

ABSTRACT

Reconciling different underlying ontologies and explanatory contexts has been one of the main challenges and impediments for theory integration in biology. Here, we analyze the challenge of developing an inclusive and integrative theory of phenotypic evolution as an example for the broader challenge of developing a theory of theory integration within the life sciences and suggest a number of necessary formal steps toward the resolution of often incompatible (hidden) assumptions. Theory integration in biology requires a better formal understanding of the structure of biological theories. The strategy for integrating theories crucially depends on the relationships of the underlying ontologies.


Subject(s)
Biological Evolution , Models, Biological , Animals , Informatics , Logic
3.
J Math Biol ; 77(2): 313-341, 2018 08.
Article in English | MEDLINE | ID: mdl-29260295

ABSTRACT

Clusters of paralogous genes such as the famous HOX cluster of developmental transcription factors tend to evolve by stepwise duplication of their members, often involving unequal crossing over. Gene conversion and possibly other mechanisms of concerted evolution further obfuscate the phylogenetic relationships. As a consequence, it is very difficult or even impossible to disentangle the detailed history of gene duplications in gene clusters. In this contribution we show that the expansion of gene clusters by unequal crossing over as proposed by Walter Gehring leads to distinctive patterns of genetic distances, namely a subclass of circular split systems. Furthermore, when the gene cluster was left undisturbed by genome rearrangements, the shortest Hamiltonian paths with respect to genetic distances coincide with the genomic order. This observation can be used to detect ancient genomic rearrangements of gene clusters and to distinguish gene clusters whose evolution was dominated by unequal crossing over within genes from those that expanded through other mechanisms.


Subject(s)
Models, Genetic , Multigene Family , Alcohol Dehydrogenase/genetics , Algorithms , Animals , Computer Simulation , Crossing Over, Genetic , Evolution, Molecular , Gene Duplication , Genes, Homeobox , Genome , Humans , Mathematical Concepts , Phylogeny , Recombination, Genetic
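
The relation between shortest Hamiltonian paths and genomic order stated in the abstract above can be checked on a toy example. The sketch below finds a shortest Hamiltonian path by exhaustive search over all gene orders; it is only feasible for small clusters and is not the method of the paper. The function name and the distance matrix in the usage note are made up for illustration.

```python
from itertools import permutations


def shortest_hamiltonian_path(dist):
    """Return (path, length) of a shortest path visiting every gene once.

    `dist` is a symmetric matrix of pairwise genetic distances.
    Exhaustive search: O(n!) time, so only usable for small n.
    """
    n = len(dist)
    best_len, best_path = float("inf"), None
    for perm in permutations(range(n)):
        # A path and its reverse have the same length; keep one of them.
        if perm[0] > perm[-1]:
            continue
        length = sum(dist[a][b] for a, b in zip(perm, perm[1:]))
        if length < best_len:
            best_len, best_path = length, perm
    return best_path, best_len
```

For four genes whose distances grow with their separation along the chromosome, for example `dist[i][j] = abs(i - j)`, the search returns the genomic order `(0, 1, 2, 3)`. A cluster whose shortest path deviates from the annotated order would be a candidate for an ancient rearrangement.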
4.
Ecol Lett ; 20(12): 1576-1590, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29027325

ABSTRACT

Growing evidence shows that epigenetic mechanisms contribute to complex traits, with implications across many fields of biology. In plant ecology, recent studies have attempted to merge ecological experiments with epigenetic analyses to elucidate the contribution of epigenetics to plant phenotypes, stress responses, adaptation to habitat, and range distributions. While there has been some progress in revealing the role of epigenetics in ecological processes, studies with non-model species have so far been limited to describing broad patterns based on anonymous markers of DNA methylation. In contrast, studies with model species have benefited from powerful genomic resources, which contribute to a more mechanistic understanding but have limited ecological realism. Understanding the significance of epigenetics for plant ecology requires increased transfer of knowledge and methods from model species research to genomes of evolutionarily divergent species, and examination of responses to complex natural environments at a more mechanistic level. This requires transforming genomics tools specifically for studying non-model species, which is challenging given the large and often polyploid genomes of plants. Collaboration among molecular geneticists, ecologists and bioinformaticians promises to enhance our understanding of the mutual links between genome function and ecological processes.


Subject(s)
Ecology , Epigenesis, Genetic , Plants , DNA Methylation , Ecosystem
5.
BMC Evol Biol ; 17(1): 163, 2017 07 06.
Article in English | MEDLINE | ID: mdl-28683816

ABSTRACT

BACKGROUND: The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS: Identity and number of arrestin paralogs were determined searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION: The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.


Subject(s)
Arrestins/genetics , Evolution, Molecular , Animals , Humans , Phosphorylation , Protein Binding , Signal Transduction
6.
Theory Biosci ; 135(4): 231-240, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27864730

ABSTRACT

A critical feature of all cellular processes is the ability to control the rate of gene or protein expression and metabolic flux in changing environments through regulatory feedback. We review the many ways that regulation is represented through causal, logical, and dynamical components. Formalizing the nature of these components promotes effective comparison among distinct regulatory networks and provides a common framework for the potential design and control of regulatory systems in synthetic biology.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Synthetic Biology/methods , Systems Biology/methods , Cell Cycle , Computer Simulation , Escherichia coli/genetics , Feedback , Lac Operon/genetics
7.
Algorithms Mol Biol ; 11: 1, 2016.
Article in English | MEDLINE | ID: mdl-26913054

ABSTRACT

BACKGROUND: The accurate annotation of genes in newly sequenced genomes remains a challenge. Although sophisticated comparative pipelines are available, computationally derived gene models are often less than perfect. This is particularly true when multiple similar paralogs are present. The issue is aggravated further when genomes are assembled only at a preliminary draft level to contigs or short scaffolds. However, these genomes deliver valuable information for studying gene families. High accuracy models of protein coding genes are needed in particular for phylogenetics and for the analysis of gene family histories. RESULTS: We present a pipeline, ExonMatchSolver, that is designed to help the user to produce and curate high quality models of the protein-coding part of genes. The tool in particular tackles the problem of identifying those coding exon groups that belong to the same paralogous genes in a fragmented genome assembly. This paralog-to-contig assignment problem is shown to be NP-complete. It is phrased and solved as an Integer Linear Programming problem. CONCLUSIONS: The ExonMatchSolver-pipeline can be employed to build highly accurate models of protein coding genes even when spanning several genomic fragments. This sets the stage for a better understanding of the evolutionary history within particular gene families which possess a large number of paralogs and in which frequent gene duplication events occurred.

8.
BMC Bioinformatics ; 16 Suppl 19: S2, 2015.
Article in English | MEDLINE | ID: mdl-26695390

ABSTRACT

BACKGROUND: Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform stochastic sampling. This is explained in the theory of Algebraic Dynamic Programming (ADP) by a strict separation of state space traversal (usually represented by a context free grammar), scoring (encoded as an algebra), and choice rule. A key ingredient in this theory is the use of yield parsers that operate on the ordered input data structure, usually strings or ordered trees. The computation of ensemble properties, such as a posteriori probabilities of HMMs or partition functions in RNA folding, requires the combination of two distinct, but intimately related algorithms, known as the inside and the outside recursion. Only the inside recursions are covered by the classical ADP theory. RESULTS: The ideas of ADP are generalized to a much wider scope of data structures by relaxing the concept of parsing. This allows us to formalize the conceptual complementarity of inside and outside variables in a natural way. We demonstrate that outside recursions are generically derivable from inside decomposition schemes. In addition to rephrasing the well-known algorithms for HMMs, pairwise sequence alignment, and RNA folding we show how the TSP and the shortest Hamiltonian path problem can be implemented efficiently in the extended ADP framework. As a showcase application we investigate the ancient evolution of HOX gene clusters in terms of shortest Hamiltonian paths. CONCLUSIONS: The generalized ADP framework presented here greatly facilitates the development and implementation of dynamic programming algorithms for a wide spectrum of applications.


Subject(s)
Computational Biology/methods , Databases, Genetic , Algorithms , Genes, Homeobox , Markov Chains , Multigene Family , Probability , RNA Folding , Sequence Alignment , Software
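
The pairing of inside and outside recursions described in the abstract above has a familiar special case in the forward and backward recursions of a hidden Markov model. The sketch below is a plain Python version of that textbook algorithm, not the ADP framework itself; the function name and the parameter layout (lists indexed by state and by observation symbol) are assumptions made for this example.

```python
def forward_backward(init, trans, emit, obs):
    """Posterior state probabilities of an HMM.

    init[s]      probability of starting in state s
    trans[r][s]  probability of moving from state r to state s
    emit[s][o]   probability that state s emits symbol o
    obs          non-empty list of observed symbol indices
    Returns (z, post): z is the total probability of obs and
    post[t][s] is the posterior probability of state s at time t.
    """
    n, T = len(init), len(obs)
    # Forward ("inside") table: probability of the prefix ending in state s.
    fwd = [[0.0] * n for _ in range(T)]
    for s in range(n):
        fwd[0][s] = init[s] * emit[s][obs[0]]
    for t in range(1, T):
        for s in range(n):
            fwd[t][s] = emit[s][obs[t]] * sum(
                fwd[t - 1][r] * trans[r][s] for r in range(n)
            )
    # Backward ("outside") table: probability of the suffix given state s.
    bwd = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(n):
            bwd[t][s] = sum(
                trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r]
                for r in range(n)
            )
    z = sum(fwd[T - 1])
    post = [[fwd[t][s] * bwd[t][s] / z for s in range(n)] for t in range(T)]
    return z, post
```

Each posterior is the product of one forward and one backward entry divided by the total probability, which is the complementarity of inside and outside variables in its simplest form.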
9.
Theory Biosci ; 134(3-4): 143-7, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26449352

ABSTRACT

Function is a central concept in biological theories and explanations. Yet discussions about function are often based on a narrow understanding of biological systems and processes, such as idealized molecular systems or simple evolutionary, i.e., selective, dynamics. Conflicting conceptions of function continue to be used in the scientific literature to support certain claims, for instance about the fraction of "functional DNA" in the human genome. Here we argue that all biologically meaningful interpretations of function are necessarily context dependent. This implies that they derive their meaning as well as their range of applicability only within a specific theoretical and measurement context. We use this framework to shed light on the current debate about functional DNA and argue that without considering explicitly the theoretical and measurement contexts all attempts to integrate biological theories are prone to fail.


Subject(s)
DNA/physiology , Genome/physiology , Models, Biological , Animals , Humans
10.
Genome Biol ; 16: 147, 2015 Jul 23.
Article in English | MEDLINE | ID: mdl-26201466

ABSTRACT

BACKGROUND: Kiwi, comprising five species from the genus Apteryx, are endangered, ground-dwelling bird species endemic to New Zealand. They are the smallest and only nocturnal representatives of the ratites. The timing of kiwi adaptation to a nocturnal niche and the genomic innovations, which shaped sensory systems and morphology to allow this adaptation, are not yet fully understood. RESULTS: We sequenced and assembled the brown kiwi genome to 150-fold coverage and annotated the genome using kiwi transcript data and non-redundant protein information from multiple bird species. We identified evolutionary sequence changes that underlie adaptation to nocturnality and estimated the onset time of these adaptations. Several opsin genes involved in color vision are inactivated in the kiwi. We date this inactivation to the Oligocene epoch, likely after the arrival of the ancestor of modern kiwi in New Zealand. Genome comparisons between kiwi and representatives of ratites, Galloanserae, and Neoaves, including nocturnal and song birds, show diversification of kiwi's odorant receptors repertoire, which may reflect an increased reliance on olfaction rather than sight during foraging. Further, there is an enrichment of genes influencing mitochondrial function and energy expenditure among genes that are rapidly evolving specifically on the kiwi branch, which may also be linked to its nocturnal lifestyle. CONCLUSIONS: The genomic changes in kiwi vision and olfaction are consistent with changes that are hypothesized to occur during adaptation to nocturnal lifestyle in mammals. The kiwi genome provides a valuable genomic resource for future genome-wide comparative analyses to other extinct and extant diurnal ratites.


Subject(s)
Adaptation, Biological/genetics , Darkness , Evolution, Molecular , Genome , Palaeognathae/genetics , Animals , Genomics , Molecular Sequence Annotation , Molecular Sequence Data , Multigene Family , Palaeognathae/anatomy & histology , Selection, Genetic , Smell/genetics , Vision, Ocular/genetics
11.
Nucleic Acids Res ; 43(14): 6739-46, 2015 Aug 18.
Article in English | MEDLINE | ID: mdl-26117543

ABSTRACT

Transfer RNAs (tRNAs) require the absolutely conserved sequence motif CCA at their 3'-ends, representing the site of aminoacylation. In the majority of organisms, this trinucleotide sequence is not encoded in the genome and thus has to be added post-transcriptionally by the CCA-adding enzyme, a specialized nucleotidyltransferase. In eukaryotic genomes this ubiquitous and highly conserved enzyme family is usually represented by a single gene copy. Analysis of published sequence data allows us to pin down the unusual evolution of eukaryotic CCA-adding enzymes. We show that the CCA-adding enzymes of animals originated from a horizontal gene transfer event in the stem lineage of Holozoa, i.e. Metazoa (animals) and their unicellular relatives, the Choanozoa. The tRNA nucleotidyltransferase, acquired from an α-proteobacterium, replaced the ancestral enzyme in Metazoa. However, in Choanoflagellata, the group of Choanozoa that is closest to Metazoa, both the ancestral and the horizontally transferred CCA-adding enzymes have survived. Furthermore, our data refute a mitochondrial origin of the animal tRNA nucleotidyltransferases.


Subject(s)
Alphaproteobacteria/genetics , Evolution, Molecular , Gene Transfer, Horizontal , RNA Nucleotidyltransferases/genetics , Alphaproteobacteria/classification , Animals , Choanoflagellata/genetics , Eukaryota/classification , Eukaryota/genetics , Phylogeny
12.
PLoS One ; 9(8): e105015, 2014.
Article in English | MEDLINE | ID: mdl-25137074

ABSTRACT

The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches are computationally too expensive. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.


Subject(s)
Models, Genetic , Software , Synteny , Bacterial Proteins/genetics , Cluster Analysis , Computer Simulation , Datasets as Topic , Genes, Bacterial
13.
Nucleic Acids Res ; 42(16): 10331-50, 2014.
Article in English | MEDLINE | ID: mdl-25106871

ABSTRACT

The cell cycle genes homology region (CHR) has been identified as a DNA element with an important role in transcriptional regulation of late cell cycle genes. It has been shown that such genes are controlled by DREAM, MMB and FOXM1-MuvB and that these protein complexes can contact DNA via CHR sites. However, it has not been elucidated which sequence variations of the canonical CHR are functional and how frequently CHR-based regulation is utilized in mammalian genomes. Here, we define the spectrum of functional CHR elements. As the basis for a computational meta-analysis, we identify new CHR sequences and compile phylogenetic motif conservation as well as genome-wide protein-DNA binding and gene expression data. We identify CHR elements in most late cell cycle genes binding DREAM, MMB, or FOXM1-MuvB. In contrast, Myb- and forkhead-binding sites are underrepresented in both early and late cell cycle genes. Our findings support a general mechanism: sequential binding of DREAM, MMB and FOXM1-MuvB complexes to late cell cycle genes requires CHR elements. Taken together, we define the group of CHR-regulated genes in mammalian genomes and provide evidence that the CHR is the central promoter element in transcriptional regulation of late cell cycle genes by DREAM, MMB and FOXM1-MuvB.


Subject(s)
Cell Cycle/genetics , DNA-Binding Proteins/metabolism , Gene Expression Regulation , Genes, cdc , Promoter Regions, Genetic , Transcription Factors/metabolism , Animals , Binding Sites , Cell Division/genetics , Cell Line , Forkhead Box Protein M1 , Forkhead Transcription Factors/metabolism , G2 Phase/genetics , Genome , Humans , Mice , NIH 3T3 Cells , Proto-Oncogene Proteins c-myb/metabolism , Repressor Proteins/metabolism , Transcription, Genetic
14.
J Theor Biol ; 336: 61-74, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-23880640

ABSTRACT

Eukaryotic histones carry a diverse set of specific chemical modifications that accumulate over the life-time of a cell and have a crucial impact on the cell state in general and the transcriptional program in particular. Replication constitutes a dramatic disruption of the chromatin states that effectively amounts to partial erasure of stored information. To preserve its epigenetic state the cell reconstructs (at least part of) the histone modifications by means of processes that are still very poorly understood. A plausible hypothesis is that the different combinations of reader and writer domains in histone-modifying enzymes implement local rewriting rules that are capable of "recomputing" the desired parental modification patterns on the basis of the partial information contained in that half of the nucleosomes that predate replication. To test whether such a mechanism is theoretically feasible, we have developed a flexible stochastic simulation system (available at http://www.bioinf.uni-leipzig.de/Software/StoChDyn) for studying the dynamics of histone modification states. The implementation is based on Gillespie's approach, i.e., it models the master equation of a detailed chemical model. It is efficient enough to use an evolutionary algorithm to find patterns across multiple cell divisions with high accuracy. We found that it is easy to evolve a system of enzymes that can maintain a particular chromatin state roughly stable, even without explicit boundary elements separating differentially modified chromatin domains. However, the success of this task depends on several previously unanticipated factors, such as the length of the initial state, the specific pattern that should be maintained, the time between replications, and chemical parameters such as enzymatic binding and dissociation rates. All these factors also influence the accumulation of errors in the wake of cell divisions.


Subject(s)
Chromatin/genetics , Epigenesis, Genetic , Inheritance Patterns/genetics , Algorithms , Computer Simulation , Evolution, Molecular , Genetic Fitness , Models, Biological , Nucleosomes/metabolism , Stochastic Processes
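
Gillespie's approach, on which the simulation system in the abstract above is based, can be sketched for a deliberately minimal model: a set of sites that each switch between an unmodified and a modified state. This two-reaction model, the rate names `k_on` and `k_off`, and the function name are assumptions for illustration only and do not reproduce the enzyme rules of the published software.

```python
import random


def gillespie_two_state(n_sites, k_on, k_off, t_end, seed=0):
    """Simulate the number of modified sites with Gillespie's direct method.

    Reactions: unmodified -> modified (rate k_on per unmodified site)
               modified -> unmodified (rate k_off per modified site)
    Returns a list of (time, modified_count) events, starting at (0.0, 0).
    """
    rng = random.Random(seed)
    t, modified = 0.0, 0
    trajectory = [(t, modified)]
    while True:
        a_on = k_on * (n_sites - modified)   # propensity of adding a mark
        a_off = k_off * modified             # propensity of removing a mark
        a_total = a_on + a_off
        if a_total == 0.0:
            break                            # no reaction can fire
        t += rng.expovariate(a_total)        # exponential waiting time
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_on:
            modified += 1
        else:
            modified -= 1
        trajectory.append((t, modified))
    return trajectory
```

Calling `gillespie_two_state(50, 1.0, 0.5, 10.0)` yields one stochastic trajectory; averaging over many seeds approximates the solution of the corresponding master equation.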
15.
PLoS One ; 7(10): e46811, 2012.
Article in English | MEDLINE | ID: mdl-23077526

ABSTRACT

Current genome-wide ChIP-seq experiments on different epigenetic marks aim at unraveling the interplay between their regulation mechanisms. Published evaluation tools, however, allow testing for predefined hypotheses only. Here, we present a novel method for annotation-independent exploration of epigenetic data and their inter-correlation with other genome-wide features. Our method is based on a combinatorial genome segmentation solely using information on combinations of epigenetic marks. It does not require prior knowledge about the data (e.g. gene positions), but allows integrating the data in a straightforward manner. Thereby, it combines compression, clustering and visualization of the data in a single tool. Our method provides intuitive maps of epigenetic patterns across multiple levels of organization, e.g. of the co-occurrence of different epigenetic marks in different cell types. Thus, it facilitates the formulation of new hypotheses on the principles of epigenetic regulation. We apply our method to histone modification data on trimethylation of histone H3 at lysine 4, 9 and 27 in multi-potent and lineage-primed mouse cells, analyzing their combinatorial modification pattern as well as differentiation-related changes of single modifications. We demonstrate that our method is capable of reproducing recent findings of gene centered approaches, e.g. correlations between CpG-density and the analyzed histone modifications. Moreover, combining the clustered epigenetic data with information on the expression status of associated genes we classify differences in epigenetic status of e.g. house-keeping genes versus differentiation-related genes. Visualizing the distribution of modification states on the chromosomes, we discover strong patterns for chromosome X. For example, exclusively H3K9me3 marked segments are enriched, while poised and active states are rare. Hence, our method also provides new insights into chromosome-specific epigenetic patterns, opening up new questions about how "epigenetic computation" is distributed over the genome in space and time.


Subject(s)
Epigenesis, Genetic , Epigenomics/methods , Histones/genetics , Algorithms , Animals , Cell Differentiation , Cell Lineage , DNA Methylation , Genes, Essential , Genome , Mice
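
One simple reading of a segmentation based solely on combinations of marks is to merge neighbouring genomic bins that carry the same set of marks. The abstract above does not give the algorithm, so the sketch below is an assumption: the input is taken to be a list of mark sets, one per fixed-size bin, and the function name is hypothetical.

```python
def segment_by_mark_combination(bins):
    """Merge adjacent bins that carry the same combination of marks.

    `bins` is a list with one collection of mark names per genomic bin.
    Returns a list of (start, end, marks) with `end` exclusive and
    `marks` as a frozenset.
    """
    segments = []
    start = 0
    for i in range(1, len(bins) + 1):
        # Close the current segment at the end of the input or when the
        # combination of marks changes.
        if i == len(bins) or frozenset(bins[i]) != frozenset(bins[start]):
            segments.append((start, i, frozenset(bins[start])))
            start = i
    return segments
```

With the bins `[{"H3K4me3"}, {"H3K4me3"}, {"H3K4me3", "H3K27me3"}, set(), set()]` this gives three segments: an H3K4me3-only segment over bins 0 to 1, a segment carrying both marks at bin 2, and an unmarked segment over bins 3 to 4. The segments can then be grouped by their combination without any gene annotation.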
16.
Biochimie ; 93(11): 2019-23, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21835221

ABSTRACT

Functional RNA elements can also be embedded within exonic sequences coding for functional proteins. While not uncommon in viruses, only a few examples of this type have been described in some detail for eukaryotic genomes. Here we use RNAz and RNAcode, two comparative genomics methods that measure signatures of stabilizing selection acting on RNA secondary structure and peptide sequence, respectively, to survey the fruit fly genomes. We estimate that there might be on the order of 1000 loci that are subject to dual selection pressure. The genome-wide screens used also expose the limitations of the currently available methods.


Subject(s)
Drosophilidae/genetics , Evolution, Molecular , Nucleic Acid Conformation , Open Reading Frames/genetics , RNA, Messenger/chemistry , Animals , Computational Biology , Conserved Sequence , Drosophila melanogaster/genetics , Exons/genetics , Genome, Insect , Introns/genetics , RNA, Messenger/genetics , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Selection, Genetic , Untranslated Regions/genetics
17.
J Exp Zool B Mol Dev Evol ; 316(6): 451-64, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21688387

ABSTRACT

Teleost fishes have extra Hox gene clusters owing to shared or lineage-specific genome duplication events in ray-finned fish (actinopterygian) phylogeny. Hence, extrapolating between genome function of teleosts and human or even between different fish species is difficult. We have sequenced and analyzed Hox gene clusters of the Senegal bichir (Polypterus senegalus), an extant representative of the most basal actinopterygian lineage. Bichir possesses four Hox gene clusters (A, B, C, D); phylogenetic analysis supports their orthology to the four Hox gene clusters of the gnathostome ancestor. We have generated a comprehensive database of conserved Hox noncoding sequences that include cartilaginous, lobe-finned, and ray-finned fishes (bichir and teleosts). Our analysis identified putative and known Hox cis-regulatory sequences with differing depths of conservation in Gnathostomata. We found that although bichir possesses four Hox gene clusters, its pattern of conservation of noncoding sequences is mosaic between outgroups, such as human, coelacanth, and shark, with four Hox gene clusters and teleosts, such as zebrafish and pufferfish, with seven or eight Hox gene clusters. Notably, bichir Hox gene clusters have been invaded by DNA transposons and this trend is further exemplified in teleosts, suggesting an as yet unrecognized mechanism of genome evolution that may explain Hox cluster plasticity in actinopterygians. Taken together, our results suggest that actinopterygian Hox gene clusters experienced a reduction in selective constraints that surprisingly predates the teleost-specific genome duplication.


Subject(s)
Evolution, Molecular , Fishes/genetics , Gene Duplication/genetics , Homeodomain Proteins/genetics , Models, Genetic , Multigene Family/genetics , Phylogeny , Animals , Genes, Homeobox , Genome , Humans
18.
BMC Bioinformatics ; 12: 124, 2011 Apr 28.
Article in English | MEDLINE | ID: mdl-21526987

ABSTRACT

BACKGROUND: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. RESULTS: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. CONCLUSIONS: Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.


Subject(s)
Genomics/methods , Phylogeny , Sequence Alignment/methods , Software , Base Sequence , Databases, Genetic
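
The plain reciprocal best alignment heuristic that the abstract above builds on can be sketched in a few lines. Proteinortho implements an extended version whose details are not given here, so the code below shows only the basic idea; the function name, the nested-dictionary input format, and the scores in the usage note are assumptions for illustration.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return gene pairs that are each other's best-scoring hit.

    scores_ab[a][b] is the alignment score of gene a (genome A) against
    gene b (genome B); scores_ba holds the searches in the other direction.
    """
    # Best hit of every gene in A among the genes of B, and vice versa.
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    # Keep a pair only if the best-hit relation holds in both directions.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

If genes `a1` and `a2` both score best against `b1`, but `b1` scores best against `a1`, only the pair `("a1", "b1")` is reported. Only the best hit per gene has to be kept, which is one reason this kind of heuristic can avoid storing the full all-against-all score matrix.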
19.
Methods Mol Biol ; 719: 173-96, 2011.
Article in English | MEDLINE | ID: mdl-21370084

ABSTRACT

The diverse fields of Omics research share a common logical structure combining a cataloging effort for a particular class of molecules or interactions, the underlying -ome, and a quantitative aspect attempting to record spatiotemporal patterns of concentration, expression, or variation. Consequently, these fields also share a common set of difficulties and limitations. In spite of the great success stories of Omics projects over the last decade, much remains to be understood not only at the technological, but also at the conceptual level. Here, we focus on the dark corners of Omics research, where the problems, limitations, conceptual difficulties, and lack of knowledge are hidden.


Subject(s)
Computational Biology/methods , Terminology as Topic , Animals , Data Interpretation, Statistical , Humans , Information Management , Research Design , Systems Integration
20.
J Theor Biol ; 276(1): 269-76, 2011 May 07.
Article in English | MEDLINE | ID: mdl-21315730

ABSTRACT

Scientific theories seek to provide simple explanations for significant empirical regularities based on fundamental physical and mechanistic constraints. Biological theories have rarely reached a level of generality and predictive power comparable to physical theories. This discrepancy is explained through a combination of frozen accidents, environmental heterogeneity, and widespread non-linearities observed in adaptive processes. At the same time, model building has proven to be very successful when it comes to explaining and predicting the behavior of particular biological systems. In this respect biology resembles alternative model-rich frameworks, such as economics and engineering. In this paper we explore the prospects for general theories in biology, and suggest that these take inspiration not only from physics, but also from the information sciences. Future theoretical biology is likely to represent a hybrid of parsimonious reasoning and algorithmic or rule-based explanation. An open question is whether these new frameworks will remain transparent to human reason. In this context, we discuss the role of machine learning in the early stages of scientific discovery. We argue that evolutionary history is not only a source of uncertainty, but also provides the basis, through conserved traits, for very general explanations for biological regularities, and the prospect of unified theories of life.


Subject(s)
Biology , Models, Biological , Animals , Biological Evolution , Humans , Language