1.
Proc Natl Acad Sci U S A ; 116(9): 3636-3645, 2019 02 26.
Article in English | MEDLINE | ID: mdl-30733291

ABSTRACT

From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the "proteome grammar", that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of "protein languages" in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. The results of this analysis bolster the analogy between genomes and natural language and show that a "quasi-universal grammar" underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.


Subjects
Molecular Evolution, Protein Domains/genetics, Tertiary Protein Structure, Proteome/chemistry, Archaea/chemistry, Archaea/genetics, Female, Humans, Linguistics, Male, Phylogeny, Proteome/genetics
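
The abstract of record 1 names two ingredients: n-grams over domain architectures and a relative entropy (information gain) against random domain combinations. The following is only a minimal sketch of that general idea, not the authors' procedure. It assumes that each protein is given as an ordered list of domain names, and that the "random" baseline is the product of single-domain frequencies; both choices, the function names, and the toy input are illustrative assumptions.

```python
from collections import Counter
from math import log2


def ngrams(architecture, n=2):
    """Return the consecutive n-grams of one domain architecture (a list of names)."""
    return [tuple(architecture[i:i + n]) for i in range(len(architecture) - n + 1)]


def information_gain(architectures, n=2):
    """Relative entropy, in bits, between observed n-grams and an independence baseline.

    Assumption: the baseline probability of an n-gram is the product of the
    overall frequencies of its domains. This is a stand-in for "random domain
    combinations", not necessarily the null model used in the paper.
    """
    observed = Counter(g for arch in architectures for g in ngrams(arch, n))
    total_ngrams = sum(observed.values())
    if total_ngrams == 0:
        return 0.0

    domain_counts = Counter(d for arch in architectures for d in arch)
    total_domains = sum(domain_counts.values())

    gain = 0.0
    for gram, count in observed.items():
        p = count / total_ngrams
        q = 1.0
        for domain in gram:
            q *= domain_counts[domain] / total_domains
        gain += p * log2(p / q)
    return gain


# Toy input: made-up architectures, used only to show the call shape.
toy_architectures = [
    ["SH3", "SH2", "Pkinase"],
    ["SH2", "Pkinase"],
    ["PH", "SH2", "Pkinase"],
]
toy_gain = information_gain(toy_architectures, n=2)
```

A value near zero would mean that observed domain pairs look like independent draws; larger values mean that the architectures are more constrained than the baseline predicts.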
2.
Bioinformatics ; 33(4): 547-548, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797761

ABSTRACT

Summary: The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Calling Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report we identify an isomorphic map between VCF and the reference Resource Description Framework. RDF is advanced by the World Wide Web Consortium (W3C) to enable representations of linked data that are both distributed and discoverable. The resulting ability to decompose VCF reports of genomic variation without loss of context addresses the need to modularize and govern NGS pipelines for Precision Medicine. Specifically, it provides the flexibility (i.e. the indexing) needed to support the wide variety of clinical scenarios and patient-facing governance where only part of the VCF data is fitting.

Availability and Implementation: Software libraries with a claim to be both domain-facing and consumer-facing have to pass the test of portability across the variety of devices that those consumers in fact adopt. That is, ideally the implementation should itself take place within the space defined by web technologies. Consequently, the isomorphic mapping function was implemented in JavaScript, and was tested in a variety of environments and devices, client and server side alike. These range from web browsers in mobile phones to the most popular micro service platform, NodeJS. The code is publicly available at https://github.com/ibl/VCFr, with a live deployment at: http://ibl.github.io/VCFr/.

Contact: jonas.almeida@stonybrookmedicine.edu.


Subjects
Genetic Variation, Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Genomics/methods, Humans, Information Storage and Retrieval, Semantics
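
The abstract of record 2 describes decomposing a VCF report into RDF statements without losing context. As a rough illustration of what such a decomposition could look like, the sketch below turns one VCF data line into subject-predicate-object triples. It is not the VCFr mapping and is written in Python rather than the JavaScript the authors used. The `http://example.org/vcf/` namespace, the subject naming scheme, and the function name are assumptions made for illustration; only the eight fixed VCF columns come from the VCF format itself.

```python
# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_line_to_triples(line, base="http://example.org/vcf/"):
    """Decompose one VCF data line into (subject, predicate, object) tuples.

    Assumptions (hypothetical, not taken from VCFr): the subject is built from
    CHROM and POS under a placeholder namespace, and each column or INFO key
    becomes a predicate under the same namespace.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("expected at least 8 tab-separated VCF columns")

    record = dict(zip(VCF_COLUMNS, fields[:8]))
    subject = f"{base}{record['CHROM']}/{record['POS']}"

    # One triple per fixed column except INFO, which is expanded below.
    triples = [(subject, base + col, record[col]) for col in VCF_COLUMNS[:7]]

    # INFO holds semicolon-separated key=value pairs or bare flags; "." means empty.
    if record["INFO"] != ".":
        for entry in record["INFO"].split(";"):
            key, sep, value = entry.partition("=")
            triples.append((subject, f"{base}INFO/{key}", value if sep else "true"))
    return triples


# Made-up variant line, used only to show the call shape.
example_triples = vcf_line_to_triples("1\t12345\trs1\tA\tG\t50\tPASS\tDP=10;DB")
```

Because every triple carries the same subject, any subset of them can be shared or indexed on its own while still pointing back to the variant it describes, which is the kind of partial disclosure the abstract refers to.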