Results 1 - 20 of 21
1.
PeerJ ; 12: e16804, 2024.
Article in English | MEDLINE | ID: mdl-38313028

ABSTRACT

Once thought to be a unique capability of the islets of Langerhans in the mammalian pancreas, insulin (INS) signaling is now recognized as an evolutionarily ancient function going back to prokaryotes. INS is ubiquitously present not only in humans but also in unicellular eukaryotes, fungi, worms, and Drosophila. Remote homologue identification also supports the presence of INS and the INS receptor in corals, where the availability of glucose is largely dependent on the photosynthetic activity of the symbiotic algae. The cnidarian animal host of corals operates together with a 20,000-sized microbiome, in direct analogy to the human gut microbiome. In humans, aberrant INS signaling is the hallmark of metabolic disease and is thought to play a major role in aging and in age-related diseases such as Alzheimer's disease. Here we argue that a broader view of INS beyond its role in human homeostasis may help us understand other organisms and, in turn, that studying those non-model organisms may enable a novel view of the human INS signaling system. To this end, we review INS signaling from a new angle by drawing analogies between humans and corals at the molecular level.


Subjects
Anthozoa , Islets of Langerhans , Animals , Humans , Anthozoa/metabolism , Insulin/metabolism , Islets of Langerhans/metabolism , Pancreas/metabolism , Signal Transduction
2.
iScience ; 26(3): 106238, 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36926651

ABSTRACT

RNA splicing dysfunctions are more widespread than estimates based only on the effects of splicing factor mutations (SFMT) in myeloid neoplasia (MN) would suggest. The genetic complexity of MN is amenable to machine learning (ML) strategies. We applied an integrative ML approach to identify co-varying features by combining genomic lesions (mutations, deletions, and copy number), the exon-inclusion ratio as a measure of RNA splicing (percent spliced in, PSI), and gene expression (GE) of 1,258 MN cases and 63 normal controls. We identified 15 clusters based on mutations, GE, and PSI. PSI levels differed to various extents regardless of SFMT, suggesting that changes in RNA splicing were not strictly related to SFMT. Combining PSI and GE further distinguished the features and identified PSI similarities and differences, common pathways, and expression signatures across clusters. Thus, multimodal features can resolve the complex architecture of MN and help identify convergent molecular and transcriptomic pathways amenable to therapies.
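The abstract uses PSI (percent spliced in), the exon-inclusion ratio, as its measure of RNA splicing. The sketch below shows one common way to estimate PSI for a cassette exon from RNA-seq junction reads. It assumes that inclusion reads are pooled over the exon's two flanking junctions and that exclusion reads come from the single skipping junction, so inclusion counts are halved before the ratio is formed. This normalization convention and the name percent_spliced_in are illustrative assumptions, not the pipeline used in the study.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int,
                       n_inclusion_junctions: int = 2) -> Optional[float]:
    """Estimate PSI for a cassette exon from junction read counts.

    Assumption (illustrative convention): inclusion reads are pooled over the
    exon's upstream and downstream junctions, while exclusion reads come from
    one skipping junction, so inclusion counts are averaged per junction.
    Returns None when the event has no junction coverage.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    inc = inclusion_reads / n_inclusion_junctions
    total = inc + exclusion_reads
    if total == 0:
        return None
    return inc / total


# Example: compare one exon between a sample and a control.
sample = percent_spliced_in(40, 20)   # 20 / (20 + 20) = 0.5
control = percent_spliced_in(90, 5)   # 45 / (45 + 5) = 0.9
delta_psi = sample - control          # about -0.4: less inclusion in the sample
```

Returning None rather than 0 for uncovered events keeps "no data" distinct from "exon always skipped". A clustering step like the one described above would typically have to impute or filter such events first.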

3.
PLoS One ; 18(2): e0270965, 2023.
Article in English | MEDLINE | ID: mdl-36735673

ABSTRACT

With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, the extension of the methodological toolbox required to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining of large coral and their symbiont sequence data is a challenge, but also provides an opportunity for understanding functionality and evolution of these and other non-model organisms. Much more information than for any other eukaryotic species is available for humans, especially related to signal transduction and diseases. However, the coral cnidarian host and humans diverged more than 700 million years ago, and homologies between proteins in the two species are therefore often in the gray zone, or at least often undetectable with traditional BLAST searches. We introduce a two-stage approach to identifying putative coral homologues of human proteins. First, through remote homology detection using Hidden Markov Models, we identify candidate human homologues in the cnidarian genome. However, for many proteins, the human genome alone contains multiple family members with similar or even more divergence in sequence. In the second stage, therefore, we filter the remote homology results based on the functional and structural plausibility of each coral candidate, shortlisting the coral proteins likely to have conserved some of the functions of the human proteins. We demonstrate our approach with a pipeline for mapping membrane receptors in humans to membrane receptors in corals, with specific focus on the stony coral, P. damicornis. More than 1000 human membrane receptors mapped to 335 coral receptors, including 151 G protein coupled receptors (GPCRs).
To validate specific sub-families, we chose opsin proteins, representative GPCRs that confer light sensitivity, and Toll-like receptors, representative non-GPCRs that function in the immune response and in communication with microorganisms. Through detailed structure-function analysis of their ligand-binding pockets and downstream signaling cascades, we selected those candidate remote homologues likely to carry out related functions in the corals. This pipeline may prove generally useful for other non-model organisms, such as to support the growing field of synthetic biology.


Subjects
Anthozoa , Receptors, G-Protein-Coupled , Signal Transduction , Animals , Humans , Anthozoa/genetics , Anthozoa/physiology , Genome , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Models, Animal
4.
Database (Oxford) ; 2022, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35976727

ABSTRACT

Reproducibility of research is essential for science. However, in the way modern computational biology research is done, it is easy to lose track of small but extremely critical details. Key details, such as the specific version of the software used or the iteration of a genome, can easily be lost in the shuffle or perhaps not noted at all. Much work is being done on the database and storage side, ensuring that there is a place to store experiment-specific details, but current mechanisms for recording details are cumbersome for scientists to use. We propose a new metadata description language, named MEtaData Format for Open Reef Data (MEDFORD), in which scientists can record all details relevant to their research. Being human-readable, easily editable and templatable, MEDFORD serves as a collection point for all notes that a researcher could find relevant to their research, be it for internal use or for future replication. MEDFORD has been applied to coral research, documenting research from RNA-seq analyses to photo collections.


Subjects
Language , Metadata , Computational Biology , Humans , Reproducibility of Results , Software
5.
Metabol Open ; 12: 100133, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34693240

ABSTRACT

BACKGROUND: Exercise-induced muscle damage (EIMD) commonly occurs following intense resistance exercise and is associated with decrements in exercise performance and delayed muscle recovery. Thus, practical methods to attenuate EIMD would prove useful to both training and athletic populations. Omega-3 (n-3) supplementation has been shown to mitigate EIMD with evidence of increasing efficacy at higher doses (up to 6 g/day). However, data on its efficacy in trained individuals are limited. Therefore, this study investigated the effects of 6 and 8 g of n-3 supplementation on markers of muscle damage and muscle recovery after eccentric resistance exercise in resistance-trained males. METHODS: Using a double-blind, randomized, placebo-controlled design, 26 resistance-trained males (23 ± 4 years; 173.6 ± 20.5 cm; 81.9 ± 9.7 kg; 14.2 ± 3.7% body fat) supplemented with 6 (n=10) or 8 g (n=7) of n-3 polyunsaturated fatty acids, or placebo (n=9) for 33 days. On day 30, participants performed a lower body muscle-damaging eccentric resistance exercise bout. Measures of muscle performance, soreness, and damage were taken pre-exercise on day 30 as well as on days 31-33, including vertical jump height (VJH), perceived muscle soreness (PMS), hip and knee range of motion (ROM), repetitions to fatigue (RTF) at 70% 1-RM, and serum creatine kinase (CK) while participants continued to supplement until day 33. RESULTS: There were significant differences in VJH, PMS, and serum CK following the muscle-damaging exercise bout compared to pre-exercise (p<0.05). However, there were no significant (p>0.05) differences between supplementation groups (6 g, 8 g, and placebo) at any time point post-exercise (day 31-33). There were no changes in hip and knee ROM or RTF at any time point or between groups. Vertical jump height and PMS returned to pre-exercise levels despite CK remaining elevated post-exercise.
CONCLUSIONS: Thirty-three days of 6 and 8 g of n-3 supplementation did not attenuate EIMD or enhance muscle recovery following muscle-damaging eccentric resistance exercise in resistance-trained males. Further research using various n-3 supplementation durations, doses, and eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) concentrations may be needed to establish its efficacy in attenuating EIMD, which may vary between trained and untrained individuals. Furthermore, while circulating CK is commonly used to assess muscle damage, elevated CK levels may not reflect muscle recovery status following muscle-damaging exercise.

6.
Cell Stem Cell ; 28(11): 1966-1981.e6, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34473945

ABSTRACT

DDX41 mutations are the most common germline alterations in adult myelodysplastic syndromes (MDSs). The majority of affected individuals harbor germline monoallelic frameshift DDX41 mutations and subsequently acquire somatic mutations in their other DDX41 allele, typically missense R525H. Hematopoietic progenitor cells (HPCs) with biallelic frameshift and R525H mutations undergo cell cycle arrest and apoptosis, causing bone marrow failure in mice. Mechanistically, DDX41 is essential for small nucleolar RNA (snoRNA) processing, ribosome assembly, and protein synthesis. Although monoallelic DDX41 mutations do not affect hematopoiesis in young mice, a subset of aged mice develops features of MDS. Biallelic mutations in DDX41 are observed at a low frequency in non-dominant hematopoietic stem cell clones in bone marrow (BM) from individuals with MDS. Mice chimeric for monoallelic DDX41 mutant BM cells and a minor population of biallelic mutant BM cells develop hematopoietic defects at a younger age, suggesting that biallelic DDX41 mutant cells are disease modifying in the context of monoallelic DDX41 mutant BM.


Subjects
DEAD-box RNA Helicases , Myelodysplastic Syndromes , Animals , DEAD-box RNA Helicases/genetics , Germ Cells , Hematopoiesis/genetics , Mice , Mutation/genetics , Myelodysplastic Syndromes/genetics
7.
Cell Rep ; 35(2): 108989, 2021 04 13.
Article in English | MEDLINE | ID: mdl-33852859

ABSTRACT

Vertebrates have evolved three paralogs, termed LUC7L, LUC7L2, and LUC7L3, of the essential yeast U1 small nuclear RNA (snRNA)-associated splicing factor Luc7p. We investigated the mechanistic and regulatory functions of these putative splicing factors, of which one (LUC7L2) is mutated or deleted in myeloid neoplasms. Protein interaction data show that all three proteins bind similar core but distinct regulatory splicing factors, probably mediated through their divergent arginine-serine-rich domains, which are not present in Luc7p. Knockdown of each factor reveals mostly unique sets of significantly dysregulated alternative splicing events dependent on their binding locations, which are largely non-overlapping. Notably, knockdown of LUC7L2 alone significantly upregulates the expression of multiple spliceosomal factors and downregulates glycolysis genes, possibly contributing to disease pathogenesis. RNA binding studies reveal that LUC7L2 and LUC7L3 crosslink to weak 5' splice sites and to the 5' end of U1 snRNA, establishing an evolutionarily conserved role in 5' splice site selection.


Subjects
Leukemia, Myeloid/genetics , Myelodysplastic Syndromes/genetics , Nuclear Proteins/genetics , RNA Splicing , RNA-Binding Proteins/genetics , Base Sequence , Exons , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Introns , Leukemia, Myeloid/metabolism , Leukemia, Myeloid/pathology , Mutation , Myelodysplastic Syndromes/metabolism , Myelodysplastic Syndromes/pathology , Nuclear Proteins/metabolism , RNA, Small Nuclear/genetics , RNA, Small Nuclear/metabolism , RNA-Binding Proteins/metabolism , Ribonucleoprotein, U1 Small Nuclear/genetics , Ribonucleoprotein, U1 Small Nuclear/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , Spliceosomes
8.
Dev Cell ; 56(5): 627-640.e5, 2021 03 08.
Article in English | MEDLINE | ID: mdl-33651979

ABSTRACT

Hematopoietic stem and progenitor cells (HSPCs) arise during embryonic development and are essential for sustaining the blood and immune systems throughout life. Tight regulation of HSPC numbers is critical for hematopoietic homeostasis. Here, we identified DEAD-box helicase 41 (Ddx41) as a gatekeeper of HSPC production. Using zebrafish ddx41 mutants, we unveiled a critical role for this helicase in regulating HSPC production at the endothelial-to-hematopoietic transition. We determined that Ddx41 suppresses the accumulation of R-loops, nucleic acid structures consisting of RNA:DNA hybrids and ssDNAs whose equilibrium is essential for cellular fitness. Excess R-loop levels in ddx41 mutants triggered the cGAS-STING inflammatory pathway leading to increased numbers of hemogenic endothelium and HSPCs. Elevated R-loop accumulation and inflammatory signaling were observed in human cells with decreased DDX41, suggesting possible conservation of mechanism. These findings delineate that precise regulation of R-loop levels during development is critical for limiting cGAS-STING activity and HSPC numbers.


Subjects
Embryo, Nonmammalian/cytology , Hematopoietic Stem Cells/cytology , R-Loop Structures , Zebrafish Proteins/metabolism , Animals , Animals, Genetically Modified , Cell Differentiation , DEAD-box RNA Helicases/genetics , DEAD-box RNA Helicases/metabolism , Embryo, Nonmammalian/metabolism , Hematopoietic Stem Cells/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Nucleotidyltransferases/genetics , Nucleotidyltransferases/metabolism , Signal Transduction , Zebrafish , Zebrafish Proteins/genetics
9.
Best Pract Res Clin Haematol ; 33(3): 101199, 2020 09.
Article in English | MEDLINE | ID: mdl-33038983

ABSTRACT

Somatic, heterozygous missense and nonsense mutations in at least seven proteins that function in the spliceosome are found at high frequency in MDS patients. These proteins act at various steps in the process of splicing by the spliceosome and lead to characteristic alterations in the alternative splicing of a subset of genes. Several studies have investigated the effects of these mutations and have attempted to identify a commonly affected gene or pathway. Here, we summarize what is known about the normal function of these proteins and how the mutations alter the splicing landscape of the genome. We also summarize the commonly mis-spliced gene targets and discuss the state of mechanistic unification that has been achieved. Finally, we discuss alternative mechanisms by which these mutations may lead to disease.


Subjects
Mutation , RNA Splicing Factors , RNA Splicing/genetics , Spliceosomes , Humans , Myelodysplastic Syndromes/genetics , Myelodysplastic Syndromes/metabolism , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism , Spliceosomes/genetics , Spliceosomes/metabolism
10.
Emerg Infect Dis ; 22(5): 786-93, 2016 May.
Article in English | MEDLINE | ID: mdl-27089479

ABSTRACT

Hispaniola is the only Caribbean island to which Plasmodium falciparum malaria remains endemic. Resistance to the antimalarial drug chloroquine has rarely been reported in Haiti, which is located on Hispaniola, but the K76T pfcrt (P. falciparum chloroquine resistance transporter) gene mutation that confers chloroquine resistance has been detected intermittently. We analyzed 901 patient samples collected during 2006-2009 and found that 2 samples showed possible mixed parasite infections of genetically chloroquine-resistant and -sensitive parasites. Direct sequencing of the pfcrt resistance locus and single-nucleotide polymorphism barcoding did not definitively identify a resistant population, suggesting that sustained propagation of chloroquine-resistant parasites was not occurring in Haiti during the study period. Comparison of parasites from Haiti with those from Colombia, Panama, and Venezuela revealed a geographically distinct population with highly related parasites. Our findings indicate low genetic diversity in the parasite population and low levels of chloroquine resistance in Haiti, raising the possibility that reported cases may be of exogenous origin.


Subjects
Malaria, Falciparum/epidemiology , Malaria, Falciparum/parasitology , Membrane Transport Proteins/genetics , Mutation , Plasmodium falciparum/genetics , Protozoan Proteins/genetics , DNA Barcoding, Taxonomic , Geography , Haiti/epidemiology , History, 21st Century , Humans , Malaria, Falciparum/history , Phylogeography , Plasmodium falciparum/classification , Sequence Analysis, DNA
12.
Proc Data Compress Conf ; 2016: 221-230, 2016.
Article in English | MEDLINE | ID: mdl-28845445

ABSTRACT

This paper provides the specification and an initial validation of an evaluation framework for the comparison of lossy compressors of genome sequencing quality values. The goal is to define reference data, test sets, tools and metrics that shall be used to evaluate the impact of lossy compression of quality values on human genome variant calling. The functionality of the framework is validated using two state-of-the-art genomic compressors. This work has been spurred by the current activity within the ISO/IEC SC29/WG11 technical committee (a.k.a. MPEG), which is investigating the possibility of starting a standardization activity for genomic information representation.
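The abstract does not spell out the framework's metrics. One plausible way to score the impact of lossy quality-value compression on variant calling is sketched below. It compares the calls made from lossily compressed data against a reference call set (for example, calls from the original, losslessly stored data) by exact matching, then reports precision, recall and F1. The names Variant and compare_calls and the exact-match rule are assumptions made for this illustration; they are not the framework's specification.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (VCF-like fields)."""
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def compare_calls(truth: Iterable[Variant],
                  calls: Iterable[Variant]) -> Dict[str, float]:
    """Exact-match comparison of a call set against a reference call set.

    Dedicated benchmarking tools also normalize variant representation
    (left-alignment, splitting multi-allelic records) before matching;
    this sketch does not.
    """
    truth_set, call_set = set(truth), set(calls)
    tp = len(truth_set & call_set)   # calls confirmed by the reference
    fp = len(call_set - truth_set)   # spurious calls
    fn = len(truth_set - call_set)   # reference variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy example: calls from "lossy" data miss one variant and add one.
truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr2", 50, "G", "A"), Variant("chr2", 75, "T", "C")]
lossy_calls = truth[:3] + [Variant("chr3", 10, "A", "C")]
summary = compare_calls(truth, lossy_calls)   # precision = recall = f1 = 0.75
```

Running the same comparison for each compressor and bit rate would give a simple rate-versus-accuracy curve. Set-based matching keeps the sketch short, but it ignores genotype differences at matched sites.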

13.
Cell Syst ; 1(2): 130-140, 2015 Aug 26.
Article in English | MEDLINE | ID: mdl-26436140

ABSTRACT

Many data sets exhibit well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here we introduce a framework for similarity search based on characterizing a data set's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the data set is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND (3700x BLASTX)), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve 'compressive omics,' and the general theory can be readily applied to data science problems outside of biology. Source code: http://gems.csail.mit.edu.
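The covering-hypersphere idea in this abstract can be illustrated with a two-stage range search. The sketch below makes several assumptions that the abstract does not state: a greedy clustering with cluster radius r_c, a toy Hamming-distance metric on equal-length strings, and the names build_clusters, range_search and hamming. A query of radius r first compares against cluster centers only, using the enlarged radius r + r_c (the coarse step). It then checks individual members of the surviving clusters (the fine step). When the distance is a true metric, the triangle inequality guarantees that no hit is missed: d(q, x) <= r and d(c, x) <= r_c imply d(q, c) <= r + r_c. This is not the released Ammolite, MICA or esFragBag code.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def hamming(a: str, b: str) -> int:
    """Toy metric: mismatches between equal-length strings (assumed)."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(data: Sequence[T], dist: Callable[[T, T], float],
                   r_c: float) -> Dict[int, List[int]]:
    """Greedy cover: each item joins the first center within r_c,
    otherwise it becomes a new center. Returns center index -> member indices."""
    clusters: Dict[int, List[int]] = {}
    for i, x in enumerate(data):
        for c in clusters:
            if dist(data[c], x) <= r_c:
                clusters[c].append(i)
                break
        else:
            clusters[i] = [i]
    return clusters


def range_search(query: T, r: float, data: Sequence[T],
                 clusters: Dict[int, List[int]],
                 dist: Callable[[T, T], float], r_c: float) -> List[int]:
    """Return indices of all items within distance r of the query."""
    hits: List[int] = []
    for c, members in clusters.items():
        if dist(query, data[c]) <= r + r_c:                 # coarse step: centers only
            hits.extend(i for i in members
                        if dist(query, data[i]) <= r)       # fine step: members
    return sorted(hits)


data = ["ACGT", "ACGA", "TTTT", "TTTA", "ACTT"]
clusters = build_clusters(data, hamming, r_c=1)
print(range_search("ACGT", 1, data, clusters, hamming, r_c=1))  # [0, 1, 4]
```

The speedup comes from the coarse step discarding whole clusters at once (here the "TTTT" cluster is never opened). That pays off when the data are clustered and low-dimensional, which is the regime the abstract's entropy and fractal-dimension analysis characterizes.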

14.
Article in English | MEDLINE | ID: mdl-26357074

ABSTRACT

We introduce MRFy, a tool for protein remote homology detection that captures beta-strand dependencies in the Markov random field. Over a set of 11 SCOP beta-structural superfamilies, MRFy shows a 14 percent improvement in mean Area Under the Curve for the motif recognition problem as compared to HMMER, a 25 percent improvement as compared to RAPTOR, a 14 percent improvement as compared to HHPred, and an 18 percent improvement as compared to CNFPred and RaptorX. MRFy was implemented in the Haskell functional programming language, and parallelizes well on multi-core systems. MRFy is available, as source code as well as an executable, from http://mrfy.cs.tufts.edu/.


Subjects
Computational Biology/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Algorithms , Amino Acid Motifs , Markov Chains , Models, Statistical , Stochastic Processes
15.
Pathog Glob Health ; 109(3): 153-61, 2015 May.
Article in English | MEDLINE | ID: mdl-25892032

ABSTRACT

Genetic polymorphisms identified from genomic sequencing can be used to track changes in parasite populations through time. Such tracking is particularly informative when applying control strategies and evaluating their effectiveness. Using genomic approaches may also enable improved ability to categorise populations and to stratify them according to the likely effectiveness of intervention. Clinical applications of genomic approaches also allow relapses to be classified according to reinfection or recrudescence. These tools can be used not only to assess the effectiveness of malaria interventions but also to appraise the strategies for malaria elimination.


Subjects
Genomics , Malaria, Vivax/genetics , Plasmodium vivax/genetics , Animals , Antimalarials , DNA, Protozoan , Drug Resistance , Humans , Malaria, Vivax/transmission , Molecular Epidemiology , Polymorphism, Single Nucleotide , Population Surveillance , Secondary Prevention
16.
PLoS One ; 8(10): e76339, 2013.
Article in English | MEDLINE | ID: mdl-24194834

ABSTRACT

In protein-protein interaction (PPI) networks, functional similarity is often inferred based on the function of directly interacting proteins, or more generally, some notion of interaction network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network, but this has only a limited ability to capture fine-grained neighborhood distinctions, because most proteins are close to each other, and there are many ties in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for transfer of functional annotation in PPI networks. We present a tool that takes a PPI network as input and outputs the DSD distance between every pair of proteins. We show that replacing the shortest-path metric by DSD improves the performance of classical function prediction methods across the board.


Subjects
Algorithms , Models, Genetic , Protein Interaction Maps/genetics , Proteins/metabolism
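The abstract describes DSD only in words, as a metric "based on a graph diffusion property". The sketch below shows one way to compute such a distance. Several details are assumptions rather than statements from the abstract: an unweighted, undirected graph with no isolated nodes; a random walk that steps to a uniformly chosen neighbor; a finite walk length k; and counting the start node at t = 0. Under these assumptions, He(u) is the vector of expected visit counts to each node during a k-step walk from u, and DSD(u, v) is the L1 distance between He(u) and He(v). The function name dsd_matrix is hypothetical, and this is not the authors' released tool.

```python
import numpy as np


def dsd_matrix(adj: np.ndarray, k: int = 5) -> np.ndarray:
    """Pairwise diffusion-state-style distances for an undirected graph.

    Row u of `he` holds expected visit counts to every node for a k-step
    random walk from u (t = 0 included, an assumption of this sketch).
    The distance between u and v is the L1 norm of he[u] - he[v].
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    if np.any(degrees == 0):
        raise ValueError("isolated nodes are not supported in this sketch")
    P = adj / degrees[:, None]        # row-stochastic transition matrix
    n = adj.shape[0]
    step = np.eye(n)                  # P^0
    he = np.eye(n)                    # visit counts at t = 0
    for _ in range(k):
        step = step @ P               # P^t for t = 1..k
        he += step
    # L1 distance between every pair of rows of he (broadcast to n x n x n).
    return np.abs(he[:, None, :] - he[None, :, :]).sum(axis=2)


# Usage on a 4-node path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
D = dsd_matrix(A, k=3)
```

Unlike shortest-path distance, which is a small integer and produces many ties, these distances are real-valued and reflect how two nodes' whole neighborhoods overlap. That is the kind of finer-grained proximity the abstract motivates. The dense n x n x n broadcast is only suitable for small graphs.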
17.
Bioinformatics ; 29(13): i283-90, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812995

ABSTRACT

MOTIVATION: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will speed up not only homology search directly but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. RESULTS: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate with all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks, such as protein structure prediction and orthology mapping, which rely heavily on homology search. AVAILABILITY: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/ CONTACT: bab@mit.edu.


Subjects
Algorithms , Data Compression/methods , Databases, Protein , Sequence Alignment/methods , Sequence Homology, Amino Acid , Genomics/methods
18.
BMC Bioinformatics ; 13: 259, 2012 Oct 06.
Article in English | MEDLINE | ID: mdl-23039758

ABSTRACT

BACKGROUND: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly use protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. RESULTS: We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. CONCLUSIONS: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.


Subjects
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Proteins/chemistry
19.
Bioinformatics ; 28(9): 1216-22, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22408192

ABSTRACT

MOTIVATION: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif. RESULTS: We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. We show a mean 26% (median 16%) improvement in area under curve (AUC) for beta-structural motif recognition as compared with HMMER (a well-known HMM method) and a mean 33% (median 19%) improvement as compared with RAPTOR (a well-known threading method) and even a mean 18% (median 10%) improvement in AUC over HHpred (a profile-profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, and make over 100 new fold predictions. AVAILABILITY AND IMPLEMENTATION: A webserver that runs SMURFLite is available at: http://smurf.cs.tufts.edu/smurflite/


Subjects
Markov Chains , Protein Structure, Secondary , Proteins/chemistry , Software , Amino Acid Sequence , Genome, Bacterial , Humans , Models, Molecular , Protein Structure, Tertiary , Proteins/genetics , Thermotoga maritima/genetics
20.
Article in English | MEDLINE | ID: mdl-21464511

ABSTRACT

Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level, and we demonstrate qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.


Subjects
Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Cluster Analysis , Computational Biology , Models, Molecular , Protein Conformation , Protein Folding