Results 1 - 20 of 21
1.
Syst Rev ; 10(1): 28, 2021 01 16.
Article in English | MEDLINE | ID: mdl-33453724

ABSTRACT

BACKGROUND: Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. To decrease the high case fatality rates and morbidity for sepsis and septic shock, there is a need to increase the accuracy of early detection of suspected sepsis in prehospital and emergency department settings. This may be achieved by developing risk prediction decision support systems based on artificial intelligence. METHODS: The overall aim of this scoping review is to summarize the literature on existing methods for early detection of sepsis using artificial intelligence. The review will be performed using the framework formulated by Arksey and O'Malley and further developed by Levac and colleagues. To identify primary studies and reviews that are suitable to answer our research questions, a comprehensive literature collection will be compiled by searching several sources. Restrictions regarding time and language will have to be implemented. Therefore, only studies published between 1 January 1990 and 31 December 2020 will be taken into consideration, and only papers with full text in English will be included. Databases/web search engines that will be used are PubMed, Web of Science Platform, Scopus, IEEE Xplore, Google Scholar, Cochrane Library, and ACM Digital Library. Furthermore, clinical studies that have completed patient recruitment and reported results found in the database ClinicalTrials.gov will be considered. The term artificial intelligence is viewed broadly, and a wide range of machine learning and mathematical models suitable as a base for decision support will be evaluated. Two members of the team will test the framework on a sample of included studies to ensure that the coding framework is suitable and can be consistently applied. Analysis of collected data will provide a descriptive summary and thematic analysis.
The reported results will convey knowledge about the state of current research and innovation for using artificial intelligence to detect sepsis in early phases of the medical care chain. ETHICS AND DISSEMINATION: The methodology used here is based on the use of publicly available information and does not need ethical approval. It aims at aiding further research towards digital solutions for disease detection and health innovation. Results will be extracted into a review report for submission to a peer-reviewed scientific journal. Results will be shared with relevant local and national authorities and disseminated in additional appropriate formats such as conferences, lectures, and press releases.


Subjects
Artificial Intelligence , Shock, Septic , Humans , Population Groups , Publications , Research Design , Review Literature as Topic
2.
Stud Health Technol Inform ; 216: 1065, 2015.
Article in English | MEDLINE | ID: mdl-26262364

ABSTRACT

Late phase clinical trials are regularly outsourced to a Contract Research Organisation (CRO) while the risk and accountability remain within the sponsor company. Many statistical tasks are delivered by the CRO and later revalidated by the sponsor. Here, we report a technological approach to standardised event prediction. We have built a dynamic web application around an R-package with the aim of delivering reliable event predictions, simplifying communication and increasing trust between the CRO and the in-house statisticians via transparency. Short learning curve, interactivity, reproducibility and data diagnostics are key here. The current implementation is motivated by time-to-event prediction in oncology. We demonstrate a clear benefit of standardisation for both parties. The tool can be used for exploration, communication, sensitivity analysis and generating standard reports. At this point we wish to present this tool and share some of the insights we have gained during the development.


Subjects
Adverse Drug Reaction Reporting Systems/organization & administration , Clinical Trials as Topic/statistics & numerical data , Drug Monitoring/methods , Drug-Related Side Effects and Adverse Reactions/epidemiology , Electronic Health Records/statistics & numerical data , Outsourced Services/statistics & numerical data , Computer Simulation , Drug-Related Side Effects and Adverse Reactions/diagnosis , Electronic Health Records/classification , Humans , Incidence , Models, Statistical , Risk Assessment/methods , Software , United Kingdom/epidemiology
3.
PLoS One ; 8(8): e70568, 2013.
Article in English | MEDLINE | ID: mdl-23950964

ABSTRACT

An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing considerably higher expression in those tissues than in the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree, and it is not evident how to retrieve these genes or how to distinguish true biological findings from those that are due to choice of method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim of eliminating method/study-specific biases and improving the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes with the highest scores with the public databases PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap with PaGenBase, 71% with TiGER and only 28% with HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases at identifying drug targets and biomarkers with known tissue specificity.


Subjects
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Algorithms , Cluster Analysis , Databases, Genetic , Humans , Organ Specificity/genetics
4.
Mol Cancer ; 12(1): 70, 2013 Jul 08.
Article in English | MEDLINE | ID: mdl-23835063

ABSTRACT

BACKGROUND: Neuroblastoma (NB) tumours are commonly divided into three cytogenetic subgroups. However, by unsupervised principal components analysis of gene expression profiles we recently identified four distinct subgroups, r1-r4. In the current study we characterized these different subgroups in more detail, with a specific focus on the fourth divergent tumour subgroup (r4). METHODS: Expression microarray data from four international studies corresponding to 148 neuroblastic tumour cases were divided into four expression subgroups using a previously described 6-gene signature. Differentially expressed genes between groups were identified using Significance Analysis of Microarray (SAM). Next, gene expression network modelling was performed to map signalling pathways and cellular processes representing each subgroup. Findings were validated at the protein level by immunohistochemistry and immunoblot analyses. RESULTS: We identified several significantly up-regulated genes in the r4 subgroup of which the tyrosine kinase receptor ERBB3 was most prominent (fold change: 132-240). By gene set enrichment analysis (GSEA) the constructed gene network of ERBB3 (n = 38 network partners) was significantly enriched in the r4 subgroup in all four independent data sets. ERBB3 was also positively correlated to the ErbB family members EGFR and ERBB2 in all data sets, and a concurrent overexpression was seen in the r4 subgroup. Further studies of histopathology categories, using a fifth data set of 110 neuroblastic tumours, showed a striking similarity between the expression profile of r4 and those of ganglioneuroblastoma (GNB) and ganglioneuroma (GN) tumours. In contrast, the NB histopathological subtype was dominated by mitotic regulating genes, characterizing unfavourable NB subgroups in particular. The high ErbB3 expression in GN tumour types was verified at the protein level, with expression mainly in the mature ganglion cells.
CONCLUSIONS: This study demonstrates the importance of performing unsupervised clustering and subtype discovery of data sets prior to analyses to avoid a mixture of tumour subtypes, which may otherwise give distorted results and lead to incorrect conclusions. The current study identifies ERBB3 as a clear-cut marker of a GNB/GN-like expression profile, and we suggest a 7-gene expression signature (including ERBB3) as a complement to histopathology analysis of neuroblastic tumours. Further studies of ErbB3 and other ErbB family members and their role in neuroblastic differentiation and pathogenesis are warranted.


Subjects
Biomarkers, Tumor/metabolism , Ganglioneuroblastoma/metabolism , Ganglioneuroma/metabolism , Peripheral Nervous System Neoplasms/metabolism , Receptor, ErbB-3/metabolism , Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic , Gene Ontology , Gene Regulatory Networks , Humans , Oligonucleotide Array Sequence Analysis , Receptor, ErbB-3/genetics , Transcriptome , Up-Regulation
5.
Cancer Cell Int ; 11: 9, 2011 Apr 14.
Article in English | MEDLINE | ID: mdl-21492432

ABSTRACT

BACKGROUND: There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. RESULTS: The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). CONCLUSIONS: Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently being made to further investigate this group's specific characteristics.

6.
Bioinformatics ; 26(3): 295-301, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-20008478

ABSTRACT

MOTIVATION: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
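
The final estimation step described above can be sketched as follows. This is only an illustration under an assumption the abstract does not state: that the number of reads covering a bin is Poisson with a gamma-distributed rate, so that a bin is missed with probability (1 + scale)^(-shape). The function name `unseen_bins` and its parameters are hypothetical, and the gamma shape and scale are taken as already fitted.

```python
def unseen_bins(n_observed: int, shape: float, scale: float) -> float:
    """Estimate how many bins received zero reads.

    Assumption (not from the abstract): reads per bin ~ Poisson(rate),
    with rate ~ Gamma(shape, scale). The zero-read probability is then
    the gamma Laplace transform at 1: (1 + scale) ** (-shape).
    """
    p_zero = (1.0 + scale) ** (-shape)
    # Observed bins are the fraction (1 - p_zero) of all bins, so
    # total = n_observed / (1 - p_zero) and unseen = total * p_zero.
    return n_observed * p_zero / (1.0 - p_zero)


if __name__ == "__main__":
    # Hypothetical fitted values, for illustration only.
    print(unseen_bins(n_observed=1000, shape=1.0, scale=1.0))
```

With a larger fitted scale (deeper coverage), the zero-read probability shrinks and so does the estimated number of unseen bins.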


Subjects
Computational Biology/methods , DNA/chemistry , Metagenome , Metagenomics/methods , Sequence Analysis, DNA/methods , DNA/genetics , Databases, Genetic
7.
Genome Med ; 1(9): 88, 2009 Sep 29.
Article in English | MEDLINE | ID: mdl-19754960

ABSTRACT

Systems biology has matured considerably as a discipline over the last decade, yet some of the key challenges separating current research efforts in systems biology and clinically useful results are only now becoming apparent. As these gaps are better defined, the new discipline of systems medicine is emerging as a translational extension of systems biology. How is systems medicine defined? What are relevant ontologies for systems medicine? What are the key theoretic and methodologic challenges facing computational disease modeling? How are inaccurate and incomplete data, and uncertain biologic knowledge best synthesized in useful computational models? Does network analysis provide clinically useful insight? We discuss the outstanding difficulties in translating a rapidly growing body of data into knowledge usable at the bedside. Although core-specific challenges are best met by specialized groups, it appears fundamental that such efforts should be guided by a roadmap for systems medicine drafted by a coalition of scientists from the clinical, experimental, computational, and theoretic domains.

8.
Bioinformatics ; 25(20): 2737-8, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-19696045

ABSTRACT

UNLABELLED: Microorganisms are ubiquitous in nature and constitute intrinsic parts of almost every ecosystem. A culture-independent and powerful way to study microbial communities is metagenomics. In such studies, functional analysis is performed on fragmented genetic material from multiple species in the community. The recent advances in high-throughput sequencing have greatly increased the amount of data in metagenomic projects. At present, there is an urgent need for efficient statistical tools to analyse these data. We have created ShotgunFunctionalizeR, an R-package for functional comparison of metagenomes. The package contains tools for importing, annotating and visualizing metagenomic data produced by shotgun high-throughput sequencing. ShotgunFunctionalizeR contains several statistical procedures for assessing functional differences between samples, both for individual genes and for entire pathways. In addition to standard and previously published methods, we have developed and implemented a novel approach based on a Poisson model. This procedure is highly flexible and thus applicable to a wide range of different experimental designs. We demonstrate the potential of ShotgunFunctionalizeR by performing a regression analysis on metagenomes sampled at multiple depths in the Pacific Ocean. AVAILABILITY: http://shotgun.zool.gu.se
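
To make the idea of a Poisson-based comparison concrete, here is a minimal sketch of a two-sample test for one gene family. It is not the ShotgunFunctionalizeR API (the package is written in R, and its function names are not given in the abstract), and it is not claimed to be the package's novel procedure. It uses the standard fact that if two counts are Poisson with rates proportional to sequencing depth, then one count conditional on their sum is binomial. The function name and arguments are hypothetical.

```python
from math import comb


def poisson_two_sample_pvalue(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided exact p-value for equal relative abundance.

    x1, x2: reads assigned to the gene family in samples 1 and 2.
    n1, n2: total reads in samples 1 and 2.
    Under the null, x1 | (x1 + x2) ~ Binomial(x1 + x2, n1 / (n1 + n2)).
    Intended for small counts; large totals would need log-space arithmetic.
    """
    total = x1 + x2
    p = n1 / (n1 + n2)
    pmf = [comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(total + 1)]
    observed = pmf[x1]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical counts: 10 vs 0 reads at equal sequencing depth.
    print(poisson_two_sample_pvalue(10, 100_000, 0, 100_000))
```

A regression over several samples, as in the ocean-depth example mentioned in the abstract, would require a generalized linear model rather than this pairwise test.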


Subjects
Computational Biology/methods , Genomics/methods , Metagenome , Metagenomics/methods , Software , Animals , Databases, Genetic , Humans
9.
Stat Appl Genet Mol Biol ; 8: Article 19, 2009.
Article in English | MEDLINE | ID: mdl-19341353

ABSTRACT

Clumping of gene properties like expression or mutant phenotypes along chromosomes is commonly detected using completely random null-models where their location is equally likely across the chromosomes. Interpretation of statistical tests based on these assumptions may be misleading if dependencies exist that are unequal between chromosomes or in different chromosomal parts. One such regional dependency is the telomeric effect, observed in several studies of Saccharomyces cerevisiae, under which e.g. essential genes are less likely to reside near the chromosomal ends. In this study we demonstrate that standard randomisation test procedures are of limited applicability in the presence of telomeric effects. Several extensions of such standard tests are here suggested for handling clumping simultaneously with regional differences in essentiality frequencies in sub-telomeric and central gene positions. Furthermore, a general non-homogeneous discrete Markov approach for combining parametrically modelled position dependent probabilities of a dichotomous property with a simple single parameter clumping is suggested. This Markov model is adapted to the observed telomeric effects and then simulations are used to demonstrate properties of the suggested modified randomisation tests. The model is also applied as a direct alternative tool for statistical analysis of the S. cerevisiae genome for clumping of phenotypes.
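
For orientation, the kind of standard randomisation test that the abstract argues is of limited applicability can be sketched as below. The sketch assumes the completely random null model (every gene position equally likely to carry the property) and uses the number of adjacent gene pairs that both carry the property as the clumping statistic. That statistic, the function names and the seed handling are illustrative choices, not taken from the paper, and the telomere-aware extensions are not implemented here.

```python
import random


def adjacent_pairs(flags):
    """Count neighbouring gene pairs where both genes carry the property."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)


def clumping_pvalue(flags, n_perm=999, seed=0):
    """One-sided permutation p-value for clumping along one chromosome.

    flags: sequence of 0/1 values in chromosomal order (1 = e.g. essential).
    Null model: the 1s are placed uniformly at random, which ignores any
    regional (e.g. telomeric) dependence.
    """
    rng = random.Random(seed)
    observed = adjacent_pairs(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy_chromosome = [1] * 10 + [0] * 10  # hypothetical, fully clumped
    print(clumping_pvalue(toy_chromosome))
```

If essential genes are depleted near chromosome ends, this null model shuffles them into the sub-telomeric regions as well, which is one way the resulting p-values can mislead.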


Subjects
Chromosome Mapping , Models, Genetic , Saccharomyces cerevisiae/metabolism , Computer Simulation , Gene Expression Regulation, Fungal , Genes, Fungal , Genome , Markov Chains , Models, Biological , Models, Statistical , Phenotype , Probability , Random Allocation , Telomere/ultrastructure
10.
Nucleic Acids Res ; 37(7): 2096-104, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19223325

ABSTRACT

In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.
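
A generic form of spectral bipartitioning can be sketched in a few lines. The abstract does not say which variant was used, so the version below is an assumption: it splits the items by the sign of the Fiedler vector of the unnormalised graph Laplacian, computed by shifted power iteration in pure Python. The similarity matrix and the function name are hypothetical.

```python
def spectral_bipartition(weights, iterations=500):
    """Split nodes into two groups by the sign of the Fiedler vector.

    weights: symmetric matrix (list of lists) of non-negative pairwise
    similarities with a zero diagonal. Uses power iteration on
    (shift * I - L), where L = D - W, after removing the constant vector.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2.0 * max(degree)  # upper bound on the largest Laplacian eigenvalue
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iterations):
        w = [
            (shift - degree[i]) * v[i] + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(w) / n
        w = [x - mean for x in w]  # project out the constant eigenvector
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    left = [i for i in range(n) if v[i] < 0]
    right = [i for i in range(n) if v[i] >= 0]
    return left, right


if __name__ == "__main__":
    # Hypothetical similarities: two tight triangles joined by one weak edge.
    W = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0.1, 0, 0],
        [0, 0, 0.1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(spectral_bipartition(W))
```

Applying the split recursively to each half yields a hierarchy of clusters without a fixed similarity threshold, which is the property the abstract emphasises.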


Subjects
Algorithms , Archaeal Proteins/classification , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Cluster Analysis , Phenotype , Phylogeny , Sequence Analysis, Protein , Software
11.
Bioinformatics ; 24(16): i7-13, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689842

ABSTRACT

MOTIVATION: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparisons in terms of relative abundances of various protein families. The latter requires assignment of individual reads to protein families, which is hindered by the fact that short reads contain only a fragment, usually small, of a protein. RESULTS: We have considered the assignment of pyrosequencing reads to protein families directly using RPS-BLAST against COG and Pfam databases and indirectly via proxygenes that are identified using BLASTx searches against protein sequence databases. Using simulated metagenome datasets as benchmarks, we show that the proxygene method is more accurate than the direct assignment. We introduce a clustering method which significantly reduces the size of a metagenome dataset while maintaining a faithful representation of its functional and taxonomic content.


Subjects
Bacterial Proteins/genetics , Chromosome Mapping/methods , Open Reading Frames/genetics , Proteome/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Cluster Analysis , Molecular Sequence Data
12.
PLoS One ; 3(7): e2607, 2008 Jul 09.
Article in English | MEDLINE | ID: mdl-18612393

ABSTRACT

BACKGROUND: Environments and their organic content are generally not static and isolated, but in a constant state of exchange and interaction with each other. Through physical or biological processes, organisms, especially microbes, may be transferred between environments whose characteristics may be quite different. The transferred microbes may not survive in their new environment, but their DNA will be deposited. In this study, we compare two environmental sequencing projects to find molecular evidence of transfer of microbes over vast geographical distances. METHODOLOGY: By studying synonymous nucleotide composition, oligomer frequency and orthology between predicted genes in metagenomics data from two environments, terrestrial and aquatic, and by correlating with phylogenetic mappings, we find that both environments are likely to contain trace amounts of microbes which have been far removed from their original habitat. We also suggest a bias in direction from soil to sea, which is consistent with the cycles of planetary wind and water. CONCLUSIONS: Our findings support the Baas-Becking hypothesis formulated in 1934, which states that due to dispersion and population sizes, microbes are likely to be found in widely disparate environments. Furthermore, the availability of genetic material from distant environments is a possible font of novel gene functions for lateral gene transfer.


Subjects
Environment , Genes, Bacterial , Ecology , Ecosystem , Gene Transfer, Horizontal , Phylogeny , Water Microbiology
13.
Bioinformatics ; 24(11): 1332-8, 2008 Jun 01.
Article in English | MEDLINE | ID: mdl-18381402

ABSTRACT

MOTIVATION: The evolutionary distance inferred from gene-order comparisons of related bacteria is dependent on the model. Therefore, it is highly important to establish reliable assumptions before inferring its magnitude. RESULTS: We investigate the patterns of dotplots between species of bacteria with the purpose of model selection in gene-order problems. We find several categories of data which can be explained by carefully weighing the contributions of reversals, transpositions, symmetrical reversals, single gene transpositions and single gene reversals. We also derive method of moments distance estimates for some previously uncomputed cases, such as symmetrical reversals, single gene reversals and their combinations, as well as the single gene transpositions edit distance.


Subjects
Biological Evolution , Chromosome Mapping/methods , DNA, Bacterial/genetics , Evolution, Molecular , Genome, Bacterial/genetics , Models, Genetic , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , Linkage Disequilibrium/genetics , Molecular Sequence Data
14.
Nucleic Acids Res ; 36(Database issue): D534-8, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17932063

ABSTRACT

IMG/M is a data management and analysis system for microbial community genomes (metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis tools. IMG/M is available at http://img.jgi.doe.gov/m.


Subjects
Databases, Genetic , Environmental Microbiology , Genome, Archaeal , Genome, Bacterial , Database Management Systems , Genomics , Internet , Software
15.
Nature ; 450(7169): 560-5, 2007 Nov 22.
Article in English | MEDLINE | ID: mdl-18033299

ABSTRACT

From the standpoints of both basic research and biotechnology, there is considerable interest in reaching a clearer understanding of the diversity of biological mechanisms employed during lignocellulose degradation. Globally, termites are an extremely successful group of wood-degrading organisms and are therefore important both for their roles in carbon turnover in the environment and as potential sources of biochemical catalysts for efforts aimed at converting wood into biofuels. Only recently have data supported any direct role for the symbiotic bacteria in the gut of the termite in cellulose and xylan hydrolysis. Here we use a metagenomic analysis of the bacterial community resident in the hindgut paunch of a wood-feeding 'higher' Nasutitermes species (which do not contain cellulose-fermenting protozoa) to show the presence of a large, diverse set of bacterial genes for cellulose and xylan hydrolysis. Many of these genes were expressed in vivo or had cellulase activity in vitro, and further analyses implicate spirochete and fibrobacter species in gut lignocellulose degradation. New insights into other important symbiotic functions including H2 metabolism, CO2-reductive acetogenesis and N2 fixation are also provided by this first system-wide gene analysis of a microbial community specialized towards plant lignocellulose degradation. Our results underscore how complex even a 1-μl environment can be.


Subjects
Bacteria/metabolism , Genome, Bacterial/genetics , Genomics , Intestines/microbiology , Isoptera/metabolism , Isoptera/microbiology , Wood/metabolism , Animals , Bacteria/enzymology , Bacteria/genetics , Bacteria/isolation & purification , Bioelectric Energy Sources , Carbon/metabolism , Catalytic Domain , Cellulose/metabolism , Costa Rica , Genes, Bacterial/genetics , Glycoside Hydrolases/chemistry , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Hydrolysis , Lignin/metabolism , Models, Biological , Molecular Sequence Data , Polymerase Chain Reaction , Symbiosis , Wood/chemistry , Xylans/metabolism
16.
BMC Bioinformatics ; 8: 402, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17949484

ABSTRACT

BACKGROUND: Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This is especially important as trees grow larger as a consequence of a rapidly expanding sequence database. Hierarchical group names are typically manually assigned in trees, an approach that becomes unfeasible for very large topologies. RESULTS: We have developed an automated iterative procedure for delineating stable (monophyletic) hierarchical groups to large (or small) trees and naming those groups according to a set of sequentially applied rules. In addition, we have created an associated ungrouping tool for removing existing groups that do not meet user-defined criteria (such as monophyly). The procedure is implemented in a program called GRUNT (GRouping, Ungrouping, Naming Tool) and has been applied to the current release of the Greengenes (Hugenholtz) 16S rRNA gene taxonomy comprising more than 130,000 taxa. CONCLUSION: GRUNT will facilitate researchers requiring comprehensive hierarchical grouping of large tree topologies in, for example, database curation, microarray design and pangenome assignments. The application is available at the greengenes website [1].


Subjects
Databases, Nucleic Acid , Phylogeny , Software , Algorithms , Classification , Database Management Systems , RNA, Ribosomal, 16S/analysis
17.
BMC Bioinformatics ; 8: 295, 2007 Aug 08.
Article in English | MEDLINE | ID: mdl-17686169

ABSTRACT

BACKGROUND: The translational efficiency of an mRNA can be modulated by upstream open reading frames (uORFs) present in certain genes. A uORF can attenuate translation of the main ORF by interfering with translational reinitiation at the main start codon. uORFs also occur by chance in the genome, in which case they do not have a regulatory role. Since the sequence determinants for functional uORFs are not understood, it is difficult to discriminate functional from spurious uORFs by sequence analysis. RESULTS: We have used comparative genomics to identify novel uORFs in yeast with a high likelihood of having a translational regulatory role. We examined uORFs, previously shown to play a role in regulation of translation in Saccharomyces cerevisiae, for evolutionary conservation within seven Saccharomyces species. Inspection of the set of conserved uORFs yielded the following three characteristics useful for discrimination of functional from spurious uORFs: a length between 4 and 6 codons, a distance from the start of the main ORF between 50 and 150 nucleotides, and finally a lack of overlap with, and clear separation from, neighbouring uORFs. These derived rules are inherently associated with uORFs with properties similar to the GCN4 locus, and may not detect most uORFs of other types. uORFs with high scores based on these rules showed a much higher evolutionary conservation than randomly selected uORFs. In a genome-wide scan in S. cerevisiae, we found 34 conserved uORFs from 32 genes that we predict to be functional; subsequent analysis showed the majority of these to be located within transcripts. A total of 252 genes were found containing conserved uORFs with properties indicative of a functional role; all but 7 are novel. Functional content analysis of this set identified an overrepresentation of genes involved in transcriptional control and development. CONCLUSION: Evolutionary conservation of uORFs in yeasts can be traced up to 100 million years of separation. 
The conserved uORFs have certain characteristics with respect to length, distance from each other and from the main start codon, and folding energy of the sequence. These newly found characteristics can be used to facilitate detection of other conserved uORFs.
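
The three screening rules reported in the RESULTS (length of 4 to 6 codons, 50 to 150 nucleotides from the main ORF start, no overlap with neighbouring uORFs) can be expressed as a simple filter. The sketch below rests on labelled assumptions, since the abstract does not define the measurements: length is counted in codons excluding the stop codon, distance is measured from the end of the uORF to the main start codon, and "clear separation" is reduced to plain non-overlap. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UORF:
    start: int  # 0-based position of the first base of the uORF start codon
    end: int    # position one past the last base of the uORF stop codon


def looks_functional(
    uorf: UORF,
    neighbours: Iterable[UORF],
    main_start: int,
    codon_range=(4, 6),
    distance_range=(50, 150),
) -> bool:
    """Apply the three rules from the abstract under the stated assumptions."""
    n_codons = (uorf.end - uorf.start) // 3 - 1  # assumption: stop codon excluded
    distance = main_start - uorf.end             # assumption: uORF end to main AUG
    isolated = all(o.end <= uorf.start or o.start >= uorf.end for o in neighbours)
    return (
        codon_range[0] <= n_codons <= codon_range[1]
        and distance_range[0] <= distance <= distance_range[1]
        and isolated
    )


if __name__ == "__main__":
    candidate = UORF(start=100, end=118)  # hypothetical: 5 codons plus stop
    print(looks_functional(candidate, neighbours=[], main_start=200))
```

As the abstract itself cautions, rules of this kind favour uORFs resembling those at the GCN4 locus and may miss other types.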


Subjects
Chromosome Mapping/methods , Evolution, Molecular , Genome, Fungal/genetics , Open Reading Frames/genetics , Regulatory Sequences, Nucleic Acid/genetics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Conserved Sequence/genetics , Molecular Sequence Data , Protein Biosynthesis/genetics
18.
Stat Appl Genet Mol Biol ; 5: Article8, 2006.
Article in English | MEDLINE | ID: mdl-16646872

ABSTRACT

Recently Peres and Shields discovered a new method for estimating the order of a stationary fixed order Markov chain. They showed that the estimator is consistent by proving a threshold result. While this threshold is valid asymptotically in the limit, it is not very useful for DNA sequence analysis where data sizes are moderate. In this paper we give a novel interpretation of the Peres-Shields estimator as a sharp transition phenomenon. This yields a precise and powerful estimator that quickly identifies the core dependencies in data. We show that it compares favorably to other estimators, especially in the presence of variable dependencies. Motivated by this last point, we extend the Peres-Shields estimator to Variable Length Markov Chains. We compare it to a well-established estimator and show that it is superior in terms of the predictive likelihood. We give an application to the problem of detecting DNA sequence similarity in plasmids.
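
To show what estimating the order of a Markov chain from a DNA sequence involves, here is a minimal sketch. It deliberately swaps in a different and widely used technique, order selection by the Bayesian information criterion (BIC), because the abstract does not give the formula of the Peres-Shields estimator; it is therefore not the method of the paper. The function name and the `max_order` default are illustrative.

```python
from collections import Counter
from math import log


def bic_order(seq: str, max_order: int = 3, alphabet: str = "ACGT") -> int:
    """Choose a fixed Markov order for seq by minimising BIC.

    Every order is scored on the same positions (max_order onwards) so
    that the maximised log-likelihoods are comparable.
    """
    n_scored = len(seq) - max_order
    best_bic, best_order = None, 0
    for k in range(max_order + 1):
        context_counts = Counter()
        transition_counts = Counter()
        for i in range(max_order, len(seq)):
            context = seq[i - k:i]  # empty string when k == 0
            context_counts[context] += 1
            transition_counts[(context, seq[i])] += 1
        # Maximised log-likelihood using empirical transition frequencies.
        loglik = sum(
            m * log(m / context_counts[c]) for (c, _), m in transition_counts.items()
        )
        n_params = (len(alphabet) - 1) * len(alphabet) ** k
        bic = -2.0 * loglik + n_params * log(n_scored)
        if best_bic is None or bic < best_bic:
            best_bic, best_order = bic, k
    return best_order


if __name__ == "__main__":
    # Toy sequence in which each base is fully determined by the previous one.
    print(bic_order("ACGT" * 50))
```

The penalty term grows as 4^k, which is why criteria of this kind can be conservative on moderate-sized DNA data, the setting the abstract is concerned with.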


Subjects
Markov Chains , Sequence Analysis, DNA/methods , DNA Transposable Elements , Models, Statistical , Plasmids/chemistry
19.
Bioinformatics ; 22(5): 517-22, 2006 Mar 01.
Article in English | MEDLINE | ID: mdl-16403797

ABSTRACT

MOTIVATION: Analyses of genomic signatures are gaining attention as they allow studies of species-specific relationships without involving alignments of homologous sequences. A naïve Bayesian classifier was built to discriminate between different bacterial compositions of short oligomers, also known as DNA words. The classifier has proven successful in identifying foreign genes in Neisseria meningitidis. In this study we extend the classifier approach using either a fixed higher order Markov model (Mk) or a variable length Markov model (VLMk). RESULTS: We propose a simple algorithm to lock a variable length Markov model to a certain number of parameters and show that the use of Markov models greatly increases the flexibility and accuracy of prediction compared with a naïve model. We also test the integrity of classifiers in terms of false-negatives and give estimates of the minimal sizes of training data. We end the report by proposing a method to reject a false hypothesis of horizontal gene transfer. AVAILABILITY: Software and Supplementary information available at www.cs.chalmers.se/~dalevi/genetic_sign_classifiers/.
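To make the idea of a fixed-order Markov model classifier concrete, the following is a minimal sketch and not the authors' software. It assumes equal class priors, so the Bayesian decision reduces to picking the model with the highest log-likelihood, and it uses add-one pseudocounts, which is a choice made here for illustration. The function names `train_markov`, `log_likelihood` and `classify` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seq, k, pseudocount=1.0):
    """Estimate an order-k Markov model; return a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood(seq, k, log_prob):
    """Sum of log P(base | previous k bases) over the sequence."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))


def classify(seq, models, k):
    """Return the name of the model under which seq is most likely."""
    return max(models, key=lambda name: log_likelihood(seq, k, models[name]))


# Toy training data standing in for two genomes with different signatures.
k = 2
models = {
    "genome_A": train_markov("AT" * 50, k),
    "genome_B": train_markov("GC" * 50, k),
}
print(classify("ATATATAT", models, k))
print(classify("GCGCGC", models, k))
```

A variable length Markov model would replace the fixed context length `k` with contexts of differing depth; that extension is not shown here.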


Subjects
Chromosome Mapping/methods , DNA Fingerprinting/methods , DNA, Bacterial/genetics , Gene Transfer, Horizontal/genetics , Genome, Bacterial/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Artificial Intelligence , Bayes Theorem , Markov Chains , Models, Genetic , Models, Statistical , Species Specificity
20.
Bioinformatics ; 20(18): 3628-35, 2004 Dec 12.
Article in English | MEDLINE | ID: mdl-15297302

ABSTRACT

UNLABELLED: A set of new algorithms and software tools for automatic protein identification using peptide mass fingerprinting is presented. The software is automatic, fast and modular to suit different laboratory needs, and it can be operated either via a Java user interface or called from within scripts. The software modules do peak extraction, peak filtering and protein database matching, and communicate via XML. Individual modules can therefore easily be replaced with other software if desired, and all intermediate results are available to the user. The algorithms are designed to operate without human intervention and contain several novel approaches. The performance and capabilities of the software are illustrated on spectra from different mass spectrometer manufacturers, and the factors influencing successful identification are discussed and quantified. MOTIVATION: Protein identification with mass spectrometric methods is a key step in modern proteomics studies. Some tools are available today for doing different steps in the analysis. Only a few commercial systems integrate all the steps in the analysis, often for only one vendor's hardware, and the details of these systems are not public. RESULTS: A complete system for doing protein identification with peptide mass fingerprints is presented, including everything from peak picking to matching the database protein. The details of the different algorithms are disclosed so that academic researchers can have full control of their tools. AVAILABILITY: The described software tools are available from the Halmstad University website www.hh.se/staff/bioinf/. SUPPLEMENTARY INFORMATION: Details of the algorithms are described in supporting information available from the Halmstad University website www.hh.se/staff/bioinf/
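The database-matching step of peptide mass fingerprinting can be illustrated with a generic textbook sketch. It is not the algorithm described in the paper: it assumes full tryptic cleavage (after K or R, except before P) with no missed cleavages or modifications, singly protonated monoisotopic masses, and a fixed mass tolerance. All function names and the toy database are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(peaks, protein, tol_da=0.2):
    """Number of observed peaks within tol_da of any theoretical peptide mass."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(peak - m) <= tol_da for m in theoretical) for peak in peaks)


def rank_proteins(peaks, database, tol_da=0.2):
    """Sort database entries by matched peak count, best first."""
    return sorted(database, key=lambda name: count_matches(peaks, database[name], tol_da),
                  reverse=True)


database = {"P1": "AAKGGRPLK", "P2": "MMMMK"}          # toy sequences
peaks = [mh_mass("AAK"), mh_mass("GGRPLK") + 0.05]      # simulated peak list
print(rank_proteins(peaks, database))
```

A production system would also need the peak extraction and filtering stages mentioned in the abstract, plus a statistical score rather than a raw match count.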


Subjects
Peptide Mapping/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , User-Computer Interface , Algorithms , Database Management Systems , Documentation/methods , Information Storage and Retrieval/methods , Programming Languages , Proteins/analysis