1.
Int J Cancer ; 137(1): 86-95, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25422082

ABSTRACT

Gastric cancer is one of the most prevalent and aggressive cancers worldwide, and its molecular mechanism remains largely elusive. Here we report the genomic landscape of primary human gastric adenocarcinoma, based on the complete genome sequences of five pairs of cancer and matching normal samples. In total, 103,464 somatic point mutations, including 407 nonsynonymous ones, were identified, and the most recurrent mutations were harbored by mucins (MUC3A and MUC12) and transcription factors (ZNF717, ZNF595 and TP53). We detected 679 genomic rearrangements, which affect 355 protein-coding genes, and 76 genes show copy number changes. By mapping the boundaries of the rearranged regions to the folded three-dimensional structure of human chromosomes, we determined that 79.6% of the chromosomal rearrangements happen among DNA fragments in close spatial proximity, especially when the two endpoints stay in a similar replication phase. We present evidence that microhomology-mediated break-induced replication was a mechanism inducing ∼40.9% of the identified genomic changes in gastric tumors. Our data analyses revealed potential integrations of Helicobacter pylori DNA into the gastric cancer genomes. Overall, a large set of novel genomic variations was detected in these gastric cancer genomes, which may be essential to the study of the genetic basis and molecular mechanism of gastric tumorigenesis.


Subjects
Adenocarcinoma/genetics, Chromosome Aberrations, Genetic Variation, Helicobacter Infections/genetics, Helicobacter pylori/physiology, Stomach Neoplasms/genetics, Adenocarcinoma/pathology, Adenocarcinoma/virology, Aged, DNA Copy Number Variations, Viral DNA/analysis, Human Genome, Humans, Male, Middle Aged, Point Mutation, Single Nucleotide Polymorphism, Stomach Neoplasms/pathology, Stomach Neoplasms/virology
2.
J Bioinform Comput Biol ; 12(1): 1350019, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24467758

ABSTRACT

The GPCR genes have a variety of exon-intron structures even though their proteins are all structurally homologous. We examined all 199 human GPCR genes with at least two functional protein isoforms, aiming to understand what may have contributed to the large diversity of their exon-intron structures. The 199 genes have a total of 808 known protein splicing isoforms with experimentally verified functions. Our analysis reveals that 1,301 (80.6%) of the 1,613 adjacent exon-exon pairs in the 199 genes have either exactly one exon skipped or the intervening intron retained in at least one of the 808 protein splicing isoforms. This observation has a p-value of 2.051762e-09, assuming that the observed splicing isoforms are independent of the exon-intron structures. Our interpretation of this observation is that the exon boundaries of the GPCR genes are not randomly determined; instead, they may be selected to facilitate specific alternative splicing for functional purposes.


Subjects
Protein Isoforms, G-Protein-Coupled Receptors/genetics, Alternative Splicing, Exons, Humans, Introns, Genetic Models, G-Protein-Coupled Receptors/metabolism
3.
PLoS One ; 8(2): e56726, 2013.
Article in English | MEDLINE | ID: mdl-23457606

ABSTRACT

We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.


Subjects
Computer Graphics, Genomics/methods, Software, Algorithms, Cluster Analysis, Data Mining, Escherichia coli K12/genetics, Escherichia coli O157/genetics, Genomic Islands/genetics, Metagenomics, Sequence Analysis
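
The k-mer based barcode calculation mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function name `kmer_barcode`, the non-overlapping windows, the defaults `k=4` and `window=1000`, and the pooling of each k-mer with its reverse complement are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pool a k-mer with its reverse complement under one label."""
    return min(kmer, reverse_complement(kmer))


def kmer_barcode(sequence, k=4, window=1000):
    """Return (columns, rows): one row of k-mer frequencies per window.

    Hypothetical helper: each row is one non-overlapping fragment of
    `window` bases, each column one canonical k-mer. Rendering the rows
    as grey levels would give a barcode-like image.
    """
    sequence = sequence.upper()
    columns = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    rows = []
    for start in range(0, len(sequence) - window + 1, window):
        fragment = sequence[start:start + window]
        counts = Counter()
        for i in range(len(fragment) - k + 1):
            kmer = fragment[i:i + k]
            if set(kmer) <= VALID_BASES:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
        total = sum(counts.values()) or 1  # avoid division by zero
        rows.append([counts[c] / total for c in columns])
    return columns, rows
```

Fragments whose rows differ strongly from the bulk of the genome would be the candidates for further inspection; how that difference is scored is left open in this sketch.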
5.
PLoS One ; 7(1): e29496, 2012.
Article in English | MEDLINE | ID: mdl-22235300

ABSTRACT

Regulons, as groups of transcriptionally co-regulated operons, are the basic units of cellular response systems in bacterial cells. While the concept has been widely used in bacterial studies since it was first proposed in 1964, very little is known about how a regulon's component operons are arranged in a bacterial genome. We present a computational study to elucidate the organizational principles of regulons in a bacterial genome, based on the experimentally validated regulons of E. coli and B. subtilis. Our results indicate that (1) the genomic locations of transcription factors (TFs) are under stronger evolutionary constraints than those of the operons they regulate, so changing a TF's genomic location will have a larger impact on the bacterium than changing the genomic position of any of its target operons; (2) the operons of a regulon are generally not uniformly distributed in the genome but tend to form a few closely located clusters, which generally consist of genes working in the same metabolic pathways; and (3) the global arrangement of the component operons of all the regulons in a genome tends to minimize a simple scoring function, indicating that the global arrangement of regulons follows simple organizational principles.


Subjects
Computational Biology, Bacterial Genome/genetics, Regulon/genetics, Bacillus subtilis/genetics, Bacillus subtilis/metabolism, Escherichia coli K12/genetics, Escherichia coli K12/metabolism, Molecular Evolution, Operon/genetics, Transcription Factors/metabolism
6.
World J Gastroenterol ; 17(14): 1910-4, 2011 Apr 14.
Article in English | MEDLINE | ID: mdl-21528067

ABSTRACT

AIM: To identify and assess novel markers for detection of Shiga toxin-producing Escherichia coli (STEC) O157:H7 with an integrated computational and experimental approach. METHODS: High-throughput NCBI BLAST (E-value cutoff 1e-5) was used to search all sequenced prokaryotic genomes for homologs of each gene encoded in each of the three strains of STEC O157:H7 with complete genomes, aiming to find genes unique to O157:H7 as its potential markers. To ensure that the markers identified from the three strains of STEC O157:H7 can serve as general markers for all STEC O157:H7 strains, a genomic barcode approach was used to select the markers so as to minimize the possibility of choosing a marker gene that is part of a transposable element. The effectiveness of the predicted markers was then validated by running polymerase chain reaction (PCR) on 18 strains of O157:H7, with 5 additional genomes used as negative controls. RESULTS: The BLAST search identified 20, 16 and 20 genes, respectively, in the three sequenced strains of STEC O157:H7 that had no homologs in any of the other prokaryotic genomes. Three genes, wzy, Z0372 and Z0344, common to the three gene lists, were selected based on the genomic barcode approach. PCR showed an identification accuracy of 100% on the 18 tested strains and the 5 controls. CONCLUSION: The three identified novel markers, wzy, Z0372 and Z0344, are highly promising for the detection of STEC O157:H7, complementing the known markers.


Subjects
Escherichia coli Infections/diagnosis, Escherichia coli O157/genetics, Genetic Markers, Animals, Bacterial Genome, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Data, Polymerase Chain Reaction, Reproducibility of Results, Shiga Toxins
7.
Structure ; 19(4): 484-95, 2011 Apr 13.
Article in English | MEDLINE | ID: mdl-21481772

ABSTRACT

Nuclear magnetic resonance paramagnetic relaxation enhancement (PRE) measures long-range distances to isotopically labeled residues, providing useful constraints for protein structure prediction. The method usually requires labor-intensive conjugation of nitroxide labels to multiple locations on the protein, one at a time. Here a computational procedure, based on protein sequence and simple secondary structure models, is presented to facilitate optimal placement of a minimum number of labels needed to determine the correct topology of a helical transmembrane protein. Tests on DsbB (four helices) using just one label lead to correct topology predictions in four of five cases, with the predicted structures <6 Å to the native structure. Benchmark results using simulated PRE data show that we can generally predict the correct topology for five and six to seven helices using two and three labels, respectively, with an average success rate of 76% and structures of similar precision. The results show promise in facilitating experimentally constrained structure prediction of membrane proteins.


Subjects
Computational Biology/methods, Membrane Proteins/chemistry, Mutation, Protein Secondary Structure, Animals, Binding Sites/genetics, Humans, Magnetic Resonance Spectroscopy, Membrane Proteins/genetics, Molecular Models, Reproducibility of Results
8.
Nucleic Acids Res ; 39(4): 1197-207, 2011 Mar.
Article in English | MEDLINE | ID: mdl-20965966

ABSTRACT

This report describes an integrated study on identification of potential markers for gastric cancer in patients' cancer tissues and sera based on: (i) genome-scale transcriptomic analyses of 80 paired gastric cancer/reference tissues and (ii) computational prediction of blood-secretory proteins supported by experimental validation. Our findings show that: (i) 715 and 150 genes exhibit significantly differential expressions in all cancers and early-stage cancers versus reference tissues, respectively; and a substantial percentage of the alteration is found to be influenced by age and/or by gender; (ii) 21 co-expressed gene clusters have been identified, some of which are specific to certain subtypes or stages of the cancer; (iii) the top-ranked gene signatures give better than 94% classification accuracy between cancer and the reference tissues, some of which are gender-specific; and (iv) 136 of the differentially expressed genes were predicted to have their proteins secreted into blood, 81 of which were detected experimentally in the sera of 13 validation samples and 29 found to have differential abundances in the sera of cancer patients versus controls. Overall, the novel information obtained in this study has led to identification of promising diagnostic markers for gastric cancer and can benefit further analyses of the key (early) abnormalities during its development.


Subjects
Tumor Biomarkers/blood, Stomach Neoplasms/genetics, Adult, Age Factors, Aged, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Computational Biology, Gene Expression Profiling, Humans, Male, Middle Aged, Sex Factors, Stomach Neoplasms/blood, Stomach Neoplasms/classification
9.
PLoS One ; 5(10): e13696, 2010 Oct 27.
Article in English | MEDLINE | ID: mdl-21060876

ABSTRACT

A comparative study of public gene-expression data of seven types of cancers (breast, colon, kidney, lung, pancreatic, prostate and stomach cancers) was conducted with the aim of deriving marker genes, along with associated pathways, that are either common to multiple types of cancers or specific to individual cancers. The analysis results indicate that (a) each of the seven cancer types can be distinguished from its corresponding control tissue based on the expression patterns of a small number of genes, e.g., 2, 3 or 4; (b) the expression patterns of some genes can distinguish multiple cancer types from their corresponding control tissues, potentially serving as general markers for all or some groups of cancers; (c) the proteins encoded by some of these genes are predicted to be blood secretory, thus providing potential cancer markers in blood; (d) the numbers of differentially expressed genes across different cancer types in comparison with their control tissues correlate well with the five-year survival rates associated with the individual cancers; and (e) some metabolic and signaling pathways are abnormally activated or deactivated across all cancer types, while other pathways are more specific to certain cancers or groups of cancers. The novel findings of this study offer considerable insight into these seven cancer types and have the potential to provide exciting new directions for diagnostic and therapeutic development.


Subjects
Gene Expression Profiling, Neoplasms/genetics, Humans, Oligonucleotide Array Sequence Analysis, Survival Rate
10.
BMC Genomics ; 11: 291, 2010 May 10.
Article in English | MEDLINE | ID: mdl-20459751

ABSTRACT

BACKGROUND: Osmotic stress is caused by sudden changes in the impermeable solute concentration around a cell, which induce instantaneous water flow into or out of the cell to balance the concentration. Very little is known about the detailed response mechanism to osmotic stress in marine Synechococcus, one of the major oxygenic phototrophic cyanobacterial genera that contribute greatly to global CO2 fixation. RESULTS: We present here a computational study of the osmoregulation network in response to hyperosmotic stress of Synechococcus sp. strain WH8102 using comparative genome analyses and computational prediction. In this study, we identified the key transporters, synthetases, signal sensor proteins and transcriptional regulator proteins, and found experimentally that 15 of the genes encoding these proteins showed significantly changed expression levels under a mild hyperosmotic stress. CONCLUSIONS: From the predicted network model, we have made a number of interesting observations about WH8102. Specifically, we found that (i) the organism likely uses glycine betaine as the major osmolyte, and others such as glucosylglycerol, glucosylglycerate, trehalose, sucrose and arginine as the minor osmolytes, making it efficient and adaptable to its changing environment; and (ii) sigma38, one of the seven types of sigma factors, probably serves as a global regulator coordinating the osmoregulation network and the other relevant networks.


Subjects
Polysaccharides/metabolism, Synechococcus/chemistry, Synechococcus/metabolism, Water-Electrolyte Balance, Arginine/metabolism, Betaine/metabolism, Synechococcus/enzymology
11.
Proc Natl Acad Sci U S A ; 107(14): 6310-5, 2010 Apr 06.
Article in English | MEDLINE | ID: mdl-20308592

ABSTRACT

It is generally known that bacterial genes working in the same biological pathways tend to group into operons, possibly to facilitate cotranscription and to provide stoichiometry. However, very little is understood about what may determine the global arrangement of bacterial genes in a genome beyond the operon level. Here we present evidence that the global arrangement of operons in a bacterial genome is largely influenced by the tendency of a bacterium to keep operons encoding the same biological pathway in nearby genomic locations, and by the tendency to keep operons involved in multiple pathways in locations close to the other members of their participating pathways. We also observed that the activation frequencies of pathways influence the genomic locations of their encoding operons, with operons of the more frequently activated pathways tending to be more tightly clustered together. We have quantitatively assessed the influences of different factors on the global genomic arrangement of operons. We found that the current arrangements of operons in most of the bacterial genomes we studied tend to minimize the overall distance between consecutive operons of the same pathway across all pathways encoded in the genome.


Subjects
Bacillus subtilis/genetics, Escherichia coli/genetics, Bacterial Genome, Operon, Multigene Family
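
The quantity described in the last sentence of the abstract, the overall distance between consecutive operons of the same pathway summed across pathways, can be sketched as below. The abstract does not give the exact formula, so this is only one plausible reading: the circular chromosome, the single coordinate per operon, and the names `circular_distance` and `pathway_distance_score` are assumptions introduced here.

```python
def circular_distance(a, b, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def pathway_distance_score(pathways, genome_length):
    """Sum distances between consecutive operons of each pathway.

    `pathways` maps a pathway name to the genomic positions of its
    operons (hypothetical input format). Operons are ordered along the
    genome and neighbouring pairs are summed, so tightly clustered
    pathways give a low score.
    """
    total = 0
    for positions in pathways.values():
        ordered = sorted(positions)
        for left, right in zip(ordered, ordered[1:]):
            total += circular_distance(left, right, genome_length)
    return total


# Two toy arrangements of the same operons on a 1,000-position genome.
clustered = {"pathway_a": [100, 120, 150], "pathway_b": [700, 710]}
scattered = {"pathway_a": [100, 400, 800], "pathway_b": [50, 600]}
```

Under this reading, `clustered` scores lower than `scattered`, which is the direction the abstract reports for real bacterial genomes.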
12.
FEBS Lett ; 584(1): 194-8, 2010 Jan 04.
Article in English | MEDLINE | ID: mdl-19941858

ABSTRACT

The genome of the lethal animal pathogen enterohemorrhagic Escherichia coli (EHEC) O157:H7 is characterized by the presence of multiple pathogenicity islands (PAIs). Computational methods have been developed to identify PAIs based on the G+C levels that distinguish some PAIs from non-PAI regions. We observed that PAIs can have a G+C level very similar to that of the host chromosome, which may have led to false negative predictions by these methods. We have applied a novel method based on genomic barcodes to identify PAIs. Using this technique, we have successfully identified both known and novel PAIs in the genomes of three strains of EHEC O157:H7.


Subjects
Base Composition, Bacterial Chromosomes/genetics, Escherichia coli O157/pathogenicity, Genomic Islands/genetics, Genomics/methods, DNA Sequence Analysis/methods, Escherichia coli O157/genetics
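
For context, the G+C-based screening that the abstract contrasts with the barcode method can be sketched in a few lines of Python. This illustrates the older baseline idea only, not the authors' barcode technique; the window size, step, tolerance, and the name `atypical_gc_windows` are assumptions chosen for the example.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def atypical_gc_windows(genome, window=1000, step=500, tolerance=0.1):
    """Flag sliding windows whose G+C level departs from the genome average.

    Returns the genome-wide G+C fraction and a list of
    (start, end, gc) tuples for windows deviating by more than
    `tolerance`. All thresholds are illustrative assumptions.
    """
    baseline = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - baseline) > tolerance:
            hits.append((start, start + window, gc))
    return baseline, hits
```

A window whose G+C level matches the host chromosome is never flagged by such a screen, which is the false-negative case the abstract points out.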
13.
Article in English | MEDLINE | ID: mdl-19407357

ABSTRACT

Solving the cluster identification problem on large bioinformatics data sets is time-consuming, which is why a parallel algorithm is needed for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed. It identifies clusters through the identification of densely intraconnected subgraphs. We have employed a minimum spanning tree (MST) representation of the graph and solved the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of a graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction algorithm is to first partition the graph, then construct MSTs for the partitioned subgraphs and auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that this program is capable of handling very large data clustering problems in an efficient manner. We have implemented the clustering algorithm as the software CLUMP.


Subjects
Algorithms, Cluster Analysis, Computational Biology/methods, Genetic Databases, Automated Pattern Recognition/methods, Linear Models, Multigene Family, Reproducibility of Results, Software, Systems Integration
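
The general idea of clustering through an MST can be sketched serially as follows. This is not CLUMP and does not reproduce the parallel partition-and-merge construction described in the abstract; it uses Kruskal's algorithm on a complete Euclidean graph and a simple edge-length cut, both of which are assumptions made for illustration, as are the names `minimum_spanning_tree` and `mst_clusters`.

```python
import math
from itertools import combinations


class DisjointSet:
    """Union-find structure used by Kruskal's algorithm."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False
        self.parent[ri] = rj
        return True


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete graph of Euclidean distances."""
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    sets = DisjointSet(len(points))
    tree = []
    for weight, i, j in edges:
        if sets.union(i, j):
            tree.append((weight, i, j))
    return tree


def mst_clusters(points, max_edge):
    """Cut MST edges longer than `max_edge`; return clusters of point indices."""
    sets = DisjointSet(len(points))
    for weight, i, j in minimum_spanning_tree(points):
        if weight <= max_edge:
            sets.union(i, j)
    groups = {}
    for index in range(len(points)):
        groups.setdefault(sets.find(index), []).append(index)
    return sorted(groups.values())
```

Building all pairwise edges is quadratic in the number of points, which hints at why MST construction is the bottleneck that the abstract's parallel strategy targets.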
14.
Genomics Proteomics Bioinformatics ; 7(4): 194-9, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20172492

ABSTRACT

Cellulases are important glycosyl hydrolases (GHs) that hydrolyze cellulose polymers into smaller oligosaccharides by breaking the cellulose β-(1→4) bonds, and they are widely used to produce cellulosic ethanol from plant biomass. N-linked and O-linked glycosylations have been proposed to affect the catalytic efficiency, cellulose binding affinity and stability of cellulases, based on observations of individual cellulases. As far as we know, there has not been any systematic analysis of the distributions of N-linked and O-linked glycosylated residues in cellulases, mainly due to the limited annotations of the relevant functional domains and the glycosylated residues. We have computationally annotated the functional domains and glycosylated residues in cellulases, and conducted a systematic analysis of the distributions of the N-linked and O-linked glycosylated residues in these enzymes. Many N-linked glycosylated residues were known to be in the GH domains of cellulases, but they are probably there just by chance, since the GH domain usually occupies more than half of the sequence length of a cellulase. Our analysis indicates that the O-linked glycosylated residues are significantly enriched in the linker regions between the carbohydrate binding module (CBM) domains and GH domains of cellulases. Possible mechanisms are discussed.


Subjects
Cellulases/chemistry, Cellulases/metabolism, Cellulose/metabolism, Glycosylation, Protein Tertiary Structure
15.
Nucleic Acids Res ; 37(Database issue): D459-63, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18988623

ABSTRACT

We present a database DOOR (Database for prOkaryotic OpeRons) containing computationally predicted operons of all the sequenced prokaryotic genomes. All the operons in DOOR are predicted using our own prediction program, which was ranked to be the best among 14 operon prediction programs by a recent independent review. Currently, the DOOR database contains operons for 675 prokaryotic genomes, and supports a number of search capabilities to facilitate easy access and utilization of the information stored in it. (1) Querying the database: the database provides a search capability for a user to find desired operons and associated information through multiple querying methods. (2) Searching for similar operons: the database provides a search capability for a user to find operons that have similar composition and structure to a query operon. (3) Prediction of cis-regulatory motifs: the database provides a capability for motif identification in the promoter regions of a user-specified group of possibly coregulated operons, using motif-finding tools. (4) Operons for RNA genes: the database includes operons for RNA genes. (5) OperonWiki: the database provides a wiki page (OperonWiki) to facilitate interactions between users and the developer of the database. We believe that DOOR provides a useful resource to many biologists working on bacteria and archaea, which can be accessed at http://csbl1.bmb.uga.edu/OperonDB.


Subjects
Genetic Databases, Archaeal Genome, Bacterial Genome, Operon, Genomics, Software
16.
BMC Bioinformatics ; 9: 546, 2008 Dec 17.
Article in English | MEDLINE | ID: mdl-19091119

ABSTRACT

BACKGROUND: Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement measured in sequence fragments as short as 1000 bps across the whole genome, for 1

Subjects
Algorithms, Base Sequence/genetics, Computational Biology/methods, Genome/genetics, Genomics/methods, Species Specificity
17.
J Bioinform Comput Biol ; 6(3): 585-602, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18574864

ABSTRACT

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or for allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as insertion/deletion (indel) frequency arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity. We have also demonstrated that the application of this information can lead to an improvement in fold recognition.


Subjects
Computational Biology, INDEL Mutation, Sequence Alignment/methods, Protein Sequence Analysis/methods, Molecular Sequence Data, Insertional Mutagenesis/methods, Protein Conformation, Sequence Deletion, Software, Structure-Activity Relationship
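
One way to picture an insertion/deletion frequency array is to count, for each position of a target sequence, how often aligned sequences delete that residue or insert residues at that point. The sketch below does this for gapped pairwise alignments. The abstract does not define IFAs precisely, so the counting rules, the input format, and the name `indel_frequency_arrays` are assumptions made here, not the authors' definition.

```python
def indel_frequency_arrays(alignments, target_length):
    """Estimate per-position deletion and insertion frequencies.

    `alignments` is a list of (target_row, other_row) pairs of equal
    length, with '-' marking gaps (hypothetical input format).
    deletions[i]  : fraction of alignments where target residue i is
                    aligned to a gap in the other sequence.
    insertions[i] : fraction of alignments with extra residues inserted
                    immediately before target residue i (the final slot
                    is for insertions after the last residue).
    """
    deletions = [0] * target_length
    insertions = [0] * (target_length + 1)
    for target_row, other_row in alignments:
        position = 0
        inside_insertion = False
        for target_char, other_char in zip(target_row, other_row):
            if target_char == "-":
                if not inside_insertion:  # count each insertion event once
                    insertions[position] += 1
                    inside_insertion = True
            else:
                inside_insertion = False
                if other_char == "-":
                    deletions[position] += 1
                position += 1
    n = len(alignments) or 1  # avoid division by zero on empty input
    return [d / n for d in deletions], [i / n for i in insertions]
```

Position-specific gap penalties in a threading or alignment scorer could then be scaled by these frequencies, which is the kind of use the abstract describes.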
18.
Comput Biol Chem ; 32(3): 176-84, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18440870

ABSTRACT

Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene functional annotations. We have developed a method to integrate genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) the genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be utilized to improve functional classification of genes; (ii) while our method is consistent with the existing popular schemes as much as they are among themselves, it does provide functional classification at higher resolution and hence allows functional assignments of (new) genes at a more specific level; and (iii) our method is fairly stable when being applied to different genomes.


Subjects
Algorithms, Classification/methods, Computational Biology/methods, Bacterial Genes/physiology, Genomics/methods, Prokaryotic Cells/physiology, Cluster Analysis, Computer Simulation, Bacterial Genes/genetics, Sensitivity and Specificity
19.
BMC Genomics ; 9: 36, 2008 Jan 24.
Article in English | MEDLINE | ID: mdl-18218090

ABSTRACT

BACKGROUND: Mobile genetic elements (MGEs) play an essential role in genome rearrangement and evolution, and are widely used as an important genetic tool. RESULTS: In this article, we present genetic maps of recently active Insertion Sequence (IS) elements, the simplest form of MGEs, for all sequenced cyanobacteria and archaea, predicted based on the previously identified ~1,500 IS elements. Our predicted IS maps are consistent with the NCBI annotations of the IS elements. By linking the predicted IS elements to various characteristics of the organisms under study and the organisms' living conditions, we found that (a) the activities of IS elements depend heavily on the environments where the host organisms live; (b) the number of recently active IS elements in a genome tends to increase with the genome size; (c) the flanking regions of the recently active IS elements are significantly enriched with genes encoding DNA binding factors, transporters and enzymes; and (d) IS movements show no tendency to disrupt operonic structures. CONCLUSION: These are the first genome-scale maps of IS elements with detailed structural information at the sequence level. These genetic maps of recently active IS elements and the accompanying observations should help to improve our understanding of how IS elements proliferate and how they are involved in the evolution of the host genomes.


Subjects
Archaea/genetics, Archaea/metabolism, Cyanobacteria/genetics, Cyanobacteria/metabolism, DNA Transposable Elements, Insertional Mutagenesis, Base Sequence, Chromosome Mapping, Bacterial Chromosomes, Archaeal Genome, Bacterial Genome, Genetic Models, Nucleic Acid Conformation, Open Reading Frames, Phylogeny, Repetitive Nucleic Acid Sequences, Genetic Templates, Terminal Repeat Sequences
20.
Article in English | MEDLINE | ID: mdl-17951836

ABSTRACT

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as Insertion/Deletion (Indel) Frequency Arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity.


Subjects
Algorithms, Proteins/chemistry, Proteins/genetics, Sequence Alignment/methods, Protein Sequence Analysis/methods, Sequence Analysis/methods, Amino Acid Sequence, Gene Deletion, INDEL Mutation, Molecular Sequence Data, Structure-Activity Relationship