Results 1 - 20 of 29
1.
Cell Death Differ ; 13(11): 1900-14, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16514418

ABSTRACT

Colony-stimulating factor-1 (CSF-1) is essential for macrophage growth, differentiation and survival. Myeloid cells expressing a CSF-1 receptor mutant (DeltaKI) show markedly impaired CSF-1-mediated proliferation and survival, accompanied by absent signal transducers and activators of transcription 3 (Stat3) phosphorylation and reduced PI3-kinase/Akt activity. Restoring phosphatidylinositol 3-kinase (PI3-kinase) but not Stat3 signals reverses the mitogenic defect. CSF-1-induced proliferation and survival are sensitive to glycolytic inhibitors, 2-deoxyglucose and 3-bromopyruvate. Consistent with a critical role for PI3-kinase-regulated glycolysis, DeltaKI cells reconstituted with active PI3-kinase or Akt are hypersensitive to these inhibitors. CSF-1 upregulates hexokinase II (HKII) expression through PI3-kinase, and PI3-kinase transcriptionally activates the HKII promoter. Moreover, HKII overexpression partially restores mitogenicity. In contrast, Bcl-x(L) expression does not enhance long-term proliferation, although short-term cell death is suppressed in a glycolysis-independent manner. This study identifies robust PI3-kinase activation as essential for optimal CSF-1-mediated mitogenesis in myeloid cells, in part through regulation of HKII and support of glycolysis.


Subjects
Cell Proliferation/drug effects , Macrophage Colony-Stimulating Factor/pharmacology , Myeloid Cells/cytology , Myeloid Cells/drug effects , Phosphatidylinositol 3-Kinases/metabolism , Animals , Apoptosis/drug effects , Caspases/metabolism , Cell Survival/drug effects , Enzyme Stability/drug effects , Extracellular Signal-Regulated MAP Kinases/metabolism , Glycolysis/drug effects , Hexokinase/metabolism , Humans , Mice , Mutant Proteins/metabolism , Proto-Oncogene Proteins c-akt/metabolism , Receptor, Macrophage Colony-Stimulating Factor/metabolism , STAT3 Transcription Factor/metabolism , Signal Transduction/drug effects , bcl-X Protein/metabolism
2.
Bioinformatics ; 17 Suppl 1: S262-9, 2001.
Article in English | MEDLINE | ID: mdl-11473017

ABSTRACT

UNLABELLED: Physical map assembly is the inference of genome structure from experimental data. Map assembly depends on the integration of diverse data including sequence tagged site (STS) marker content, clone sizing, and restriction digest fingerprints (RDF). As experimentally measured data, these are uncertain and error prone. Physical map assembly from error free data is straightforward and can be accomplished in linear time in the number of clones, but the assembly of an optimal map from error prone data is an NP-hard problem. We present an alternative approach to physical map assembly that is based on a probabilistic view of the data and seeks to identify those features of the map that can be reliably inferred from the available data. With this approach, we achieve a number of goals. These include the use of multiple data sources, appropriate representation of uncertainties in the underlying data, the use of clone length information in fingerprint map assembly, and the use of higher order information in map assembly. By higher order information, we mean relationships that are not expressible in terms of neighbouring clone relationships. These include triplet and multiple clone overlaps, the uniqueness of STS position, and fingerprint marker locations. In a probabilistic view of physical mapping, we assert that all of the many possible map assemblies are equally likely a priori. Given experimental data, we can only state which assemblies are more likely than others given the experimental observations. Parameters of interest are then derived as likelihood weighted averages over map assemblies. Ideally these averages should be sums or integrals over all possible map assemblies, but computationally this is not feasible for real-world map assembly problems. Instead, sampling is used to asymptotically approach the desired parameters. Software implementing our probabilistic approach to mapping has been written. 
Assembly of mixed RDF and STS maps containing up to 60 clones can be accomplished on a desktop PC with run times under an hour. AVAILABILITY: http://stl.wustl.edu/software/gibbsmap/.


Subjects
Cloning, Molecular , Models, Statistical , Physical Chromosome Mapping/statistics & numerical data , Computational Biology , Genetic Techniques/statistics & numerical data , Restriction Mapping/statistics & numerical data , Sequence Tagged Sites
3.
Bioinformatics ; 17(7): 622-33, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11448880

ABSTRACT

MOTIVATION: Current methods for identifying sequence specific binding sites in DNA sequence using position specific weight matrices are limited in both sensitivity and specificity. The double-stranded DNA helix exhibits sequence dependent variations in conformation. Interactions between macromolecules result from complementarity of the two tertiary structures. We hypothesize that this conformational variation plays a role in transcription factor binding site recognition, and that the use of this structure information will improve the predictive power of transcription factor binding site models. RESULTS: Conformation models for the sequence dependence of DNA helix distortion have been developed. Using our conformational models, we defined a tertiary structure template for the met operon repressor MetJ binding site. Both naturally occurring sites and precursor binding sites identified through in vitro selection were used as the basis for template definition. The conformational model appears to recognize features of protein binding sites that are distinct from the features recognized by primary sequence based profiles. Combining the conformational model and primary sequence profile yields a hybrid model with improved discriminatory power compared with either the conformational model or sequence profile alone. Using our hybrid model, we searched the E.coli genome. We are able to identify the documented MetJ sites in the promoter regions of metA, metB, metC, metR and metF. In addition, we find several novel loci with characteristics suggesting that they are functional MetJ repressor binding sites. Novel MetJ binding sites are found upstream of the metK gene, as well as upstream of abc, a gene that encodes a component of a multifunction transporter which may transport amino acids across the membrane. The false positive rate is significantly lower than that of the sequence profile method.
AVAILABILITY: Programs implementing this algorithm are available upon request. The list of crystal structures used for compiling the mean base step parameters of DNA is available by anonymous ftp at http://stateslab.wustl.edu/pub/helix/StructureList.
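
For readers unfamiliar with the baseline that this abstract compares against, the following is a minimal sketch of scanning a sequence with a position specific weight matrix. It covers only the primary sequence profile, not the conformational or hybrid model. The function names, the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration and are not taken from the paper.

```python
import math

def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a position specific log-odds matrix from aligned, equal-length sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2((counts[b] / total) / background) for b in "ACGT"}
        )
    return matrix

def scan(sequence, matrix):
    """Return (offset, score) for every window of the sequence."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Toy binding sites, invented for illustration only.
toy_sites = ["GATTACAG", "GATTACAG", "GATCACAG", "GATTACTG"]
pwm = build_log_odds(toy_sites)
best = max(scan("CCCCGATTACAGCCCC", pwm), key=lambda hit: hit[1])
```

Here `best` holds the offset and score of the highest-scoring window; a real search would apply a score threshold rather than keep only the maximum.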


Subjects
Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , DNA, Bacterial/chemistry , DNA, Bacterial/metabolism , Escherichia coli Proteins , Escherichia coli/metabolism , Repressor Proteins/chemistry , Repressor Proteins/metabolism , Base Sequence , Binding Sites , Computational Biology , Crystallography , DNA, Bacterial/genetics , Escherichia coli/genetics , Genome, Bacterial , Models, Molecular , Nucleic Acid Conformation , Protein Conformation , Protein Structure, Tertiary
4.
Genome Res ; 11(5): 889-900, 2001 May.
Article in English | MEDLINE | ID: mdl-11337482

ABSTRACT

With the availability of a nearly complete sequence of the human genome, aligning expressed sequence tags (EST) to the genomic sequence has become a practical and powerful strategy for gene prediction. Elucidating gene structure is a complex problem requiring the identification of splice junctions, gene boundaries, and alternative splicing variants. We have developed a software tool, Transcript Assembly Program (TAP), to delineate gene structures using genomically aligned EST sequences. TAP assembles the joint gene structure of the entire genomic region from individual splice junction pairs, using a novel algorithm that uses the EST-encoded connectivity and redundancy information to sort out the complex alternative splicing patterns. A method called polyadenylation site scan (PASS) has been developed to detect poly-A sites in the genome. TAP uses these predictions to identify gene boundaries by segmenting the joint gene structure at polyadenylated terminal exons. Reconstructing 1007 known transcripts, TAP scored a sensitivity (Sn) of 60% and a specificity (Sp) of 92% at the exon level. The gene boundary identification process was found to be accurate 78% of the time. TAP also reports alternative splicing patterns in EST alignments. An analysis of alternative splicing in 1124 genic regions suggested that more than half of human genes undergo alternative splicing. Surprisingly, we saw an absolute majority of the detected alternative splicing events affect the coding region. Furthermore, the evolutionary conservation of alternative splicing between human and mouse was analyzed using an EST-based approach. (See http://stl.wustl.edu/~zkan/TAP/)


Subjects
Alternative Splicing/genetics , Computational Biology/methods , Expressed Sequence Tags , Genes/genetics , Sequence Alignment/methods , Computational Biology/instrumentation , Genome, Human , Humans , RNA, Messenger/metabolism , Sequence Alignment/instrumentation , Software , Software Validation , Transcription, Genetic
6.
Mol Cell Biol ; 20(18): 6779-98, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10958675

ABSTRACT

Colony-stimulating factor 1 (CSF-1) supports the proliferation, survival, and differentiation of bone marrow-derived cells of the monocytic lineage. In the myeloid progenitor 32D cell line expressing CSF-1 receptor (CSF-1R), CSF-1 activation of the extracellular signal-regulated kinase (ERK) pathway is both Ras and phosphatidylinositol 3-kinase (PI3-kinase) dependent. PI3-kinase inhibition did not influence events leading to Ras activation. Using the activity of the PI3-kinase effector, Akt, as readout, studies with dominant-negative and oncogenic Ras failed to place PI3-kinase downstream of Ras. Thus, PI3-kinase appears to act in parallel to Ras. PI3-kinase inhibitors enhanced CSF-1-stimulated A-Raf and c-Raf-1 activities, and dominant-negative A-Raf but not dominant-negative c-Raf-1 reduced CSF-1-provoked ERK activation, suggesting that A-Raf mediates a part of the stimulatory signal from Ras to MEK/ERK, acting in parallel to PI3-kinase. Unexpectedly, a CSF-1R lacking the PI3-kinase binding site (DeltaKI) remained capable of activating MEK/ERK in a PI3-kinase-dependent manner. To determine if Src family kinases (SFKs) are involved, we demonstrated that CSF-1 activated Fyn and Lyn in cells expressing wild-type (WT) or DeltaKI receptors. Moreover, CSF-1-induced Akt activity in cells expressing DeltaKI is SFK dependent since Akt activation was prevented by pharmacological or genetic inhibition of SFK activity. The docking protein Gab2 may link SFK to PI3-kinase. CSF-1 induced Gab2 tyrosyl phosphorylation and association with PI3-kinase in cells expressing WT or DeltaKI receptors. However, only in DeltaKI cells are these events prevented by PP1. Thus in myeloid progenitors, CSF-1 can activate the PI3-kinase/Akt pathway by at least two mechanisms, one involving direct receptor binding and one involving SFKs.


Subjects
Adaptor Proteins, Signal Transducing , Macrophage Colony-Stimulating Factor/metabolism , Mitogen-Activated Protein Kinase 1/metabolism , Mitogen-Activated Protein Kinase Kinases/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Protein Serine-Threonine Kinases/metabolism , src-Family Kinases/metabolism , Animals , Binding Sites , Cell Line , Cyclic AMP/metabolism , Enzyme Activation , GRB2 Adaptor Protein , Humans , Interleukin-3/metabolism , Interleukin-3/pharmacology , MAP Kinase Kinase 1 , Mice , Phosphoinositide-3 Kinase Inhibitors , Proteins/metabolism , Proto-Oncogene Proteins/metabolism , Proto-Oncogene Proteins c-akt , Proto-Oncogene Proteins c-raf/metabolism , Receptor, Macrophage Colony-Stimulating Factor/metabolism , Signal Transduction , Stem Cells , ras Proteins/metabolism
7.
Article in English | MEDLINE | ID: mdl-10786286

ABSTRACT

In the course of our efforts to build extended regions of human genomic sequence by assembling individual BAC sequences, we have encountered several instances where a region of the genome has been sequenced independently using reagents derived from two different individuals. Comparing these sequences allows us to analyze the frequency and distribution of single nucleotide polymorphisms (SNPs) in the human genome. The observed transition/transversion frequencies are consistent with a biological origin for the sequence discrepancies, and this suggests that the data produced by large sequencing centers are accurate enough to be used as the basis for SNP analysis. The observed distribution of single nucleotide polymorphisms in the human genome is not uniform. An apparent duplication in the human genome extending over more than 130 kb between chromosomes 1p34 and 16p13 is reported. Independently derived sequences covering these regions are more than 99.9% identical, indicating that this duplication event must have occurred quite recently. FISH mapping results reported by the relevant laboratories indicate that the human population may be polymorphic for this duplication. We present a population genetic theory for the expected distribution of SNPs and derive an algorithm for probabilistically segmenting genomic sequence into regions that are identical by descent (IBD) between two individuals based on this theory and the observed locations of polymorphisms. Based on these methods and a random mating model for the human population, estimates are made for the mutation rate in the human genome.
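
The transition/transversion comparison mentioned in this abstract can be illustrated with a short sketch that classifies the mismatches between two aligned sequences. It assumes the sequences are already aligned and of equal length; the function name and the toy input are hypothetical, and the sketch does not attempt the population genetic model or the IBD segmentation described above.

```python
PURINES = {"A", "G"}

def transition_transversion(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical positions, gaps, and ambiguity codes.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions

# Toy aligned sequences, invented for illustration.
counts = transition_transversion("ACGTACGT", "GCGTTCAT")
```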


Subjects
Genome, Human , Polymorphism, Single Nucleotide , Algorithms , Chromosomes, Human , Databases, Factual , Genetics, Population , Humans , In Situ Hybridization, Fluorescence , Models, Statistical , Mutation , Sequence Analysis, DNA/methods
8.
Article in English | MEDLINE | ID: mdl-9783219

ABSTRACT

DNA sequence analysis depends on the accurate assembly of fragment reads for the determination of a consensus sequence. This report examines the possibility of analyzing multiple, independent restriction digests as a method for testing the fidelity of sequence assembly. A dynamic programming algorithm to determine the maximum likelihood alignment of error prone electrophoretic mobility data to the expected fragment mobilities given the consensus sequence and restriction enzymes is derived and used to assess the likelihood of detecting rearrangements in genomic sequencing projects. The method is shown to reliably detect errors in sequence fragment assembly without the necessity of making reference to an overlying physical map. An HTML form-based interface is available at http://www.ibc.wustl.edu/services/validate.html.


Subjects
Restriction Mapping/methods , Sequence Analysis, DNA/methods , Algorithms , Artificial Intelligence , Base Sequence , DNA/genetics , DNA Fingerprinting , Reproducibility of Results , Restriction Mapping/statistics & numerical data , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data
9.
Bioinformatics ; 14(1): 40-7, 1998.
Article in English | MEDLINE | ID: mdl-9520500

ABSTRACT

MOTIVATION: Searching a protein sequence database for homologs is a powerful tool for discovering the structure and function of a sequence. Two new methods for searching sequence databases have recently been described: Probabilistic Smith-Waterman (PSW), which is based on Hidden Markov models for a single sequence using a standard scoring matrix, and a new version of BLAST (WU-BLAST2), which uses Sum statistics for gapped alignments. RESULTS: This paper compares and contrasts the effectiveness of these methods with three older methods (Smith-Waterman: SSEARCH, FASTA and BLASTP). The analysis indicates that the new methods are useful, and often offer improved accuracy. These tools are compared using a curated (by Bill Pearson) version of the annotated portion of PIR 39. Three different statistical criteria are utilized: equivalence number, minimum errors and the receiver operating characteristic. For complete-length protein query sequences from large families, PSW's accuracy is superior to that of the other methods, but its accuracy is poor when used with partial-length query sequences. False negatives are twice as common as false positives irrespective of the search methods if a family-specific threshold score that minimizes the total number of errors (i.e. the most favorable threshold score possible) is used. Thus, sensitivity, not selectivity, is the major problem. Among the analyzed methods using default parameters, the best accuracy was obtained from SSEARCH and PSW for complete-length proteins, and the two BLAST programs, plus SSEARCH, for partial-length proteins.
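
The "minimum errors" criterion named in this abstract, a threshold score chosen to minimize false positives plus false negatives, can be sketched as follows. This is an illustrative reading of the criterion, not the evaluation code used in the paper; the function name, the tie handling, and the toy scores are assumptions.

```python
def minimum_errors(scored_hits):
    """Find the score threshold that minimizes false positives + false negatives.

    scored_hits is a list of (score, is_family_member) pairs. A hit is called
    positive when its score is >= the threshold.
    Returns (threshold, false_positives, false_negatives).
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    total_members = sum(1 for _, is_member in ranked if is_member)
    # Threshold above every score: nothing is called, all members are missed.
    best = (float("inf"), 0, total_members)
    fp = 0
    fn = total_members
    for i, (score, is_member) in enumerate(ranked):
        if is_member:
            fn -= 1
        else:
            fp += 1
        # Hits with equal scores must be accepted or rejected together.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if fp + fn < best[1] + best[2]:
            best = (score, fp, fn)
    return best

# Toy search results for one query, invented for illustration.
toy_hits = [(95, True), (90, True), (80, False), (75, True),
            (60, False), (55, True), (40, False)]
threshold, false_pos, false_neg = minimum_errors(toy_hits)
```

When several thresholds give the same error total, this sketch keeps the highest one; that tie rule is an arbitrary choice.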


Subjects
Databases, Factual , Sequence Alignment/methods , Sequence Homology, Amino Acid , Information Storage and Retrieval , Mathematical Computing , Proteins/chemistry
10.
IEEE Trans Biomed Eng ; 45(4): 422-8, 1998 Apr.
Article in English | MEDLINE | ID: mdl-9556959

ABSTRACT

In four-color fluorescence-based automated DNA sequencing, a 4 x 4 filter matrix parameterizes the relationship between the dye-intensity signals of interest and the data collected by an optical imaging system. The filter matrix is important because the estimated DNA sequence is based on the dye intensities that can only be recovered via inversion of the matrix. In this paper, we present a calibration method for the estimation of the columns of this matrix, using data generated through a special experiment in which DNA samples are labeled with only one fluorescent dye at a time. Simulations and applications of the method to real data are provided, with promising results.
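
The role of the 4 x 4 filter matrix can be made concrete with a small sketch: observed channel data are modeled as the matrix times the dye intensities, so the intensities are recovered by solving that linear system. The matrix values below are hypothetical, and `estimate_column` is a deliberately naive stand-in (average and normalize single-dye observations), not the calibration estimator of the paper.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("filter matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Hypothetical filter matrix: column j is the response of the four
# detector channels to dye j (values invented for illustration).
FILTER = [
    [1.00, 0.30, 0.05, 0.00],
    [0.20, 1.00, 0.25, 0.05],
    [0.05, 0.30, 1.00, 0.30],
    [0.00, 0.05, 0.20, 1.00],
]

def mix(dyes):
    """Forward model: detector data produced by the given dye intensities."""
    return [sum(FILTER[i][j] * dyes[j] for j in range(4)) for i in range(4)]

def estimate_column(single_dye_vectors):
    """Naive column estimate from data collected with only one dye present."""
    sums = [sum(v[i] for v in single_dye_vectors) for i in range(4)]
    peak = max(sums)
    return [s / peak for s in sums]

true_dyes = [0.0, 5.0, 0.0, 1.0]
recovered = solve(FILTER, mix(true_dyes))
column_1 = estimate_column([mix([0, 2, 0, 0]), mix([0, 3, 0, 0])])
```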


Subjects
Image Processing, Computer-Assisted , Sequence Analysis, DNA/methods , Algorithms , Coloring Agents , Computer Simulation , Linear Models , Models, Genetic , Optics and Photonics , Random Allocation , Signal Processing, Computer-Assisted
11.
Electrophoresis ; 18(1): 23-5, 1997 Jan.
Article in English | MEDLINE | ID: mdl-9059816

ABSTRACT

In a previous paper (Yin et al., Electrophoresis 1996, 17, 1143-1150), an automated method for matrix determination in four-dye fluorescence-based DNA sequencing was presented. As a continuation of that work, we have developed an alternative method to estimate the matrix from raw sequence data. The method uses an iterative clustering technique to associate each 4 x 1 data vector with one column of the desired filter matrix, using Kullback's I-divergence as a distance measure. The method requires less preprocessing of the data and less computation than the approach described by Yin et al. (Electrophoresis 1996, 17, 1143-1150). An example demonstrating applicability of the proposed method to Applied Biosystems sequencer data is given.
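
One way to picture the iterative clustering described here is a k-means-style loop in which each normalized 4 x 1 data vector is assigned to the nearest candidate column under the I-divergence, and each column is then re-estimated as the mean of its members. This is a sketch under that assumption; the update rule, the initial guesses, and the toy data are not taken from the paper.

```python
import math

def i_divergence(p, q, eps=1e-12):
    """I-divergence between nonnegative vectors: sum(p*log(p/q) - p + q)."""
    return sum(a * math.log((a + eps) / (b + eps)) - a + b for a, b in zip(p, q))

def normalize(vector):
    total = sum(vector)
    return [x / total for x in vector]

def cluster_columns(vectors, initial_columns, iterations=10):
    """Assign each data vector to its nearest column, then re-average."""
    columns = [normalize(c) for c in initial_columns]
    data = [normalize(v) for v in vectors]
    for _ in range(iterations):
        groups = [[] for _ in columns]
        for v in data:
            nearest = min(range(len(columns)),
                          key=lambda j: i_divergence(v, columns[j]))
            groups[nearest].append(v)
        for j, members in enumerate(groups):
            if members:  # keep the old column if nothing was assigned to it
                columns[j] = [sum(m[i] for m in members) / len(members)
                              for i in range(4)]
    return columns

# Toy data vectors, each dominated by one dye (invented for illustration).
toy_vectors = [[7, 2, 1, 0], [14, 4, 2, 0], [1, 6, 2, 1],
               [2, 12, 4, 2], [0, 2, 7, 1], [0, 1, 2, 7]]
start = [[0.85 if i == j else 0.05 for i in range(4)] for j in range(4)]
columns = cluster_columns(toy_vectors, start)
```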


Subjects
Fluorescent Dyes , Sequence Analysis, DNA/methods , Algorithms , Mathematics
12.
Genome Res ; 6(11): 1110-7, 1996 Nov.
Article in English | MEDLINE | ID: mdl-8938435

ABSTRACT

Software to track sample lanes automatically in four-color, fluorescence-based, electrophoretic gel images has been developed for application in large-scale DNA sequencing projects. Lanes and lane boundaries are tracked by analyzing a first difference approximation to the gradient of a vertically integrated and processed "brightness" profile. Initially lanes are located in a region of the gel image selected for good horizontal lane spacing and signal strength. The software uses models of expected lane and interlane spacing and lateral lane behavior to maintain accurate tracking on imperfect gels. In areas where intensity-based tracking is difficult, interpixel column correlation is also used to locate and define lane features. Summary statistics and compressed-in-time images are generated for user evaluation of tracking performance. The software developed has been tested successfully on gel images with degradations including significant horizontal lane motion (curving) and image artifacts, and is now in full-scale use in our sequencing projects.
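
The core idea of locating lanes from the first difference of a vertically integrated brightness profile can be sketched in a few lines. The fixed threshold and the toy image are assumptions; the spacing models, column correlation, and handling of curved lanes described in the abstract are not attempted here.

```python
def brightness_profile(image):
    """Sum each pixel column over all rows (vertical integration)."""
    return [sum(column) for column in zip(*image)]

def first_difference(profile):
    """Discrete approximation to the horizontal gradient of the profile."""
    return [b - a for a, b in zip(profile, profile[1:])]

def find_lanes(image, threshold):
    """Return inclusive (left, right) pixel-column bounds of each bright lane."""
    diff = first_difference(brightness_profile(image))
    lanes = []
    left = None
    for i, step in enumerate(diff):
        if step > threshold and left is None:
            left = i + 1                 # brightness rises entering column i + 1
        elif step < -threshold and left is not None:
            lanes.append((left, i))      # brightness falls after column i
            left = None
    return lanes

# Toy gel image: three identical rows with two bright lanes.
toy_row = [0, 0, 9, 9, 0, 0, 8, 8, 8, 0]
lanes = find_lanes([toy_row, toy_row, toy_row], threshold=10)
```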


Subjects
Electrophoresis, Polyacrylamide Gel/methods , Sequence Analysis/methods , Software , Algorithms , Fluorescence , Statistics as Topic
13.
Article in English | MEDLINE | ID: mdl-8877521

ABSTRACT

Determining whether two DNA sequences are similar is an essential component of DNA sequence analysis. Dynamic programming is the algorithm of choice if computational time is not the most important consideration. Heuristic search tools, such as BLAST, are computationally more efficient, but they may miss some of the sequence similarities (Altschul et al., 1990). These tools often use common k-tuples (words) between the two sequences to determine anchor points for the alignment, and spend most of their computational time extending the alignment beyond these anchor points. We discuss and provide a DNA sequence similarity search implementation (called SENSEI) that improves upon the performance of BLASTN by almost an order of magnitude for comparable sensitivity. This improvement is a result of using compactly encoded scoring tables for k-tuples, encoding bases with a single bit, filtering the sequence to remove the simple sequence repeats using XNUN, and masking the known species-specific repeats in the query sequence. To reduce memory requirements, especially for large genomic DNA query sequences, we recommend generating the neighborhood words from the target sequence at run-time, instead of generating them by preprocessing the query sequence.
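
The use of shared k-tuples as alignment anchor points can be sketched as below. The sketch matches exact words only and indexes the query, which is an arbitrary choice for illustration; it does not reproduce SENSEI's encoded scoring tables, neighborhood words, or repeat filtering, and the function names are hypothetical.

```python
from collections import defaultdict

def index_words(sequence, k):
    """Map every k-tuple (word) in the sequence to the offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index

def anchor_points(query, target, k):
    """Return sorted (query_offset, target_offset) pairs sharing an exact k-tuple."""
    query_index = index_words(query, k)
    anchors = []
    for t in range(len(target) - k + 1):
        for q in query_index.get(target[t:t + k], []):
            anchors.append((q, t))
    return sorted(anchors)

# Toy sequences, invented for illustration.
anchors = anchor_points("ACGTACGGA", "TTACGGATT", k=4)
# Anchors on the same diagonal (query_offset - target_offset) are
# candidates for extension into a longer alignment.
diagonals = {q - t for q, t in anchors}
```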


Subjects
Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Base Sequence , Databases, Factual , Gene Library , Glucosephosphate Dehydrogenase/genetics , Humans , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Software
14.
J Comput Biol ; 3(1): 1-17, 1996.
Article in English | MEDLINE | ID: mdl-8697232

ABSTRACT

There is an inherent relationship between the process of pairwise sequence alignment and the estimation of evolutionary distance. This relationship is explored and made explicit. Assuming an evolutionary model and given a specific pattern of observed base mismatches, the relative probabilities of evolution at each evolutionary distance are computed using a Bayesian framework. The mean or the median of this probability distribution provides a robust estimate of the central value. The evolutionary distance has traditionally been computed as zero for an observed homology of 20 bases with no mismatches; we prove that it is highly probable that the distance is greater than 0.01. The mean of the distribution is 0.047, which is a better estimate of the evolutionary distance. Bayesian estimates of the evolutionary distance incorporate arbitrary prior information about variable mutation rates both over time and along sequence position, thus requiring only a weak form of the molecular-clock hypothesis. The endpoints of the similarity between genomic DNA sequences are often ambiguous. The probability of evolution at each evolutionary distance can be estimated over the entire set of alignments by choosing the best alignment at each distance and the corresponding probability of duplication at that evolutionary distance. A central value of this distribution provides a robust evolutionary distance estimate. We provide an efficient algorithm for computing the parametric alignment, considering evolutionary distance as the only parameter. These techniques and estimates are used to infer the duplication history of the genomic sequence in C. elegans and in S. cerevisiae. Our results indicate that repeats discovered using a single scoring matrix show a considerable bias in subsequent evolutionary distance estimates.
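
The 20-base example in this abstract can be explored with a small grid computation. The sketch below assumes a Jukes-Cantor substitution model and a flat prior on distance over a bounded interval; both are assumptions chosen for simplicity, not the model stated in the paper, so the numbers it produces need not match the 0.047 quoted above.

```python
import math

def match_probability(distance):
    """Jukes-Cantor probability that a site is identical at the given distance."""
    return 0.25 + 0.75 * math.exp(-4.0 * distance / 3.0)

def posterior(matches, mismatches, max_distance=3.0, steps=30000):
    """Grid posterior over distance under a flat prior on [0, max_distance]."""
    width = max_distance / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    weights = []
    for d in grid:
        p = match_probability(d)
        weights.append(p ** matches * (1.0 - p) ** mismatches)
    total = sum(weights)
    return grid, [w / total for w in weights]

# Twenty aligned bases with no mismatches, as in the abstract's example.
grid, post = posterior(matches=20, mismatches=0)
mean_distance = sum(d * w for d, w in zip(grid, post))
prob_above_001 = sum(w for d, w in zip(grid, post) if d > 0.01)
```

Even with a perfect 20-base match, most of the posterior mass lies above zero distance, which is the qualitative point the abstract makes.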


Subjects
Bayes Theorem , Biological Evolution , Sequence Alignment , Animals , Base Sequence , Caenorhabditis elegans/genetics , Computer Simulation , DNA Mutational Analysis , Molecular Sequence Data , Probability , Saccharomyces cerevisiae/genetics , Sequence Homology, Nucleic Acid
15.
Article in English | MEDLINE | ID: mdl-7584377

ABSTRACT

Over 3.6 million bases of DNA sequence from chromosome III of C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-way alignments), dividing the set of alignments into connected components (signifying repeat families), computing evolutionary distance between repeat family members, constructing minimum spanning trees from the connected components, and visualizing the evolution of the repeat families. Over 7000 families of repetitive sequences were identified. The size of the families ranged from isolated pairs to over 1600 segments of similar sequence. Approximately 12.3% of the analyzed sequence participates in a repeat element.


Subjects
Caenorhabditis elegans/genetics , Repetitive Sequences, Nucleic Acid/genetics , Animals , Biological Evolution , Genome , Models, Theoretical , Sequence Analysis
16.
J Comput Biol ; 1(1): 39-50, 1994.
Article in English | MEDLINE | ID: mdl-8790452

ABSTRACT

A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and analyses of the information available from codon utilization in several species were performed, with wide variation seen. Codon bias information was found on average to improve the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources on the interpretation of positive findings are discussed.


Subjects
Codon , Sequence Analysis, DNA/methods , Software , Algorithms , Amino Acid Sequence , Animals , Bacillus subtilis , Base Sequence , Databases, Factual , Drosophila melanogaster , Escherichia coli , Humans , Molecular Sequence Data , Saccharomyces cerevisiae , Schizosaccharomyces , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid
17.
Nat Genet ; 3(3): 266-72, 1993 Mar.
Article in English | MEDLINE | ID: mdl-8485583

ABSTRACT

Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
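
The conceptual translation step mentioned in this abstract can be sketched as a six-frame translation using the standard genetic code. Only the translation is shown; the protein database search is omitted. The function names, the frame labels, and the use of "X" for codons containing ambiguous bases are assumptions made for illustration, not BLASTX's own conventions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate from the first base; unrecognized codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return translations of all three frames on both strands."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

# Toy nucleotide query, invented for illustration.
frames = six_frames("ATGGCCTAA")
```

Each of the six translations would then be compared against the protein database; a frameshift error in the query moves the true coding signal from one frame to another partway through the sequence.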


Subjects
Databases, Factual , Proteins/genetics , Algorithms , Amino Acid Sequence , Animals , Molecular Sequence Data , Mutation , Probability , Rats , Ribosomal Proteins/genetics , Sequence Homology, Amino Acid , Software
18.
Article in English | MEDLINE | ID: mdl-7584362

ABSTRACT

Molecular sequence megaclassification is a technique for automated protein sequence analysis and annotation. Implementation of the method has been limited by the need to store and randomly access a database of all the sequence pair similarities. More than 80,000 protein sequences are now present in the public databases, and the pair similarity data table for the full protein sequence database requires over 1 gigabyte of storage. In this paper we present a computationally efficient representation of groups based on a graph theory approach where sequence clusters are described by a minimal spanning tree of highest scoring similarity pairs. This representation allows a classification of N proteins to be stored in order(N) memory. The use of this minimal spanning tree representation simplifies analysis of groups, the description of group characteristics and the manual correction of artifacts resulting from false hits. The new tree representation also introduces new possibilities for artifact generation in sequence classification. Methods for detecting and removing these artifacts are discussed.
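
The storage saving described here can be illustrated with a Kruskal-style sketch that keeps only the highest-scoring similarity pairs needed to connect each cluster, so at most N - 1 pairs are stored for N sequences. The function name, the union-find bookkeeping, and the toy scores are assumptions; the paper's own construction and its artifact detection are not reproduced.

```python
def spanning_forest(n_sequences, scored_pairs):
    """Keep the highest-scoring pairs that join sequences without forming a cycle.

    scored_pairs is a list of (score, sequence_a, sequence_b) tuples with
    sequences numbered 0 .. n_sequences - 1.
    """
    parent = list(range(n_sequences))

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for score, a, b in sorted(scored_pairs, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            kept.append((score, a, b))
    return kept

# Toy similarity pairs among five sequences, invented for illustration.
toy_pairs = [(90, 0, 1), (85, 1, 2), (80, 0, 2), (70, 3, 4)]
tree_edges = spanning_forest(5, toy_pairs)
```

In this toy case the redundant pair (80, 0, 2) is dropped because sequences 0 and 2 are already connected through sequence 1.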


Subjects
Algorithms , Cluster Analysis , Escherichia coli Proteins , Mathematical Computing , Proteins/classification , Sequence Analysis/methods , Reproducibility of Results , Ribosomal Proteins/classification , Software Design
20.
Trends Genet ; 8(2): 52-5, 1992 Feb.
Article in English | MEDLINE | ID: mdl-1566371

ABSTRACT

Molecular sequences are experimentally derived data that can be expected to contain errors as a result of diverse phenomena such as biological variation, molecular cloning artifacts, imperfect sequence determination, and data handling during contig assembly. Errors will affect the reliability of database searches and sequence alignments, but their impact may be minimized by the use of analytical techniques that anticipate that the data will be imperfect.


Subjects
Data Interpretation, Statistical , Molecular Sequence Data , Amino Acid Sequence , Reproducibility of Results