Search | VHL Regional Portal

GeneGenerator--a flexible algorithm for gene prediction and its application to maize sequences.

Kleffe, J; Hermann, K; Vahrson, W; Wittig, B; Brendel, V.

Bioinformatics ; 14(3): 232-43, 1998.

Article in English | MEDLINE | ID: mdl-9614266

ABSTRACT

MOTIVATION: We developed GeneGenerator because of the need for a tool to predict gene structure without knowing in advance how to score potential exons and introns in order to obtain the best results, pertinent in particular to less well-studied organisms for which suitable training sets are small. GeneGenerator is a very flexible algorithm which for a given genomic sequence generates a number of feasible gene structures satisfying user-defined constraints. The specific implementation described in detail requires minimum scoring for translation start and donor and acceptor splice sites according to previously trained logitlinear models. In addition, potential exons and introns are required to exceed specified minimal lengths and threshold scores for coding or non-coding potential derived as log-likelihood ratios of appropriate Markov sequence models. RESULTS: A database of 46 non-redundant genomic sequences from maize is used for illustration. It is shown that the correct gene structures do not always maximize the considered target function. However, in most cases, the correct or nearly correct structures are found in a small set of high-scoring structures. A critical review of the generated structures sometimes allows the choices to be narrowed by considering additional variables such as predicted splice site strength or local optimality of splice site scores. Summary statistics for prediction accuracy over all 46 maize genes are derived under cross-validation and non-cross-validation training conditions for the Markov sequence models. The algorithm achieved exon sensitivity of 0.81 and specificity of 0.75 on an independent set of 14 novel maize genomic segments. AVAILABILITY: GeneGenerator runs under Borland-Pascal 7.0 using MS-DOS and C on UNIX workstations. The source code is available upon request. CONTACT: jkleffe@euler.grumed.fu-berlin.de
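The abstract describes coding potential as a log-likelihood ratio of Markov sequence models, combined with minimum-length and threshold filters. The following is only a minimal sketch of that general idea, not the GeneGenerator implementation: the first-order model, the add-one smoothing, the toy training strings, and all function names (`train_markov`, `log_likelihood_ratio`, `passes_filters`) are assumptions made for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=1, pseudocount=1.0):
    """Return a function prob(context, base) estimated from training sequences.

    Uses additive smoothing so unseen transitions keep a non-zero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood_ratio(seq, coding, noncoding, order=1):
    """Sum of log P_coding / P_noncoding over all transitions in seq."""
    score = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        score += math.log(coding(ctx, base) / noncoding(ctx, base))
    return score


def passes_filters(seq, score, min_len, min_score):
    """Keep a candidate only if it is long enough and scores high enough."""
    return len(seq) >= min_len and score >= min_score


# Toy training data, invented for this sketch (not maize sequences).
coding_model = train_markov(["ATGGCCGCGGCC", "ATGGCGGCCGCG"])
noncoding_model = train_markov(["ATATTTTATATT", "TTTATATATTTA"])

gc_rich = log_likelihood_ratio("GCCGCG", coding_model, noncoding_model)
at_rich = log_likelihood_ratio("TATATT", coding_model, noncoding_model)
```

With these toy models the GC-rich candidate scores positive and the AT-rich one negative, so a call such as `passes_filters("GCCGCG", gc_rich, min_len=6, min_score=0.0)` would accept the former only. Real thresholds and model orders would have to come from training data.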

Subjects

Algorithms , Plant Genes/genetics , DNA Sequence Analysis/methods , Software , Zea mays/genetics , Computational Biology/methods , DNA-Binding Proteins/genetics , Exons , Glucosyltransferases/genetics , Introns , Leucine Zippers , Logistic Models , Markov Chains , Genetic Models , Plant Proteins , Software Validation , Transcription Factors/genetics

Logitlinear models for the prediction of splice sites in plant pre-mRNA sequences.

Kleffe, J; Hermann, K; Vahrson, W; Wittig, B; Brendel, V.

Nucleic Acids Res ; 24(23): 4709-18, 1996 Dec 01.

Article in English | MEDLINE | ID: mdl-8972857

ABSTRACT

Pre-mRNA splicing in plants, while generally similar to the processes in vertebrates and yeast, is thought to involve plant specific cis-acting elements. Both monocot and dicot introns are typically strongly enriched in U nucleotides, and AU- or U-rich segments are thought to be involved in intron recognition, splice site selection, and splicing efficiency. We have applied logitlinear models to find optimal combinations of splice site variables for the purpose of separating true splice sites from a large excess of potential sites. It is shown that plant splice site prediction from sequence inspection is greatly improved when compositional contrast between exons and introns is considered in addition to degree of matching to the splice site consensus (signal quality). The best model involves subclassification of splice sites according to the identity of the base immediately upstream of the GU and AG signals and gives substantial performance gains compared with conventional profile methods.
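As a rough illustration of how a logit-linear model can combine signal quality with exon/intron compositional contrast, the sketch below scores a candidate donor site. It is an assumption-laden toy, not the published model: the coefficients `b0`, `b_signal`, and `b_contrast` are invented, the contrast is reduced to a simple difference in U fraction, and the subclassification by the base upstream of GU/AG is not reproduced.

```python
import math


def u_fraction(seq):
    """Fraction of U (or T) in a sequence; 0.0 for an empty sequence."""
    seq = seq.upper().replace("T", "U")
    return seq.count("U") / len(seq) if seq else 0.0


def donor_contrast(exon_flank, intron_flank):
    """Compositional contrast: U content of the intron side minus the exon side."""
    return u_fraction(intron_flank) - u_fraction(exon_flank)


def logit_score(signal_quality, contrast, b0=-4.0, b_signal=3.0, b_contrast=5.0):
    """Logistic function of a linear combination of the two site variables.

    The default coefficients are hypothetical placeholders, not fitted values.
    """
    z = b0 + b_signal * signal_quality + b_contrast * contrast
    return 1.0 / (1.0 + math.exp(-z))


# Same consensus match (0.9), with and without a U-rich intron flank.
with_contrast = logit_score(0.9, donor_contrast("GCGCGC", "UUUAUU"))
without_contrast = logit_score(0.9, 0.0)
```

Under these assumed coefficients, the site with a U-rich intron flank receives the higher probability, which mirrors the abstract's point that contrast adds information beyond the consensus match alone.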

Subjects

Linear Models , RNA Precursors/chemistry , RNA Splicing , Messenger RNA/chemistry , Plant RNA/chemistry , Algorithms , Arabidopsis/genetics , Exons , Introns , Molecular Sequence Data , Zea mays/genetics

Object-oriented sequence analysis: SCL--a C++ class library.

Vahrson, W; Hermann, K; Kleffe, J; Wittig, B.

Comput Appl Biosci ; 12(2): 119-27, 1996 Apr.

Article in English | MEDLINE | ID: mdl-8744774

ABSTRACT

SCL (Sequence Class Library) is a class library written in the C++ programming language. Designed using object-oriented programming principles, SCL consists of classes of objects performing tasks typically needed for analyzing DNA or protein sequences. Among them are very flexible sequence classes, classes accessing databases in various formats, classes managing collections of sequences, as well as classes performing higher-level tasks like calculating a pairwise sequence alignment. SCL also includes classes that provide general programming support, like a dynamically growing array, sets, matrices, strings, classes performing file input/output, and utilities for error handling. By providing these components, SCL fosters an explorative programming style: experimenting with algorithms and alternative implementations is encouraged rather than punished. A description of SCL's overall structure as well as an overview of its classes is given. Important aspects of the work with SCL are discussed in the context of a sample program.

Subjects

Programming Languages , Sequence Analysis/methods , Algorithms , Factual Databases , Evaluation Studies as Topic , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis/statistics & numerical data

DNASTAT: a Pascal unit for the statistical analysis of DNA and protein sequences.

Kleffe, J; Hermann, K; Gunia, W; Vahrson, W; Wittig, B.

Comput Appl Biosci ; 11(4): 449-55, 1995 Aug.

Article in English | MEDLINE | ID: mdl-8521055

ABSTRACT

DNASTAT is a collection of Pascal routines for researchers who develop their own application programs for statistical analysis of DNA and protein sequences. Dynamic and file-based data structures allow users to process sets of sequences by simple loop control without limitations on the number of sequences and their individual sizes. This frees the programmer from potentially error-prone tasks like dynamic memory allocation and controlling array sizes. Sequences can be stored in databases along with biological and statistical attributes. Individual sequences can be accessed by column name and row number as with spread-sheets. DNASTAT allows large sets of sequences to be processed using a PC with standard configuration. Its small size, simplicity and free availability make it attractive to students of mathematical biology. Use of DNASTAT is illustrated by two sample programs that generate a database of coding regions from the GenBank entry of the tobacco chloroplast genome. A version of DNASTAT written in ANSI-C for PCs and Unix workstations is also available.

Subjects

DNA/genetics , Programming Languages , Proteins/genetics , DNA Sequence Analysis/methods , Sequence Analysis/methods , Base Sequence , Statistical Data Interpretation , Factual Databases , Plant Genome , Molecular Sequence Data , Toxic Plants , Software , Nicotiana/genetics

Transcription of human c-myc in permeabilized nuclei is associated with formation of Z-DNA in three discrete regions of the gene.

Wittig, B; Wölfl, S; Dorbic, T; Vahrson, W; Rich, A.

EMBO J ; 11(12): 4653-63, 1992 Dec.

Article in English | MEDLINE | ID: mdl-1330542

ABSTRACT

When human U937 cells are placed in agarose microbeads and treated with a detergent, the cytoplasmic membrane is lysed and the nuclear membrane is permeabilized. However, the nuclei remain intact and maintain both replication and transcription. Biotin labeled monoclonal antibodies against Z-DNA have been diffused into this system and used to measure the amount of Z-DNA present in the nuclei. It has previously been shown that the amount of Z-DNA present decreases due to relaxation by topoisomerase I and increases as the level of transcription increases. Here we measure the formation of Z-DNA in the c-myc gene by crosslinking the antibodies to DNA using laser radiation at 266 nm for 10 ns. The crosslinked DNA is isolated by restriction digestion, separation of antibody labeled fractions through the biotin residue, and subsequent proteolysis to remove the crosslinked antibody. Three AluI restriction fragments of the c-myc gene are shown to form Z-DNA when the cell is transcribing c-myc. The Z-DNA forming segments are near the promoter regions of the gene. However, when U937 cells start to differentiate and transcription of the c-myc gene is down-regulated, the Z-DNA content goes to undetectable levels within 30-60 min.

Subjects

DNA/metabolism , Proto-Oncogene Proteins c-myc/genetics , Genetic Transcription , Base Sequence , Camptothecin/pharmacology , Cell Line , Cell Nucleus/metabolism , Cross-Linking Reagents , DNA/chemistry , Agar Gel Electrophoresis , Humans , Molecular Sequence Data , Nucleic Acid Hybridization , Permeability , Polymerase Chain Reaction , Genetic Promoter Regions , Topoisomerase I Inhibitors
