Search | BVS Regional Portal

Revisiting mutagenesis at non-B DNA motifs in the human genome.

McGinty, R J; Sunyaev, S R.

Nat Struct Mol Biol ; 30(4): 417-424, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36914796

ABSTRACT

Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
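To make one of the motif classes above concrete, here is a minimal sketch of how putative G-quadruplex motifs are often flagged in a DNA sequence. It assumes the widely used heuristic of four runs of three or more G separated by loops of 1 to 7 nucleotides; this is not the motif definition used by McGinty and Sunyaev, and the function name `find_g4_motifs` is hypothetical. Note that `re.finditer` reports only non-overlapping matches, which is exactly the kind of simplification the abstract warns can confound genome-wide analyses.

```python
import re

# Assumed heuristic: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (not the paper's definition).
G4_PLUS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# A G4 on the reverse strand appears as a C-rich run on the forward strand.
G4_MINUS = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) tuples for putative G4 motifs.

    Coordinates are 0-based and half-open on the input strand. Matches are
    non-overlapping within each strand, so overlapping motifs are merged
    or missed.
    """
    seq = seq.upper()
    hits = []
    for pattern, strand in ((G4_PLUS, "+"), (G4_MINUS, "-")):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)
```

For example, `find_g4_motifs("AAGGGTTAGGGTTAGGGTTAGGGAA")` returns `[(2, 23, '+')]`, a single motif spanning the four G runs.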

Subjects

DNA , Genome, Human , Humans , Nucleotide Motifs/genetics , Mutagenesis , DNA/genetics , DNA/chemistry , Nucleotides

Guidelines for investigating causality of sequence variants in human disease.

MacArthur, D G; Manolio, T A; Dimmock, D P; Rehm, H L; Shendure, J; Abecasis, G R; Adams, D R; Altman, R B; Antonarakis, S E; Ashley, E A; Barrett, J C; Biesecker, L G; Conrad, D F; Cooper, G M; Cox, N J; Daly, M J; Gerstein, M B; Goldstein, D B; Hirschhorn, J N; Leal, S M; Pennacchio, L A; Stamatoyannopoulos, J A; Sunyaev, S R; Valle, D; Voight, B F; Winckler, W; Gunter, C.

Nature ; 508(7497): 469-76, 2014 Apr 24.

Article in English | MEDLINE | ID: mdl-24759409

ABSTRACT

The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

Subjects

Disease , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Guidelines as Topic , False Positive Reactions , Genes/genetics , Humans , Information Dissemination , Publishing , Reproducibility of Results , Research Design , Translational Research, Biomedical/standards

SNP frequencies in human genes an excess of rare alleles and differing modes of selection.

Sunyaev, S R; Lathe, W C; Ramensky, V E; Bork, P.

Trends Genet ; 16(8): 335-7, 2000 Aug.

Article in English | MEDLINE | ID: mdl-10904261

Subjects

Alleles , Gene Frequency , Polymorphism, Single Nucleotide/genetics , Selection, Genetic , Humans , Models, Genetic

PSIC: profile extraction from sequence alignments with position-specific counts of independent observations.

Sunyaev, S R; Eisenhaber, F; Rodchenkov, I V; Eisenhaber, B; Tumanyan, V G; Kuznetsov, E N.

Protein Eng ; 12(5): 387-94, 1999 May.

Article in English | MEDLINE | ID: mdl-10360979

ABSTRACT

Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).
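To illustrate the "traditional" approach that this abstract contrasts with PSIC, where one weight is applied at every position of a sequence, here is a minimal sketch of Henikoff position-based sequence weights (Henikoff and Henikoff, 1994). This is a different, well-known scheme, not PSIC itself; the function name `henikoff_weights` is hypothetical, and treating gap characters as an ordinary residue type is a simplifying assumption.

```python
from collections import Counter


def henikoff_weights(alignment: list[str]) -> list[float]:
    """Per-sequence weights from a gapped multiple alignment.

    In each column with k distinct residue types, a residue seen n_r times
    contributes 1 / (k * n_r) to every sequence carrying it. Contributions
    are summed over columns and normalized to sum to 1, so each sequence
    ends up with a single weight used at all of its positions.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)  # gaps ('-') are counted like any residue
        k = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])

    total = sum(weights)
    return [w / total for w in weights]
```

With two identical sequences and one distinct one, `henikoff_weights(["AAA", "AAA", "CCC"])` gives `[0.25, 0.25, 0.5]`: the redundant pair shares the weight that the unique sequence receives alone. PSIC, by contrast, computes such corrections separately at each alignment position.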

Subjects

Proteins/chemistry , Sequence Alignment/statistics & numerical data , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Conserved Sequence , Databases, Factual , Internet , Molecular Sequence Data , Protein Folding

Are knowledge-based potentials derived from protein structure sets discriminative with respect to amino acid types?

Sunyaev, S R; Eisenhaber, F; Argos, P; Kuznetsov, E N; Tumanyan, V G.

Proteins ; 31(3): 225-46, 1998 May 15.

Article in English | MEDLINE | ID: mdl-9593195

ABSTRACT

The parametric description of residue environments through solvent accessibility, backbone conformation, or pairwise residue-residue distances is the key to the comparison between amino acid types at protein sequence positions and residue locations in structural templates (condition of protein sequence-structure match). For the first time, the research results presented in this study clarify and allow us to quantify, on a rigorous statistical basis, to what extent the amino acid type-specific distributions of commonly used environment parameters are discriminative with respect to the 20 amino acid types. Relying on the Bahadur theory, we estimate the probability of error in a single-sequence-structure alignment based on weak or absent discriminative power in a learning database of protein structure. We present the results for many residue environment variables and demonstrate that each fold description parameter is sensitive with respect to only a few amino acid types while indifferent to most of the other amino acid types. Even complex structural characteristics combining solvent-accessible surface area, backbone conformation, and pairwise distances distinguish only some amino acid types, whereas the others remain nondiscriminated. We find that the knowledge-based potentials currently in use treat especially Ala, Asp, Gln, His, Ser, Thr, and Tyr as essentially "average" amino acids. Thus, highly discriminative amino acid types define the alignment register in gapless sequence-structure alignments. The introduction of gaps leads to alignment ambiguities at sequence positions occupied by nondiscriminated amino acid types. Therefore, local sequence-structure alignments produced by techniques with gaps cannot be reliable. Conceptually new and more sensitive environment parameters must be invented.
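For readers unfamiliar with the term, knowledge-based potentials of the kind discussed above are commonly written in the inverse-Boltzmann form below, where $a$ is an amino acid type and $s$ an environment class (for example, a solvent-accessibility bin). This is the usual textbook formulation, given here as an assumption for illustration; the paper's own error analysis relies on Bahadur theory and is not reproduced.

```latex
\Delta E(a, s) = -k_B T \,\ln \frac{P(s \mid a)}{P(s)}
```

If $P(s \mid a) \approx P(s)$ for every environment class $s$, then $\Delta E(a, s) \approx 0$ throughout: the environment parameter carries almost no information about $a$, which is what it means for an amino acid type to be treated as "average" or nondiscriminated.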

Subjects

Amino Acids/chemistry , Protein Conformation , Chemical Phenomena , Chemistry, Physical , Databases, Factual , Mathematics , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Alignment , Solvents , Templates, Genetic
