Search | BVS Regional Portal (test)

Comparison and benchmark of structural variants detected from long read and long-read assembly.

Lin, Jiadong; Jia, Peng; Wang, Songbo; Kosters, Walter; Ye, Kai.

Brief Bioinform; 24(4), 2023 Jul 20.

Article in English | MEDLINE | ID: mdl-37200087

ABSTRACT

Structural variant (SV) detection is essential for genomic studies, and long-read sequencing technologies have advanced our capacity to detect SVs directly from reads or de novo assemblies, known as the read-based and assembly-based strategies, respectively. However, to date, no independent studies have compared and benchmarked the two strategies. Here, on the basis of SVs detected by 20 read-based and eight assembly-based detection pipelines from six datasets of the HG002 genome, we investigated the factors that influence the two strategies and assessed their performance with well-curated SVs. We found that up to 80% of the SVs could be detected by both strategies among different long-read datasets, whereas variant type, size, and breakpoint detected by the read-based strategy were greatly affected by aligners. For the high-confidence insertions and deletions at non-tandem repeat regions, a remarkable subset of them (82% in assembly-based calls and 93% in read-based calls), accounting for around 4000 SVs, could be captured by both reads and assemblies. However, discordance between the two strategies was largely caused by complex SVs and inversions, which resulted from inconsistent alignment of reads and assemblies at these loci. Finally, benchmarking with SVs at medically relevant genes, the recall of the read-based strategy reached 77% on 5X coverage data, whereas the assembly-based strategy required 20X coverage data to achieve similar performance. Therefore, integrating SVs from reads and assemblies is suggested for general-purpose detection because of inconsistently detected complex SVs and inversions, whereas the assembly-based strategy is optional for applications with limited resources.

Subjects

Benchmarking; Genome, Human; Humans; Sequence Analysis; Genomics/methods; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods

A Boolean algebra for genetic variants.

Vis, Jonathan K; Santcroos, Mark A; Kosters, Walter A; Laros, Jeroen F J.

Bioinformatics; 39(1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36594541

ABSTRACT

MOTIVATION: Beyond identifying genetic variants, we introduce a set of Boolean relations, which allows for a comprehensive classification of the relations of every pair of variants by taking all minimal alignments into account. We present an efficient algorithm to compute these relations, including a novel way of efficiently computing all minimal alignments within the best theoretical complexity bounds. RESULTS: We show that these relations are common, and many non-trivial, for variants of the CFTR gene in dbSNP. Ultimately, we present an approach for the storing and indexing of variants in the context of a database that enables efficient querying for all these relations. AVAILABILITY AND IMPLEMENTATION: A Python implementation is available at https://github.com/mutalyzer/algebra/tree/v0.2.0 as well as an interface at https://mutalyzer.nl/algebra.

Subjects

Algorithms; Data Management; Databases, Factual; Software

SVision: a deep learning approach to resolve complex structural variants.

Lin, Jiadong; Wang, Songbo; Audano, Peter A; Meng, Deyu; Flores, Jacob I; Kosters, Walter; Yang, Xiaofei; Jia, Peng; Marschall, Tobias; Beck, Christine R; Ye, Kai.

Nat Methods; 19(10): 1230-1233, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36109679

ABSTRACT

Complex structural variants (CSVs) encompass multiple breakpoints and are often missed or misinterpreted. We developed SVision, a deep-learning-based multi-object-recognition framework, to automatically detect and characterize CSVs from long-read sequencing data. SVision outperforms current callers at identifying the internal structure of complex events and has revealed 80 high-quality CSVs with 25 distinct structures from an individual genome. SVision directly detects CSVs without matching known structures, allowing sensitive detection of both common and previously uncharacterized complex rearrangements.

Subjects

Deep Learning; Genome; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA

Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants.

Lin, Jiadong; Yang, Xiaofei; Kosters, Walter; Xu, Tun; Jia, Yanyan; Wang, Songbo; Zhu, Qihui; Ryan, Mallory; Guo, Li; Zhang, Chengsheng; Lee, Charles; Devine, Scott E; Eichler, Evan E; Ye, Kai.

Genomics Proteomics Bioinformatics; 20(1): 205-218, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34224879

ABSTRACT

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.

Subjects

Algorithms; Genomics; Genome; High-Throughput Nucleotide Sequencing; Mutation; Sequence Analysis, DNA

Multiplex network motifs as building blocks of corporate networks.

Takes, Frank W; Kosters, Walter A; Witte, Boyd; Heemskerk, Eelke M.

Appl Netw Sci; 3(1): 39, 2018.

Article in English | MEDLINE | ID: mdl-30839798

ABSTRACT

In corporate networks, firms are connected through links of corporate ownership and shared directors, connecting the control over major economic actors in our economies in meaningful and consequential ways. Most research thus far has focused on the connectedness of firms as a result of one particular link type, analyzing node-specific metrics or global network-based methods to gain insights into the modelled corporate system. In this paper, we aim to understand multiplex corporate networks with multiple types of connections, specifically investigating the network's essential building blocks: multiplex network motifs. Motifs, which are small subgraph patterns occurring at significantly higher frequencies than in similar random networks, have demonstrated their usefulness in understanding the structure of many types of real-world networks. However, detecting motifs in multiplex networks is nontrivial for two reasons. First of all, there are no out-of-the-box subgraph enumeration algorithms for multiplex networks. Second, existing null models to test network motif significance are unable to incorporate the interlayer dependencies in the multiplex network. We solve these two issues by introducing a layer encoding algorithm that incorporates the multiplex aspect in the subgraph enumeration phase. In addition, we propose a null model that is able to preserve the interlayer connectedness, while taking into account that one of the link types is actually the result of a projection of an underlying bipartite network. The experimental section considers the corporate network of Germany, in which tens of thousands of firms are connected through several hundred thousand links. We demonstrate how incorporating the multiplex aspect in motif detection is able to reveal new insights that could not be obtained by studying only one type of relationship.
In a general sense, the motifs reflect known corporate governance practices related to the monitoring of investments and the concentration of ownership. A substantial fraction of the discovered motifs is typical for an industrialized country such as Germany, whereas others seem specific for certain economic sectors. Interestingly, we find that motifs involving financial firms are over-represented amongst the larger and more complex motifs. This demonstrates the prominent role of the financial sector in Germany's largely industry-oriented corporate network.

An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P.

Bioinformatics; 23(6): 687-93, 2007 Mar 15.

Article in English | MEDLINE | ID: mdl-17237070

ABSTRACT

MOTIVATION: Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. RESULTS: In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. AVAILABILITY: The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.

Subjects

Databases, Protein; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Algorithms; Amino Acid Sequence; Database Management Systems; Molecular Sequence Data

A caged lanthanide complex as a paramagnetic shift agent for protein NMR.

Prudêncio, Miguel; Rohovec, Jan; Peters, Joop A; Tocheva, Elitza; Boulanger, Martin J; Murphy, Michael E P; Hupkes, Hermen-Jan; Kosters, Walter; Impagliazzo, Antonietta; Ubbink, Marcellus.

Chemistry; 10(13): 3252-60, 2004 Jul 05.

Article in English | MEDLINE | ID: mdl-15224334

ABSTRACT

A lanthanide complex, named CLaNP (caged lanthanide NMR probe) has been developed for the characterisation of proteins by paramagnetic NMR spectroscopy. The probe consists of a lanthanide chelated by a derivative of DTPA (diethylenetriaminepentaacetic acid) with two thiol reactive functional groups. The CLaNP molecule is attached to a protein by two engineered, surface-exposed, Cys residues in a bidentate manner. This drastically limits the dynamics of the metal relative to the protein and enables measurements of pseudocontact shifts. NMR spectroscopy experiments on a diamagnetic control and the crystal structure of the probe-protein complex demonstrate that the protein structure is not affected by probe attachment. The probe is able to induce pseudocontact shifts to at least 40 Å from the metal and causes residual dipolar couplings due to alignment at a high magnetic field. The molecule exists in several isomeric forms with different paramagnetic tensors; this provides a fast way to obtain long-range distance restraints.

Subjects

Azurin/analogs & derivatives; Lanthanoid Series Elements/chemistry; Nuclear Magnetic Resonance, Biomolecular/methods; Pentetic Acid/chemistry; Azurin/chemistry; Crystallography, X-Ray; Lanthanoid Series Elements/chemical synthesis; Mass Spectrometry; Models, Molecular
