1.
J Biomed Inform ; 148: 104552, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37995844

ABSTRACT

Pangenomics was originally defined as the problem of comparing the composition of genes into gene families within a set of bacterial isolates belonging to the same species. The problem requires the calculation of sequence homology among such genes. When combined with metagenomics, namely for human microbiome composition analysis, gene-oriented pangenome detection becomes a promising method to decipher ecosystem functions and population-level evolution. Established computational tools are able to investigate the genetic content of isolates for which a complete genomic sequence is available. However, there is a plethora of incomplete genomes that are available on public resources, which only a few tools may analyze. Incomplete means that the process for reconstructing their genomic sequence is not complete, and only fragments of their sequence are currently available. However, the information contained in these fragments may play an essential role in the analyses. Here, we present PanDelos-frags, a computational tool which exploits and extends previous results in analyzing complete genomes. It provides a new methodology for inferring missing genetic information and thus for managing incomplete genomes. PanDelos-frags outperforms state-of-the-art approaches in reconstructing gene families in synthetic benchmarks and in a real use case of metagenomics. PanDelos-frags is publicly available at https://github.com/InfOmics/PanDelos-frags.


Subjects
Genomics, Microbiota, Humans, Ecosystem, Genome, Genomics/methods, Metagenomics/methods, Software, Microbiota/genetics
2.
Sci Rep ; 13(1): 3422, 2023 Feb 28.
Article in English | MEDLINE | ID: mdl-36854792
3.
Gigascience ; 11, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35946989

ABSTRACT

BACKGROUND: Spatial transcriptomics (ST) combines stained tissue images with spatially resolved high-throughput RNA sequencing. Spatial transcriptomic analysis includes challenging tasks such as clustering, where a partition among data points (spots) is defined by means of a similarity measure. Improving clustering results is a key factor, as clustering affects subsequent downstream analysis. State-of-the-art approaches group data by taking into account transcriptional similarity, and some exploit spatial information as well. However, it is not yet clear how much the spatial information combined with transcriptomics improves the clustering result. RESULTS: We propose a new clustering method, Stardust, that easily exploits the combination of space and transcriptomic information in the clustering procedure through a manual or fully automatic tuning of algorithm parameters. Moreover, a parameter-free version of the method is also provided, in which the spatial contribution depends dynamically on the distribution of expression distances in the space. We evaluated the results of the proposed methods by analyzing ST data sets available on the 10x Genomics website and comparing clustering performance with state-of-the-art approaches by measuring the spots' stability in the clusters and their biological coherence. Stability is defined as the tendency of each point to remain clustered with the same neighbors when perturbations are applied. CONCLUSIONS: Stardust is an easy-to-use methodology that lets users define how much spatial information should influence clustering on different tissues and achieves more stable results than state-of-the-art approaches.
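
The idea of letting a tunable weight decide how much spatial information influences clustering can be illustrated with a minimal Python sketch. It is not Stardust's implementation: the function names, the Euclidean metric, the max-rescaling and the additive combination are all assumptions made for illustration.

```python
import math


def pairwise_euclidean(points):
    """Return the full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def rescale(matrix):
    """Divide every entry by the largest one so that values fall in [0, 1]."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [row[:] for row in matrix]
    return [[value / peak for value in row] for row in matrix]


def combined_distance(expression, coordinates, space_weight):
    """Hypothetical combination: expression distance + weight * spatial distance.

    expression   -- one expression vector per spot
    coordinates  -- one (x, y) position per spot
    space_weight -- 0 ignores space; larger values give space more influence
    """
    expr_d = rescale(pairwise_euclidean(expression))
    space_d = rescale(pairwise_euclidean(coordinates))
    n = len(expr_d)
    return [
        [expr_d[i][j] + space_weight * space_d[i][j] for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix could be handed to any distance-based clustering routine; with `space_weight=0` it reduces to purely transcriptional clustering.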


Subjects
Data Analysis, Transcriptome, Algorithms, Cluster Analysis
4.
PLoS One ; 17(6): e0269687, 2022.
Article in English | MEDLINE | ID: mdl-35679235

ABSTRACT

The Covid19 pandemic has significantly impacted our lives, triggering a strong reaction that resulted in vaccines, more effective diagnoses and therapies, and policies to contain the pandemic outbreak, to name but a few. A significant contribution to their success comes from the computer science and information technology communities, both in support of other disciplines and as the primary driver of solutions for, e.g., diagnostics, social distancing, and contact tracing. In this work, we surveyed the initiatives of the Italian computer science and engineering community against the Covid19 pandemic. The 128 responses thus collected document the response of this community during the first pandemic wave in Italy (February-May 2020), through several initiatives carried out by both single researchers and research groups able to promptly react to Covid19, even remotely. The data obtained by the survey are here reported, discussed and further investigated with Natural Language Processing techniques, to generate semantic clusters based on embedding representations of the surveyed activity descriptions. The resulting clusters were then used to extend an existing Covid19 taxonomy with the classification of related research activities in computer science and information technology areas, summarizing this work's contribution through a reproducible survey-to-taxonomy methodology.


Subjects
COVID-19, COVID-19/epidemiology, Cluster Analysis, Disease Outbreaks, Humans, Italy/epidemiology, Pandemics/prevention & control, Physical Distancing
5.
Bioinformatics ; 38(9): 2631-2632, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35289871

ABSTRACT

MOTIVATION: Computational tools for pangenomic analysis have gained increasing interest over the past two decades in various applications such as evolutionary studies and vaccine development. Synthetic benchmarks are essential for the systematic evaluation of their performance. Currently, benchmarking tools represent a genome as a set of genetic sequences and fail to simulate the complete information of the genomes, which is essential for evaluating pangenomic detection between fragmented genomes. RESULTS: We present PANPROVA, a benchmark tool to simulate prokaryotic pangenomic evolution by evolving the complete genomic sequence of an ancestral isolate. In this way, the possibility of operating in the preassembly phase is enabled. Gene set variations, sequence variation and horizontal acquisition from a pool of external genomes are the evolutionary features of the tool. AVAILABILITY AND IMPLEMENTATION: PANPROVA is publicly available at https://github.com/InfOmics/PANPROVA. The manuscript explicitly refers to the GitHub repository. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Software, DNA Sequence Analysis, High-Throughput Nucleotide Sequencing, Benchmarking
6.
Artif Intell Med ; 122: 102212, 2021 12.
Article in English | MEDLINE | ID: mdl-34823837

ABSTRACT

Computational approaches to detect the signals of adverse drug reactions are powerful tools to monitor the unintended effects that users experience and report, and can also help prevent death and serious injury. They apply statistical indices to assess the validity of adverse reactions reported by users. The methodologies that scan fixed-duration intervals in the lifetime of drugs are among the most used. Here we present a method, called TEDAR, in which ranges of varying length are taken into account. TEDAR has the advantage of detecting a greater number of true signals without significantly increasing the number of false positives, which are a major concern for this type of tool. Furthermore, early detection of signals is a key feature of methods that protect the safety of the population. The results show that TEDAR detects adverse reactions many months earlier than methodologies based on a fixed interval length.


Subjects
Drug-Related Side Effects and Adverse Reactions, Pharmacovigilance, Adverse Drug Reaction Reporting Systems, Factual Databases, Drug-Related Side Effects and Adverse Reactions/diagnosis, Humans
7.
PLoS Comput Biol ; 17(9): e1009444, 2021 09.
Article in English | MEDLINE | ID: mdl-34570769

ABSTRACT

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
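
The standard PWM scanning procedure that GRAFIMO extends can be sketched in Python on a plain linear sequence. This is an illustrative sketch only, not GRAFIMO's code or command-line interface: the function names, the uniform background, the pseudocount and the representation of the matrix as a list of per-position dictionaries are assumptions, and no variation graph is involved.

```python
import math

BASES = "ACGT"


def log_odds_matrix(pwm, background=0.25, pseudocount=0.01):
    """Turn per-position base probabilities into log2-odds scores.

    pwm is a list with one dict per motif position, mapping base -> probability.
    The uniform background and the pseudocount are illustrative assumptions.
    """
    scores = []
    for column in pwm:
        total = sum(column.get(base, 0.0) + pseudocount for base in BASES)
        scores.append({
            base: math.log2(((column.get(base, 0.0) + pseudocount) / total) / background)
            for base in BASES
        })
    return scores


def scan(sequence, scores):
    """Score every window of motif width; return (start, score) pairs."""
    width = len(scores)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        hits.append((start, sum(scores[i][base] for i, base in enumerate(window))))
    return hits


def best_hit(sequence, scores):
    """Return the highest-scoring (start, score) pair in the sequence."""
    return max(scan(sequence, scores), key=lambda hit: hit[1])
```

Scanning a reference sequence and then the same sequence carrying a variant inside the motif shows, in miniature, how a single substitution can weaken a potential binding site; a graph-based scanner would instead enumerate such alternative paths directly from the variation graph.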


Subjects
Genetic Variation, Nucleotide Motifs, Software, Transcription Factors/metabolism, Base Sequence, Binding Sites/genetics, Computational Biology, Computer Graphics, Human Genome, Genomics, Haplotypes, Humans, Protein Binding/genetics
8.
BMC Bioinformatics ; 22(1): 209, 2021 Apr 22.
Article in English | MEDLINE | ID: mdl-33888059

ABSTRACT

BACKGROUND: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search for one substructure within one graph, called the target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited when searching in a collection of target graphs, referred to as the one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide fast pruning of target graphs, or parts of them, that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are stored in data structures that allow efficient access. Index size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. RESULTS: An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers from the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation, showing that GRAPES-DD substantially reduces memory utilization compared to GRAPES without worsening the searching time.
CONCLUSION: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to compactly encode sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once, instead of exploring each single element explicitly. Search strategies based on Decision Diagrams make indexing more affordable for biochemical graphs and beyond, allowing us to potentially deal with huge and ever-growing collections of biochemical and biological structures.


Subjects
Vitis, Abstracting and Indexing, Algorithms, Factual Databases
9.
Brief Bioinform ; 22(3), 2021 05 20.
Article in English | MEDLINE | ID: mdl-32893299

ABSTRACT

Given a group of genomes, represented as the sets of genes that belong to them, the discovery of the pangenomic content is based on the search for genetic homology among the genes in order to cluster them into families. Thus, pangenomic analyses investigate the membership of the families to the given genomes. This approach is referred to as the gene-oriented approach, in contrast to other definitions of the problem that take into account different genomic features. In the past years, several tools have been developed to discover and analyse pangenomic contents. Because of the hardness of the problem, each tool applies a different strategy for discovering the pangenomic content. As a result, the performance of each tool differs depending on the composition of the input genomes. This review reports the main analysis instruments provided by the current state-of-the-art tools for the discovery of pangenomic contents. Moreover, unlike previous works, the presented study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform other tools. The analysis is performed by taking into account different bacterial populations, which are synthetically generated by changing evolutionary parameters. The benchmarks used to compare the pangenomic tools, in addition to the computational pipeline developed for this purpose, are available at https://github.com/InfOmics/pangenes-review. Contact: V. Bonnici, R. Giugno. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Algorithms, Computational Biology/methods, Bacterial Genome/genetics, Genome/genetics, Genomics/methods, Bacteria/classification, Bacteria/genetics, Biological Evolution, Mycoplasma/classification, Mycoplasma/genetics, Phylogeny, Software
10.
Methods Mol Biol ; 1970: 121-167, 2019.
Article in English | MEDLINE | ID: mdl-30963492

ABSTRACT

This chapter is devoted to illustrating the usage of state-of-the-art methodologies for miRNA regulatory network construction and analysis. Advances in understanding the role of miRNAs in regulating gene expression are increasing the possibility of developing targeted therapies and drugs. This new possibility can be exploited by gaining new knowledge through analyzing interactions between a specific miRNA and a targeted gene.


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Gene Regulatory Networks, MicroRNAs/genetics, Messenger RNA/genetics, Software, Gene Expression Regulation, Humans, MicroRNAs/metabolism, Messenger RNA/metabolism
11.
Interdiscip Sci ; 11(1): 21-32, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30790228

ABSTRACT

Many scientific applications entail solving the subgraph isomorphism problem, i.e., given an input pattern graph, find all the subgraphs of a (usually much larger) target graph that are structurally equivalent to that input. Because subgraph isomorphism is NP-complete, methods to solve it have to use heuristics. This work evaluates subgraph isomorphism methods to assess their computational behavior on a wide range of synthetic and real graphs. Surprisingly, our experiments show that, among the leading algorithms, certain heuristics based only on pattern graphs are the most efficient.


Subjects
Algorithms, Computational Biology/methods, Computer Heuristics, Humans, Software
12.
BMC Bioinformatics ; 19(Suppl 15): 437, 2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30497358

ABSTRACT

BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes; in fact, all-against-all gene comparisons are required to completely solve the problem. In the presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge in this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations. RESULTS: We present PanDelos, a stand-alone tool for the discovery of pan-genome contents among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm. CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality of content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and on synthetic benchmarks that represent a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.
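
An alignment-free similarity based on k-mer multiplicity can be sketched in Python as follows. The abstract does not spell out PanDelos's measure, so the generalized Jaccard index over k-mer multisets used here is an assumption chosen for illustration, as are the function names; the value of k is left as a plain parameter rather than derived as the tool does.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count how many times each k-mer occurs in the sequence (its multiplicity)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def generalized_jaccard(seq_a, seq_b, k):
    """Similarity of two sequences from their k-mer multisets, without alignment.

    Shared multiplicity (sum of minima) divided by total multiplicity
    (sum of maxima); 1.0 for identical multisets, 0.0 for disjoint ones.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    kmers = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a[kmer], counts_b[kmer]) for kmer in kmers)
    total = sum(max(counts_a[kmer], counts_b[kmer]) for kmer in kmers)
    return shared / total if total else 0.0
```

Pairs of genes scoring above some cutoff would become edges of a homology network, from which a community detection step could then extract gene families.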


Subjects
Dictionaries as Topic, Bacterial Genome, Software, Bacteria/genetics, Genetic Databases, Gene Duplication, Phylogeny, Time Factors
13.
BMC Bioinformatics ; 19(1): 456, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30482173

ABSTRACT

After publication of this supplement article [1], it was brought to our attention that reference 10 and reference 12 in the article are incorrect.

14.
BMC Bioinformatics ; 19(Suppl 10): 356, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367572

ABSTRACT

BACKGROUND: R has become the de facto reference analysis environment in Bioinformatics. Plenty of tools are available as packages that extend the R functionality, and many of them target the analysis of biological networks. Several algorithms for graphs, which are the most adopted mathematical representation of networks, are well-known examples of applications that require high-performance computing, and for which classic sequential implementations are becoming inappropriate. In this context, parallel approaches targeting GPU architectures are becoming pervasive to deal with the execution time constraints. Although R packages for parallel execution on GPUs are already available, none of them provides graph algorithms. RESULTS: This work presents cuRnet, an R package that provides a parallel implementation for GPUs of the breadth-first search (BFS), the single-source shortest paths (SSSP), and the strongly connected components (SCC) algorithms. The package allows offloading compute-intensive applications to GPU devices for massively parallel computation, speeding up the runtime by up to one order of magnitude with respect to the standard sequential computations on CPU. We have tested cuRnet on a benchmark of large protein interaction networks and for the interpretation of high-throughput omics data through network analysis. CONCLUSIONS: cuRnet is an R package to speed up graph traversal and analysis through parallel computation on GPUs. We show the efficiency of cuRnet applied both to biological network analysis, which requires basic graph algorithms, and to complex existing procedures built upon such algorithms.
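
For reference, the sequential breadth-first search that a GPU implementation would be compared against can be sketched in a few lines of Python. This is not cuRnet's R interface or its CUDA code; the function name and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def bfs_levels(adjacency, source):
    """Sequential BFS: return the hop distance from source to each reachable node.

    adjacency maps a node to an iterable of its out-neighbors.
    """
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in levels:  # first visit gives the shortest hop count
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    return levels
```

All nodes at the same level can be expanded independently of one another, which is the property a parallel, level-by-level implementation can exploit.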


Subjects
Algorithms, Computational Biology/methods, Computer Graphics, Computing Methodologies
15.
BMC Bioinformatics ; 19(Suppl 10): 350, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367585

ABSTRACT

BACKGROUND: High-throughput technologies have provided the scientific community with an unprecedented opportunity for large-scale analysis of genomes. Non-coding RNAs (ncRNAs), for a long time believed to be non-functional, are emerging as one of the most important and largest families of gene regulators and as key elements for genome maintenance. Functional studies have been able to assign to ncRNAs a wide spectrum of functions in primary biological processes, and for this reason they are assuming a growing importance as a potential new family of cancer therapeutic targets. Nevertheless, the number of functionally characterized ncRNAs is still too small compared to the number of newly discovered ncRNAs. Thus, platforms able to merge information from available resources while addressing data integration issues are necessary, and existing ones are still insufficient to elucidate the biological roles of ncRNAs. RESULTS: In this paper, we describe a platform called Arena-Idb for the retrieval of comprehensive and non-redundant annotated ncRNA interactions. Arena-Idb provides a framework for network reconstruction of ncRNA heterogeneous interactions (i.e., with other types of molecules) and relationships with human diseases which guide the integration of data, extracted from different sources, via mapping of entities and minimization of ambiguity. CONCLUSIONS: Arena-Idb provides a schema and a visualization system to integrate ncRNA interactions that assists in discovering ncRNA functions through the extraction of heterogeneous interaction networks. Arena-Idb is available at http://arenaidb.ba.itb.cnr.it.


Subjects
Gene Regulatory Networks, Untranslated RNA/genetics, Software, Genetic Databases, Humans, User-Computer Interface
16.
Entropy (Basel) ; 20(12), 2018 Dec 06.
Article in English | MEDLINE | ID: mdl-33266658

ABSTRACT

In this paper, by extending some results of informational genomics, we present a new randomness test based on the empirical entropy of strings and some properties of the repeatability and unrepeatability of substrings of certain lengths. We give the theoretical motivations of our method and some experimental results of its application to a wide class of strings: decimal representations of real numbers, roulette outcomes, logistic maps, linear congruential generators, quantum measurements, natural language texts, and genomes. It will be evident that the evaluation of randomness resulting from our tests does not distinguish among the different sources of randomness (natural or pseudo-random).

17.
Nucleic Acids Res ; 46(D1): D354-D359, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29036351

ABSTRACT

miRandola (http://mirandola.iit.cnr.it/) is a database of extracellular non-coding RNAs (ncRNAs) that was initially published in 2012, foreseeing the relevance of ncRNAs as non-invasive biomarkers. An increasing amount of experimental evidence shows that ncRNAs are frequently dysregulated in diseases. Further, ncRNAs have been discovered in different extracellular forms, such as exosomes, which circulate in human body fluids. Thus, miRandola 2017 is an effort to update and collect the accumulating information on extracellular ncRNAs that is spread across scientific publications and different databases. Data are manually curated from 314 articles that describe miRNAs, long non-coding RNAs and circular RNAs. Fourteen organisms are now included in the database, along with associations of ncRNAs with 25 drugs, 47 sample types and 197 diseases. miRandola also classifies extracellular RNAs based on their extracellular form: Argonaute2 protein, exosome, microvesicle, microparticle, membrane vesicle, high density lipoprotein and circulating. We also implemented a new web interface to improve the user experience.


Subjects
Genetic Databases, Knowledge Bases, Untranslated RNA, Biomarkers, Cell-Free Nucleic Acids, Data Curation, Humans, MicroRNAs, RNA, Circular RNA, Long Noncoding RNA, User-Computer Interface
18.
Article in English | MEDLINE | ID: mdl-26761859

ABSTRACT

Graphs are mathematical structures used to model many kinds of biological data. Applications that analyze them must solve the subgraph isomorphism problem, which is NP-complete. Here, we investigate the existing strategies to reduce the running time of subgraph isomorphism algorithms, with emphasis on the importance of the order in which the graph vertices are taken into account during the search, called variable ordering, and its impact on the total running time of the algorithms. We focus on two recent solutions, which are based on an effective variable ordering strategy. We compare them both with the variable ordering strategies reviewed in the paper and with the other algorithms presented in the ICPR2014 contest on graph matching algorithms for pattern search in biological databases.
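
The role of variable ordering in a backtracking search can be illustrated with a minimal Python sketch. It is not either of the two solutions the abstract refers to: the static highest-degree-first ordering, the function name and the adjacency-set input format are assumptions, and the search finds non-induced embeddings in undirected graphs.

```python
def find_embeddings(pattern, target):
    """Enumerate mappings of pattern vertices onto target vertices that preserve edges.

    pattern and target map each vertex to the set of its neighbors (undirected).
    Pattern vertices are matched in a fixed order, highest degree first, so the
    most constrained vertices are tried early and dead ends are cut sooner.
    """
    order = sorted(pattern, key=lambda v: len(pattern[v]), reverse=True)
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        p = order[len(mapping)]  # next pattern vertex in the chosen ordering
        used = set(mapping.values())
        for t in target:
            # Prune: target vertex already taken, or its degree is too small.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            # Every already-mapped neighbor of p must be adjacent to t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                extend(mapping)
                del mapping[p]

    extend({})
    return results
```

Changing only the key used to build `order` is enough to experiment with other ordering strategies while keeping the rest of the search unchanged.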


Subjects
Algorithms, Gene Expression Profiling/methods, Biological Models, Statistical Models, Protein Interaction Mapping/methods, Proteome/metabolism, Signal Transduction/physiology, Computer Simulation
19.
Sci Rep ; 6: 28840, 2016 06 29.
Article in English | MEDLINE | ID: mdl-27354155

ABSTRACT

In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
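
The kind of quantity such indexes build on, the empirical entropy of the k-mer distribution of a sequence, can be sketched in Python. The function name is hypothetical and k is left as a free parameter: the abstract's recommended value k = lg2(n) is defined precisely in the paper and is not reimplemented here, nor are the informational indexes themselves.

```python
import math
from collections import Counter


def kmer_entropy(genome, k):
    """Shannon entropy (in bits) of the empirical k-mer distribution of a sequence."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

The entropy is 0 when a single k-mer fills the whole sequence and is bounded above by the base-2 logarithm of the number of k-mer positions, which is the bound a comparison with random sequences of the same length would approach.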


Subjects
Genome, Genetic Models, Algorithms, Animals, Computational Biology, Entropy, Molecular Evolution, Humans
20.
Bioinformatics ; 32(14): 2159-66, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153658

ABSTRACT

MOTIVATION: Biological network querying is a problem requiring a considerable computational effort to be solved. Given a target and a query network, it aims to find occurrences of the query in the target by considering topological and node similarities (i.e. mismatches between nodes, edges, or node labels). Querying tools that deal with similarities are crucial in biological network analysis because they provide meaningful results even with noisy data. In addition, as the size of available networks increases steadily, existing algorithms and tools are becoming unsuitable. This is raising new challenges for the design of more efficient and accurate solutions. RESULTS: This paper presents APPAGATO, a stochastic and parallel algorithm to find approximate occurrences of a query network in biological networks. APPAGATO handles node, edge and node label mismatches. Thanks to its randomized and parallel nature, it applies to large networks and, compared with existing tools, it provides higher performance as well as statistically significantly more accurate results. Tests have been performed on protein-protein interaction networks annotated with synthetic and real gene ontology terms. Case studies have been done by querying protein complexes among different species and tissues. AVAILABILITY AND IMPLEMENTATION: APPAGATO has been developed on top of the CUDA-C++ Toolkit 7.0 framework. The software is available online at http://profs.sci.univr.it/~bombieri/APPAGATO. CONTACT: rosalba.giugno@univr.it. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods, Gene Ontology, Software, Algorithms, Animals, Humans, Protein Interaction Maps