Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 34
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Algorithms Mol Biol ; 18(1): 17, 2023 Dec 01.
Artigo em Inglês | MEDLINE | ID: mdl-38037088

RESUMO

We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to that of the reference gene order. Then, a structure-informed genome rearrangement distance is computed between the ordered PQ-tree and the target gene order. The second problem, [Formula: see text] ([Formula: see text]), generalizes [Formula: see text], where the gene order members are not necessarily permutations and the structure informed rearrangement measure is extended to also consider up to [Formula: see text] and [Formula: see text] gene insertion and deletion operations, respectively, when modelling the PQ-tree informed divergence process from the reference gene order to the target gene order. The first algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of [Formula: see text] is 0, then the algorithm runs in [Formula: see text] time and [Formula: see text] space. The second algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively, and allowing up to [Formula: see text] deletions from the tree and up to [Formula: see text] deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of [Formula: see text] is 0) in [Formula: see text] time and [Formula: see text] space. The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1487 prokaryotic genomes.

2.
BMC Bioinformatics ; 23(1): 253, 2022 Jun 24.
Artigo em Inglês | MEDLINE | ID: mdl-35751023

RESUMO

BACKGROUND: The human body is inhabited by a diverse community of commensal non-pathogenic bacteria, many of which are essential for our health. By contrast, pathogenic bacteria have the ability to invade their hosts and cause a disease. Characterizing the differences between pathogenic and commensal non-pathogenic bacteria is important for the detection of emerging pathogens and for the development of new treatments. Previous methods for classification of bacteria as pathogenic or non-pathogenic used either raw genomic reads or protein families as features. Using protein families instead of reads provided a better interpretability of the resulting model. However, the accuracy of protein-families-based classifiers can still be improved. RESULTS: We developed a wide scope pathogenicity classifier (WSPC), a new protein-content-based machine-learning classification model. We trained WSPC on a newly curated dataset of 641 bacterial genomes, where each genome belongs to a different species. A comparative analysis we conducted shows that WSPC outperforms existing models on two benchmark test sets. We observed that the most discriminative protein-family features in WSPC are widely spread among bacterial species. These features correspond to proteins that are involved in the ability of bacteria to survive and replicate during an infection, rather than proteins that are directly involved in damaging or invading the host.


Assuntos
Genoma Bacteriano , Genômica , Bactérias/genética , Genômica/métodos , Humanos , Aprendizado de Máquina , Filogenia , Virulência/genética
3.
Algorithms Mol Biol ; 16(1): 16, 2021 Jul 09.
Artigo em Inglês | MEDLINE | ID: mdl-34243815

RESUMO

Gene clusters are groups of genes that are co-locally conserved across various genomes, not necessarily in the same order. Their discovery and analysis is valuable in tasks such as gene annotation and prediction of gene interactions, and in the study of genome organization and evolution. The discovery of conserved gene clusters in a given set of genomes is a well studied problem, but with the rapid sequencing of prokaryotic genomes a new problem is inspired. Namely, given an already known gene cluster that was discovered and studied in one genomic dataset, to identify all the instances of the gene cluster in a given new genomic sequence. Thus, we define a new problem in comparative genomics, denoted PQ-TREE SEARCH that takes as input a PQ-tree T representing the known gene orders of a gene cluster of interest, a gene-to-gene substitution scoring function h, integer arguments [Formula: see text] and [Formula: see text], and a new sequence of genes S. The objective is to identify in S approximate new instances of the gene cluster; These instances could vary from the known gene orders by genome rearrangements that are constrained by T, by gene substitutions that are governed by h, and by gene deletions and insertions that are bounded from above by [Formula: see text] and [Formula: see text], respectively. We prove that PQ-TREE SEARCH is NP-hard and propose a parameterized algorithm that solves the optimization variant of PQ-TREE SEARCH in [Formula: see text] time, where [Formula: see text] is the maximum degree of a node in T and [Formula: see text] is used to hide factors polynomial in the input size. The algorithm is implemented as a search tool, denoted PQFinder, and applied to search for instances of chromosomal gene clusters in plasmids, within a dataset of 1,487 prokaryotic genomes. We report on 29 chromosomal gene clusters that are rearranged in plasmids, where the rearrangements are guided by the corresponding PQ-trees. One of these results, coding for a heavy metal efflux pump, is further analysed to exemplify how PQFinder can be harnessed to reveal interesting new structural variants of known gene clusters.

4.
Bioinformatics ; 36(Suppl_1): i21-i29, 2020 07 01.
Artigo em Inglês | MEDLINE | ID: mdl-32657415

RESUMO

MOTIVATION: An important task in comparative genomics is to detect functional units by analyzing gene-context patterns. Colinear syntenic blocks (CSBs) are groups of genes that are consistently encoded in the same neighborhood and in the same order across a wide range of taxa. Such CSBs are likely essential for the regulation of gene expression in prokaryotes. Recent results indicate that colinearity can be conserved across multiple operons, thus motivating the discovery of multi-operon CSBs. This computational task raises scalability challenges in large datasets. RESULTS: We propose an efficient algorithm for the discovery of cross-strand multi-operon CSBs in large genomic datasets. The proposed algorithm uses match-point arithmetic, which is scalable for large datasets of microbial genomes in terms of running time and space requirements. The algorithm is implemented and incorporated into a tool with a graphical user interface, called CSBFinder-S. We applied CSBFinder-S to data mine 1485 prokaryotic genomes and analyzed the identified cross-strand CSBs. Our results indicate that most of the syntenic blocks are exclusively colinear. Additional results indicate that transcriptional regulation by overlapping transcriptional genes is abundant in bacteria. We demonstrate the utility of CSBFinder-S to identify common function of the gene-pair PulEF in multiple contexts, including Type 2 Secretion System, Type 4 Pilus System and DNA uptake machinery. AVAILABILITY AND IMPLEMENTATION: CSBFinder-S software and code are publicly available at https://github.com/dinasv/CSBFinder. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Genoma Microbiano , Genômica , Algoritmos , Óperon , Software , Sintenia
5.
J Comput Biol ; 27(11): 1561-1580, 2020 11.
Artigo em Inglês | MEDLINE | ID: mdl-32250165

RESUMO

An important goal in microbial computational genomics is to identify crucial events in the evolution of a gene that severely alter the duplication, loss, and mobilization patterns of the gene within the genomes in which it disseminates. In this article, we formalize this microbiological goal as a new pattern-matching problem in the domain of gene tree and species tree reconciliation, denoted "Reconciliation-Scenario Altering Mutation (RSAM) Discovery." We propose an [Formula: see text] time algorithm to solve this new problem, wheremandnare the number of vertices of the input gene tree and species tree, respectively, andkis a user-specified parameter that bounds from above the number of optimal solutions of interest. The algorithm first constructs a hypergraph representing thekhighest scoring reconciliation scenarios between the given gene tree and species tree, and then interrogates this hypergraph for subtrees matching a prespecified RSAM pattern. Our algorithm is optimal in the sense that the number of hypernodes in the hypergraph can be lower bounded by [Formula: see text]. We implement the new algorithm as a tool, called RSAM-finder, and demonstrate its application to the identification of RSAMs in toxins and drug resistance elements across a data set spanning hundreds of species.


Assuntos
Bactérias/genética , Genômica , Modelos Genéticos , Mutação , Algoritmos , Evolução Molecular , Filogenia
6.
J Comput Biol ; 26(7): 745-766, 2019 07.
Artigo em Inglês | MEDLINE | ID: mdl-31140838

RESUMO

Recent advances in Next Generation Sequencing techniques, combined with global efforts to study infectious diseases, yield huge and rapidly-growing databases of microbial genomes. These big new data statistically empower genomic-context based approaches to functional analysis: the idea is that groups of genes that are clustered locally together across many genomes usually express protein products that interact in the same biological pathway (e.g., operons). The problem of finding such conserved "gene blocks" in a given genomic data has been studied extensively. In this work, we propose a new gene block discovery problem variant: find conserved gene blocks abiding by a user specification of biological functional constraints. We take advantage of the biological constraints to efficiently prune the search space. This is achieved by modeling the new problem as a special constrained variant of the well-studied "Closed Frequent Itemset Mining" problem, generalized here to handle item duplications. We exemplify the application of the tool we developed for this problem with two different case studies related to microbial ATP (adenosine triphosphate)-binding cassette (ABC) transporters.


Assuntos
Estudos de Associação Genética , Genoma , Família Multigênica , Células Procarióticas/metabolismo , Transportadores de Cassetes de Ligação de ATP/genética , Fatores de Tempo
7.
Bioinformatics ; 35(10): 1634-1643, 2019 05 15.
Artigo em Inglês | MEDLINE | ID: mdl-30321308

RESUMO

MOTIVATION: Identification of conserved syntenic blocks across microbial genomes is important for several problems in comparative genomics such as gene annotation, study of genome organization and evolution and prediction of gene interactions. Current tools for syntenic block discovery do not scale up to the large quantity of prokaryotic genomes available today. RESULTS: We present a novel methodology for the discovery, ranking and taxonomic distribution analysis of colinear syntenic blocks (CSBs)-groups of genes that are consistently located close to each other, in the same order, across a wide range of taxa. We present an efficient algorithm that identifies CSBs in large genomic datasets. The algorithm is implemented and incorporated in a novel tool with a graphical user interface, denoted CSBFinder, that ranks the discovered CSBs according to a probabilistic score and clusters them to families according to their gene content similarity. We apply CSBFinder to data mine 1487 prokaryotic genomes including chromosomes and plasmids. For post-processing analysis, we generate heatmaps for visualizing the distribution of CSB family members across various taxa. We exemplify the utility of CSBFinder in operon prediction, in deciphering unknown gene function and in taxonomic analysis of colinear syntenic blocks. AVAILABILITY AND IMPLEMENTATION: CSBFinder software and code are publicly available at https://github.com/dinasv/CSBFinder. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Genômica , Software , Algoritmos , Genoma Microbiano , Sintenia
8.
Bioinformatics ; 35(12): 2001-2008, 2019 06 01.
Artigo em Inglês | MEDLINE | ID: mdl-30407484

RESUMO

MOTIVATION: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria are critical for effective treatment of bacterial infections. RESULTS: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes. Our approach is based on sparse Support Vector Machines (SVM), which autonomously selects a small set of genes that are related to bacterial pathogenicity. We implement our approach as a tool-'Bacterial Pathogenicity Classification via sparse-SVM' (BacPaCS)-which is fully automated and handles datasets significantly larger than those previously used. BacPaCS shows high accuracy in distinguishing pathogenic from non-pathogenic bacteria, in a clinically relevant dataset, comprising only human-hosted bacteria. Among the genes that received the highest positive weight in the resulting classifier, we found genes that are known to be related to bacterial pathogenicity, in addition to novel candidates, whose involvement in bacterial virulence was never reported. AVAILABILITY AND IMPLEMENTATION: The code and the resulting model are available at: https://github.com/barashe/bacpacs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Aprendizado de Máquina , Máquina de Vetores de Suporte , Bactérias , Humanos , Proteoma , Virulência
9.
J Comput Biol ; 25(2): 214-235, 2018 02.
Artigo em Inglês | MEDLINE | ID: mdl-29028176

RESUMO

We formalize a new problem variant in gene-block discovery, denoted Reference-Anchored Gene Blocks (RAGB), given a query sequence Q of length n, representing the gene array of a DNA element, a window size bound d on the length of a substring of interest in Q, and a set of target gene sequences [Formula: see text]. Our objective is to identify gene blocks in [Formula: see text] that are centered in a subset q of co-localized genes from Q, and contain genomes from [Formula: see text] in which the corresponding orthologs of the genes from q are also co-localized. We cast RAGB as a variant of a (colored) biclique problem in bipartite graphs, and analyze its parameterized complexity, as well as the parameterized complexity of other related problems. We give an [Formula: see text] time algorithm for the uncolored variant of our biclique problem, where m is the number of areas of interest that are parsed from the target sequences, and n and d are as defined earlier. Our algorithm can be adapted to compute all maximal bicliques in the graph within the same time complexity, and to handle edge weights with a slight [Formula: see text] increase to its time complexity. For the colored version of the problem, our algorithm has a time complexity of [Formula: see text]. We implement the algorithm and exemplify its application to the data mining of proteobacterial gene blocks that are centered in predicted proteobacterial genomic islands, leading to the identification of putatively mobilized clusters of virulence, pathogenicity, and resistance genes.


Assuntos
Genoma Bacteriano , Família Multigênica , Análise de Sequência de DNA/métodos , Algoritmos , Mapeamento de Sequências Contíguas/métodos , Mapeamento de Sequências Contíguas/normas , Padrões de Referência , Análise de Sequência de DNA/normas
10.
Bioinformatics ; 33(12): 1907-1909, 2017 Jun 15.
Artigo em Inglês | MEDLINE | ID: mdl-28165111

RESUMO

SUMMARY: Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. AVAILABILITY AND IMPLEMENTATION: MotifNet is freely available at http://netbio.bgu.ac.il/motifnet . The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. CONTACT: estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Biologia Computacional/métodos , Software , Bases de Dados Factuais , Internet
11.
PLoS Comput Biol ; 13(1): e1005221, 2017 01.
Artigo em Inglês | MEDLINE | ID: mdl-28135269

RESUMO

Protein phosphorylation underlies cellular response pathways across eukaryotes and is governed by the opposing actions of phosphorylating kinases and de-phosphorylating phosphatases. While kinases and phosphatases have been extensively studied, their organization and the mechanisms by which they balance each other are not well understood. To address these questions we performed quantitative analyses of large-scale 'omics' datasets from yeast, fly, plant, mouse and human. We uncovered an asymmetric balance of a previously-hidden scale: Each organism contained many different kinase genes, and these were balanced by a small set of highly abundant phosphatase proteins. Kinases were much more responsive to perturbations at the gene and protein levels. In addition, kinases had diverse scales of phenotypic impact when manipulated. Phosphatases, in contrast, were stable, highly robust and flatly organized, with rather uniform impact downstream. We validated aspects of this organization experimentally in nematode, and supported additional aspects by theoretic analysis of the dynamics of protein phosphorylation. Our analyses explain the empirical bias in the protein phosphorylation field toward characterization and therapeutic targeting of kinases at the expense of phosphatases. We show quantitatively and broadly that this is not only a historical bias, but stems from wide-ranging differences in their organization and impact. The asymmetric balance between these opposing regulators of protein phosphorylation is also common to opposing regulators of two other post-translational modification systems, suggesting its fundamental value.


Assuntos
Evolução Molecular , Regulação Enzimológica da Expressão Gênica/fisiologia , Monoéster Fosfórico Hidrolases/genética , Monoéster Fosfórico Hidrolases/metabolismo , Fosfotransferases/genética , Fosfotransferases/metabolismo , Animais , Arabidopsis/genética , Arabidopsis/metabolismo , Drosophila melanogaster/genética , Drosophila melanogaster/metabolismo , Ativação Enzimática/genética , Variação Genética/genética , Camundongos , Monoéster Fosfórico Hidrolases/classificação , Fosforilação , Fosfotransferases/classificação , Saccharomyces cerevisiae/genética , Saccharomyces cerevisiae/metabolismo , Especificidade da Espécie , Leveduras
12.
J Comput Biol ; 23(3): 165-79, 2016 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-26953875

RESUMO

Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.


Assuntos
Algoritmos , Mineração de Dados/métodos , Interações Hospedeiro-Patógeno , Vírus/patogenicidade , Humanos
13.
J Comput Biol ; 21(11): 799-808, 2014 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-25302568

RESUMO

One of the core classical problems in computational biology is that of constructing the most parsimonious phylogenetic tree interpreting an input set of sequences from the genomes of evolutionarily related organisms. We reexamine the classical maximum parsimony (MP) optimization problem for the general (asymmetric) scoring matrix case, where rooted phylogenies are implied, and analyze the worst case bounds of three approaches to MP: The approach of Cavalli-Sforza and Edwards, the approach of Hendy and Penny, and a new agglomerative, "bottom-up" approach we present in this article. We show that the second and third approaches are faster than the first one by a factor of Θ(√n) and Θ(n), respectively, where n is the number of species.


Assuntos
Algoritmos , Biologia Computacional/métodos , Evolução Molecular , Modelos Estatísticos , Filogenia , Animais , Humanos
14.
Methods ; 69(3): 326-34, 2014 Oct 01.
Artigo em Inglês | MEDLINE | ID: mdl-25009129

RESUMO

The discovery and functional analysis of noncoding RNA (ncRNA) systems in different organisms motivates the development of tools for aiding ncRNA research. Several tools exist that search for occurrences of a given RNA structural profile in genomic sequences. Yet, there is a need for an "RNA BLAST" tool, i.e., a tool that takes a putative functional RNA sequence as input, and efficiently searches for similar sequences in genomic databases, taking into consideration potential secondary structure features of the input query sequence. This work aims at providing such a tool. Our tool, denoted StemSearch, is based on a structural representation of an RNA sequence by its potential stems. Potential stems in genomic sequences are identified in a preprocessing stage, and indexed. A user-provided query sequence is likewise processed, and stems from the target genomes that are similar to the query stems are retrieved from the index. Then, relevant genomic regions are identified and ranked according to their similarity to the query stem-set while enforcing conservation of cross-stem topology. Experiments using RFAM families show significantly improved recall for StemSearch over BLAST, with small loss of precision. We further demonstrate our system's capability to handle eukaryotic genomes by successfully searching for members of the 7SK family in chromosome 2 of the human genome. StemSearch is freely available on the web at: http://www.cs.bgu.ac.il/∼negevcb/StemSearch.


Assuntos
RNA não Traduzido/genética , Ferramenta de Busca/métodos , Análise de Sequência de RNA/métodos , Software , Algoritmos , Humanos , Conformação de Ácido Nucleico , Alinhamento de Sequência/métodos
15.
Algorithms Mol Biol ; 8(1): 27, 2013 Oct 29.
Artigo em Inglês | MEDLINE | ID: mdl-24168705

RESUMO

: We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O (|Σ|MP (n)) and O (|Σ|n2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP (n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n)=On3log3lognlog2n due to an algorithm by Chan).For integer cost functions, the running time is further improved to O|Σ|n3log2n. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant 1log|D|n<λ<1, the algorithm preprocesses an n × n matrix in On2+λ|D| time and On2+λ|D|λ2log|D|2n space. Then, it may multiply the matrix with any given n-length vector in On2λ2log|D|2n time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest On3log2n time algorithm for single-strand RNA folding with discrete cost functions.Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O|Σ|nMP(ñ)ñ time and O(|Σ|nñ) space, where ñ is the length of the run-length encoding of the strings.

16.
Bioinformatics ; 29(13): i210-6, 2013 Jul 01.
Artigo em Inglês | MEDLINE | ID: mdl-23812986

RESUMO

MOTIVATION: A major challenge in systems biology is to reveal the cellular pathways that give rise to specific phenotypes and behaviours. Current techniques often rely on a network representation of molecular interactions, where each node represents a protein or a gene and each interaction is assigned a single static score. However, the use of single interaction scores fails to capture the tendency of proteins to favour different partners under distinct cellular conditions. RESULTS: Here, we propose a novel context-sensitive network model, in which genes and protein nodes are assigned multiple contexts based on their gene ontology annotations, and their interactions are associated with multiple context-sensitive scores. Using this model, we developed a new approach and a corresponding tool, ContextNet, based on a dynamic programming algorithm for identifying signalling paths linking proteins to their downstream target genes. ContextNet finds high-ranking context-sensitive paths in the interactome, thereby revealing the intermediate proteins in the path and their path-specific contexts. We validated the model using 18 348 manually curated cellular paths derived from the SPIKE database. We next applied our framework to elucidate the responses of human primary lung cells to influenza infection. Top-ranking paths were much more likely to contain infection-related proteins, and this likelihood was highly correlated with path score. Moreover, the contexts assigned by the algorithm pointed to putative, as well as previously known responses to viral infection. Thus, context sensitivity is an important extension to current network biology models and can be efficiently used to elucidate cellular response mechanisms. AVAILABILITY: ContextNet is publicly available at http://netbio.bgu.ac.il/ContextNet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Redes Reguladoras de Genes , Mapeamento de Interação de Proteínas/métodos , Transdução de Sinais , Algoritmos , Ontologia Genética , Humanos , Pulmão/metabolismo , Pulmão/virologia , Modelos Biológicos , Orthomyxoviridae/metabolismo , Proteínas Virais/metabolismo
17.
Algorithms Mol Biol ; 8(1): 13, 2013 Apr 16.
Artigo em Inglês | MEDLINE | ID: mdl-23590940

RESUMO

: We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA), and present a new algorithm which applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees T and S, our algorithm has an O(nTnS + min(dT,dS)LTLS) time complexity, where nT,LT and dT are the number of nodes, the number of leaves, and the maximum node degree in T, respectively (satisfying dT ≤ LT ≤ nT), and similarly for nS,LS and dS with respect to the tree S. This improves the time complexity of previous algorithms for less general variants of the problem.In order to obtain this time bound for HSA, we developed new algorithms for a generalized variant of the Min-Cost Bipartite Matching problem (MCM), as well as to two derivatives of this problem, entitled All-Cavity-MCM and All-Pairs-Cavity-MCM. For two input sets of size n and m, where n ≤ m, MCM and both its cavity derivatives are solved in O(n3 + nm) time, without the usage of priority queues (e.g. Fibonacci heaps) or other complex data structures. This gives the first cubic time algorithm for All-Pairs-Cavity-MCM, and improves the running times of MCM and All-Cavity-MCM problems in the unbalanced case where n ≪ m.We implemented the algorithm (in all modes mentioned above) as a graphical software tool which computes and displays similarities between secondary structures of RNA given as input, and employed it to a preliminary experiment in which we ran all-against-all inter-family pairwise alignments of RNAse P and Hammerhead RNA family members, exposing new similarities which could not be detected by the traditional rooted ordered alignment approaches. The results demonstrate that our approach can be used to expose structural similarity between some RNAs with higher sensitivity than the traditional rooted ordered alignment approaches. Source code and web-interface for our tool can be found in http://www.cs.bgu.ac.il/\~negevcb/FRUUT.

18.
BMC Bioinformatics ; 13: 322, 2012 Dec 03.
Artigo em Inglês | MEDLINE | ID: mdl-23206407

RESUMO

BACKGROUND: MicroRNAs (miRNAs) are important regulators of gene expression encoded by a variety of organisms, including viruses. Although the function of most of the viral miRNAs is currently unknown, there is evidence that both viral and host miRNAs contribute to the interactions between viruses and their hosts. miRNAs constitute a complex combinatorial network, where one miRNA may target many genes and one gene may be targeted by multiple miRNAs. In particular, viral and host miRNAs may also have mutual target genes. Based on published evidence linking viral and host miRNAs there are three modes of mutual regulation: competing, cooperating, and compensating modes. RESULTS: In this paper we explore the compensating mode of mutual regulation upon Human Cytomegalovirus (HCMV) infection, when host miRNAs are down regulated and viral miRNAs compensate by mimicking their function. To achieve this, we develop a new algorithm which finds groups, called quasi-modules, of viral and host miRNAs and their mutual target genes, and use a new host miRNA expression data for HCMV-infected and uninfected cells. For two of the reported quasi-modules, supporting evidence from biological and medical literature is provided. CONCLUSIONS: The modules found by our method may advance the understanding of the role of miRNAs in host-viral interactions, and the genes in these modules may serve as candidates for further experimental validation.


Assuntos
Citomegalovirus/genética , Regulação da Expressão Gênica/fisiologia , MicroRNAs/fisiologia , RNA Viral/fisiologia , Algoritmos , Citomegalovirus/fisiologia , Regulação para Baixo , Humanos
19.
Open Virol J ; 6: 38-48, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-22715351

RESUMO

The purpose of the present study was to characterize the microRNA transcriptome (miRNAome) of the human cytomegalovirus (HCMV or HHV5). We used deep sequencing and real time PCR (qPCR) together with bioinformatics to analyze the pattern of small RNA expression in cells infected with low-passage isolates of HCMV as well as in plasma and amniotic fluid. We report here on the discovery of four new precursors and ten new miRNAs as well as eleven microRNA-offset-RNAs (moRs) that are all encoded by HCMV. About eighty percent of the total HCMV reads were perfectly mapped to HCMV miRNAs, strongly suggestive of their important biological role that in large remains still to be defined and characterized. Taken altogether, the results of this study demonstrate the power and usefulness of the combined bioinformatics/biological approach in discovering additional important members of HCMV- encoded small RNAs and can be applied to the study of other viruses as well.

20.
J Comput Biol ; 18(11): 1525-42, 2011 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-22035327

RESUMO

Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations so far remained fairly constant. We study the potential contribution of increasing the amount of information utilized by RNA folding prediction models to the improvement of their prediction quality. This is achieved by proposing novel models, which refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical thanks to the availability of large training sets, advances in machine-learning, and recent accelerations to RNA folding algorithms. We show that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which results in a significantly higher prediction quality than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Being trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F1-measure over correctly-predicted base-pairs (i.e., 16% error rate), compared to the previously best reported score of 70% (i.e., 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/?negevcb/contextfold.


Assuntos
Algoritmos , Modelos Moleculares , RNA/química , Inteligência Artificial , Pareamento de Bases , Sequência de Bases , Simulação por Computador , Conformação de Ácido Nucleico
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...