Results 1 - 9 of 9
1.
Cell Rep Methods ; 3(1): 100373, 2023 01 23.
Article in English | MEDLINE | ID: mdl-36814834

ABSTRACT

A limitation of pooled CRISPR-Cas9 screens is the high false-positive rate in detecting essential genes arising from copy-number-amplified genomic regions. To solve this issue, we previously developed CRISPRcleanR: a computational method implemented as an R/Python package and in a dockerized version. CRISPRcleanR detects and corrects biased responses to CRISPR-Cas9 targeting in an unsupervised fashion, accurately reducing false-positive signals while maintaining sensitivity in identifying relevant genetic dependencies. Here, we present CRISPRcleanR WebApp, a web application enabling access to CRISPRcleanR through an intuitive interface. CRISPRcleanR WebApp removes the complexity of R/Python language user interactions; provides user-friendly access to a complete analytical pipeline, not requiring any data pre-processing and generating gene-level summaries of essentiality with associated statistical scores; and offers a range of interactively explorable plots while supporting a more comprehensive range of CRISPR guide RNA libraries than the original package. CRISPRcleanR WebApp is available at https://crisprcleanr-webapp.fht.org/.


Subject(s)
CRISPR-Cas Systems , Genome , CRISPR-Cas Systems/genetics , Genomics/methods , Software
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36669133

ABSTRACT

MOTIVATION: Binary (or Boolean) matrices provide a common effective data representation adopted in several domains of computational biology, especially for investigating cancer and other human diseases. For instance, they are used to summarize genetic aberrations (copy number alterations or mutations) observed in cancer patient cohorts, effectively highlighting combinatorial relations among them. One of these is the tendency for two or more genes not to be co-mutated in the same sample or patient, i.e. a mutual-exclusivity trend. Exploiting this principle has allowed identifying new cancer driver protein-interaction networks and has been proposed to design effective combinatorial anti-cancer therapies rationally. Several tools exist to identify and statistically assess mutually exclusive cancer-driver genomic events. However, these tools need to be equipped with robust/efficient methods to sort rows and columns of a binary matrix to visually highlight possible mutual-exclusivity trends. RESULTS: Here, we formalize the mutual-exclusivity-sorting problem and present MutExMatSorting: an R package implementing a computationally efficient algorithm able to sort rows and columns of a binary matrix to highlight mutual-exclusivity patterns. Particularly, our algorithm minimizes the extent of collective vertical overlap between consecutive non-zero entries across rows while maximizing the number of adjacent non-zero entries in the same row. Here, we demonstrate that existing tools for mutual-exclusivity analysis are suboptimal according to these criteria and are outperformed by MutExMatSorting. AVAILABILITY AND IMPLEMENTATION: https://github.com/AleVin1995/MutExMatSorting. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Heuristics , Neoplasms , Humans , Algorithms , Neoplasms/genetics , Genomics , Computational Biology/methods , Mutation
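
The sorting problem described in the abstract above can be illustrated with a very simple baseline heuristic. The sketch below is not the MutExMatSorting algorithm (which the abstract says outperforms existing approaches); it is an assumed, commonly used "staircase" ordering: rows are sorted by decreasing number of non-zero entries, then columns are sorted lexicographically by their pattern down the reordered rows. The function name and the toy matrix are hypothetical.

```python
def sort_for_mutual_exclusivity(matrix):
    """Reorder a binary matrix (list of rows of 0/1) with a simple heuristic.

    Rows (e.g. genes) are ordered by decreasing number of non-zero entries.
    Columns (e.g. samples) are then ordered so that samples altered in the
    first row come first, then samples altered in the second row, and so on.
    Returns (row_order, col_order, sorted_matrix).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    # Stable sort: rows with equal counts keep their original relative order.
    row_order = sorted(range(n_rows), key=lambda r: -sum(matrix[r]))
    # Column key: its 0/1 pattern read down the reordered rows,
    # negated so that columns with a 1 in an earlier row sort first.
    col_order = sorted(
        range(n_cols),
        key=lambda c: tuple(-matrix[r][c] for r in row_order),
    )
    sorted_matrix = [[matrix[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, sorted_matrix


# Toy example: three genes (rows) altered in six samples (columns).
example = [
    [0, 0, 1, 0, 0, 1],  # gene A
    [1, 0, 0, 1, 1, 0],  # gene B
    [0, 1, 0, 0, 0, 0],  # gene C
]
rows, cols, result = sort_for_mutual_exclusivity(example)
for row in result:
    print("".join(str(v) for v in row))
```

On this toy input the non-zero entries form a staircase with no vertical overlap, which is the visual pattern a mutual-exclusivity sort aims to expose. A baseline like this does not explicitly optimize the two criteria named in the abstract.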
3.
Cell Rep ; 40(4): 111145, 2022 07 26.
Article in English | MEDLINE | ID: mdl-35905712

ABSTRACT

Pooled genome-wide CRISPR-Cas9 screens are furthering our mechanistic understanding of human biology and have allowed us to identify new oncology therapeutic targets. Scale-limited CRISPR-Cas9 screens (typically employing guide RNA libraries targeting subsets of functionally related genes, biological pathways, or portions of the druggable genome) constitute an optimal setting for investigating narrow hypotheses and are easier to execute on complex models, such as organoids and in vivo models. Different supervised methods are used for computational analysis of genome-wide CRISPR-Cas9 screens; most are not well suited for scale-limited screens, as they require large sets of positive/negative control genes (gene templates) to be included among the screened ones. Here, we develop a computational framework identifying optimal subsets of known essential and nonessential genes (at different subsampling percentages) that can be used as templates for supervised analyses of scale-limited CRISPR-Cas9 screens, while having a reduced impact on the size of the employed library.


Subject(s)
CRISPR-Cas Systems , RNA, Guide, Kinetoplastida , CRISPR-Cas Systems/genetics , Gene Library , Genome , Humans , RNA, Guide, Kinetoplastida/genetics
4.
BMC Genomics ; 22(1): 828, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34789150

ABSTRACT

BACKGROUND: CRISPR-Cas9 genome-wide screens are being increasingly performed, allowing systematic explorations of cancer dependencies at unprecedented accuracy and scale. One of the major computational challenges when analysing data derived from such screens is to identify genes that are essential for cell survival invariantly across tissues, conditions, and genomic contexts (core-fitness genes), and to distinguish them from context-specific essential genes. This is of paramount importance for assessing the safety profile of candidate therapeutic targets and for elucidating mechanisms involved in tissue-specific genetic diseases. RESULTS: We have developed CoRe: an R package implementing existing and novel methods for the identification of core-fitness genes (at two different levels of stringency) from joint analyses of multiple CRISPR-Cas9 screens. We demonstrate, through a fully reproducible benchmarking pipeline, that CoRe outperforms state-of-the-art tools, yielding more reliable and biologically relevant sets of core-fitness genes. CONCLUSIONS: CoRe offers a flexible pipeline, compatible with many pre-processing methods for the analysis of CRISPR data, which can be tailored to different use cases. The CoRe package can be used for the identification of high-confidence novel core-fitness genes, as well as a means to filter out potentially cytotoxic hits while analysing cancer dependency datasets for identifying and prioritising novel selective therapeutic targets.


Subject(s)
CRISPR-Cas Systems , Neoplasms , Benchmarking , Genes, Essential , Humans , Neoplasms/genetics
5.
PLoS Comput Biol ; 17(1): e1008561, 2021 01.
Article in English | MEDLINE | ID: mdl-33406072

ABSTRACT

Phylogeographic inference allows reconstruction of past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography (with location data provided in the form of latitude and longitude coordinates) describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas. As an attempt to reduce the effects of sampling bias on BMP, we consider the addition of sequence-free samples from under-sampled areas. While this approach alleviates the effects of sampling bias, in most scenarios this will not be a viable option due to the need for prior knowledge of an outbreak's spatial distribution. We therefore consider an alternative model, the spatial Λ-Fleming-Viot process (ΛFV), which has recently gained popularity in population genetics. Despite the ΛFV's robustness to sampling biases, we find that the different assumptions of the ΛFV and BMP models result in different applicabilities, with the ΛFV being more appropriate for scenarios of endemic spread, and BMP being more appropriate for recent outbreaks or colonizations.


Subject(s)
Genetics, Population/methods , Models, Genetic , Phylogeography/methods , Selection Bias , Bayes Theorem , Computational Biology , Disease Outbreaks/statistics & numerical data , Flavivirus/genetics , Flavivirus Infections/epidemiology , Flavivirus Infections/virology , Humans , Markov Chains
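
As a rough illustration of the Brownian motion model mentioned in the abstract above, the sketch below simulates locations along a fixed tree: each child location is its parent's location plus an independent Gaussian displacement per coordinate with variance `sigma**2 * t`, where `t` is the branch length. This is only a forward simulation under assumed simplifications (planar coordinates rather than true latitude/longitude geometry, a single diffusion rate `sigma`); the function name, edge format, and toy tree are hypothetical and are not taken from the paper.

```python
import random
from math import sqrt


def simulate_brownian_locations(edges, root, root_location, sigma, seed=0):
    """Simulate 2D Brownian motion along a tree.

    edges: list of (parent, child, branch_length), parents listed before
           their children (pre-order).
    Returns a dict mapping each node name to an (x, y) location.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    locations = {root: root_location}
    for parent, child, branch_length in edges:
        px, py = locations[parent]
        # Standard deviation of the displacement grows with sqrt(time).
        sd = sigma * sqrt(branch_length)
        locations[child] = (px + rng.gauss(0.0, sd), py + rng.gauss(0.0, sd))
    return locations


# Hypothetical four-tip tree with two internal nodes below the root.
toy_edges = [
    ("root", "n1", 1.0),
    ("root", "n2", 0.5),
    ("n1", "tipA", 0.3),
    ("n1", "tipB", 0.7),
    ("n2", "tipC", 1.2),
    ("n2", "tipD", 0.1),
]
simulated = simulate_brownian_locations(toy_edges, "root", (0.0, 0.0), sigma=2.0, seed=1)
for node, (x, y) in simulated.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

Inference in continuous phylogeography runs in the opposite direction (estimating internal locations and the diffusion rate from tip locations), so this simulation only shows the generative assumption behind the model.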
6.
Syst Biol ; 70(1): 21-32, 2021 01 01.
Article in English | MEDLINE | ID: mdl-32353118

ABSTRACT

How can we best learn the history of a protein's evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modeling based on inferred amino acid sequence and side chain configuration). But what if data in the relevant state-space are not readily available? We show that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data. Encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of models with the desired features to data that lack the resolution that is normally required. This strategy is viable because the evolutionary path taken through the observed space contains information about states that were likely visited in the "unseen" state-space. To illustrate this, we consider two examples with amino acid sequences as input. We show that ω, a parameter describing the relative strength of selection on nonsynonymous and synonymous changes, can be estimated in an unbiased manner using an adapted version of a standard 61-state codon model. Using simulated and empirical data, we find that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data. Where feasible, combining inputs from both ambiguity-coded and fully resolved data improves accuracy. Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in remarkable ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. These examples show that our methods permit the recovery of evolutionary information from sequences where it has previously been inaccessible. [Ancestral reconstruction; natural selection; protein structure; state-spaces; substitution models.]


Subject(s)
Evolution, Molecular , Selection, Genetic , Amino Acid Sequence , Models, Genetic , Phylogeny , Proteins
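
The ambiguity-coding idea in the abstract above (treating an observed amino acid as an ambiguous observation of a state in a larger space) can be sketched for the codon case. The code below assumes the standard genetic code (NCBI translation table 1) and maps each amino acid to the set of sense codons that encode it, then builds the 0/1 tip vector over 61 codon states that likelihood calculations on trees conventionally use for ambiguous characters. Function and variable names are hypothetical, and this is not the authors' implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third position varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense codons only: the 61 states of a standard codon model.
SENSE_CODONS = sorted(c for c, aa in CODON_TO_AA.items() if aa != "*")


def ambiguity_set(amino_acid):
    """Return the set of sense codons compatible with an observed amino acid.

    A gap or unknown residue ('-', 'X', '?') is treated as fully ambiguous.
    """
    if amino_acid in ("-", "X", "?"):
        return set(SENSE_CODONS)
    return {c for c in SENSE_CODONS if CODON_TO_AA[c] == amino_acid}


def tip_partial_likelihoods(amino_acid):
    """0/1 vector over the 61 codon states for one observed amino acid."""
    allowed = ambiguity_set(amino_acid)
    return [1.0 if c in allowed else 0.0 for c in SENSE_CODONS]


for aa in "MWLS":
    print(aa, sorted(ambiguity_set(aa)))
```

Methionine and tryptophan each map to a single codon, so they are fully resolved even in codon space, whereas leucine and serine each leave six codon states open.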
7.
8.
Mol Biol Evol ; 36(9): 2086-2103, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31114882

ABSTRACT

Few models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and increasing availability. We present a structurally aware empirical substitution model for amino acid sequence evolution in which proteins are expressed using an expanded alphabet that relays both amino acid identity and structural information. Each character specifies an amino acid as well as information about the rotamer configuration of its side-chain: the discrete geometric pattern of permitted side-chain atomic positions, as defined by the dihedral angles between covalently linked atoms. By assigning rotamer states in 251,194 protein structures and identifying 4,508,390 substitutions between closely related sequences, we generate a 55-state "Dayhoff-like" model that shows that the evolutionary properties of amino acids depend strongly upon side-chain geometry. The model performs as well as or better than traditional 20-state models for divergence time estimation, tree inference, and ancestral state reconstruction. We conclude that not only is rotamer configuration a valuable source of information for phylogenetic studies, but that modeling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.


Subject(s)
Evolution, Molecular , Models, Biological , Protein Conformation , Amino Acid Substitution , Markov Chains
9.
BMC Bioinformatics ; 18(Suppl 5): 144, 2017 Mar 23.
Article in English | MEDLINE | ID: mdl-28361701

ABSTRACT

BACKGROUND: In recent years long non-coding RNAs (lncRNAs) have been the subject of increasing interest. Thanks to many recent functional studies, the existence of a large class of lncRNAs with potential regulatory functions is now widely accepted. Although an increasing number of lncRNAs are being characterized and shown to be involved in many biological processes, the functions of the vast majority of lncRNA genes are still unknown. Therefore, computational methods able to take advantage of the increasing amount of publicly available data to predict lncRNA functions could be very useful. RESULTS: Since coding genes are much better annotated than lncRNAs, we attempted to project known functional information regarding proteins onto non-coding genes using the guilt-by-association principle: if a gene shows an expression profile that correlates with those of a set of coding genes involved in a given function, that gene is probably involved in the same function. We computed gene coexpression for 30 human tissues and 9 vertebrates and mined the resulting networks with a methodology inspired by the rank product algorithm used to identify differentially expressed genes. Using different types of reference data we can predict putative new annotations for thousands of lncRNAs and proteins, ranging from cellular localization to relevance for disease and cancer. CONCLUSIONS: New functions of coding genes and lncRNAs can be profitably predicted using tissue-specific coexpression, as well as expression of orthologous genes in different species. The data are available for download and through a user-friendly web interface at www.funcpred.com.


Subject(s)
Computational Biology/methods , Computer Simulation , Models, Genetic , RNA, Long Noncoding/genetics , Transcriptome , Algorithms , Animals , Evolution, Molecular , Humans , Organ Specificity , RNA, Long Noncoding/physiology
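
The guilt-by-association principle stated in the abstract above can be sketched in a few lines. The example scores a query gene against each annotated gene set by its mean Pearson correlation with the set's members and picks the best-scoring function. This is a deliberately simplified stand-in: the paper describes a rank-product-inspired method over coexpression networks from many tissues and species, which is not reproduced here. All gene names, expression values, and function labels are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def association_score(query_profile, expression, gene_set):
    """Mean correlation between the query and the genes annotated to one function."""
    correlations = [
        pearson(query_profile, expression[gene])
        for gene in gene_set
        if gene in expression
    ]
    return sum(correlations) / len(correlations) if correlations else 0.0


# Hypothetical expression profiles of coding genes across five samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],
}
# Hypothetical functional annotations of those coding genes.
function_sets = {
    "function_1": ["GENE_A", "GENE_B"],
    "function_2": ["GENE_C"],
}
# Hypothetical expression profile of an unannotated lncRNA.
lncrna_profile = [1.0, 2.1, 2.9, 4.2, 5.0]

scores = {
    name: association_score(lncrna_profile, expression, genes)
    for name, genes in function_sets.items()
}
best_function = max(scores, key=scores.get)
print(scores, best_function)
```

A real analysis would also need a significance estimate for each score, for instance by comparison against randomly drawn gene sets, since a high mean correlation alone does not establish function.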