A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction.

Yaish, Ofir; Asif, Maor; Orenstein, Yaron.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35595297

ABSTRACT

The CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this editing technique is quite accurate in the target region, there may be many unplanned off-target sites (OTSs). Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of OTSs) produced by experimental OTS-detection techniques that have a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique to detect OTSs, was used to produce a dataset of unprecedented scale and quality (>200 000 OTSs over 110 guide RNAs). In addition, the same study included in cellula GUIDE-seq experiments for 58 of the guide RNAs. Here, we fill the gap in previous computational methods by utilizing these data to systematically evaluate data processing and formulation of the CRISPR OTS prediction problem. Our evaluations show that data transformation as a pre-processing phase is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive OTSs to the training datasets. Furthermore, our results point to the importance of adding the number of mismatches between guide RNAs and their OTSs as a feature. Finally, we present predictive off-target in cellula models based on both in vitro and in cellula data and compare them to state-of-the-art methods in predicting true OTSs. Our conclusions will be instrumental in any future development of an off-target predictor based on high-throughput datasets.
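The three processing ideas named in the abstract (transforming the read counts, including inactive sites, and adding the mismatch count as a feature) can be illustrated with a minimal Python sketch. The abstract does not state which transformation or encoding the authors used, so the `log(x + 1)` transform, the per-position mismatch flags, and the function names `count_mismatches`, `log_transform`, and `make_features` are all assumptions made for illustration only.

```python
import math
from typing import List


def count_mismatches(guide: str, site: str) -> int:
    """Count positions where guide and candidate site differ.

    Assumption: both sequences have equal length and an 'N' in the
    guide (e.g. in the PAM) matches any base.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    return sum(
        1
        for g, s in zip(guide.upper(), site.upper())
        if g != "N" and g != s
    )


def log_transform(read_count: float) -> float:
    """Hypothetical label transformation: log(x + 1).

    An inactive site (read count 0) maps to 0.0, so inactive sites
    can be kept in the training set as negative examples.
    """
    return math.log(read_count + 1.0)


def make_features(guide: str, site: str) -> List[int]:
    """Per-position mismatch flags followed by the total mismatch count."""
    flags = [
        0 if g == "N" or g == s else 1
        for g, s in zip(guide.upper(), site.upper())
    ]
    return flags + [count_mismatches(guide, site)]


if __name__ == "__main__":
    guide = "ACGTN"   # toy guide, not a real sequence
    site = "ACCTA"    # toy candidate off-target site
    print(make_features(guide, site))
    print(log_transform(0), log_transform(99))
```

Appending the total count to the positional flags is redundant in principle, but it hands a model the aggregate directly instead of requiring it to learn the sum.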

Subject(s)

CRISPR-Cas Systems , RNA, Guide, Kinetoplastida , Gene Editing/methods , RNA, Guide, Kinetoplastida/genetics , Research Design

DeepSELEX: inferring DNA-binding preferences from HT-SELEX data using multi-class CNNs.

Asif, Maor; Orenstein, Yaron.

Bioinformatics ; 36(Suppl_2): i634-i642, 2020 12 30.

Article in English | MEDLINE | ID: mdl-33381817

ABSTRACT

MOTIVATION: Transcription factor (TF) DNA-binding is a central mechanism in gene regulation. Biologists would like to know where and when these factors bind DNA. Hence, they require accurate DNA-binding models to enable binding prediction to any DNA sequence. Recent technological advancements measure the binding of a single TF to thousands of DNA sequences. One of the prevailing techniques, high-throughput SELEX, measures protein-DNA binding by high-throughput sequencing over several cycles of enrichment. Unfortunately, current computational methods to infer the binding preferences from high-throughput SELEX data do not exploit the richness of these data, and are under-using the most advanced computational technique, deep neural networks.

RESULTS: To better characterize the binding preferences of TFs from these experimental data, we developed DeepSELEX, a new algorithm to infer intrinsic DNA-binding preferences using deep neural networks. DeepSELEX takes advantage of the richness of high-throughput sequencing data and learns the DNA-binding preferences by observing the changes in DNA sequences through the experimental cycles. DeepSELEX outperforms extant methods for the task of DNA-binding inference from high-throughput SELEX data in binding prediction in vitro and is on par with the state of the art in in vivo binding prediction. Analysis of model parameters reveals it learns biologically relevant features that shed light on TFs' binding mechanism.

AVAILABILITY AND IMPLEMENTATION: DeepSELEX is available through github.com/OrensteinLab/DeepSELEX/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
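Framing HT-SELEX data as a multi-class problem, where each read is labeled with the enrichment cycle it came from, can be sketched as a data-preparation step in Python. This is only an illustration under stated assumptions: the abstract does not describe the input encoding, so the one-hot scheme, the handling of ambiguous bases, and the names `one_hot` and `build_training_set` are hypothetical and are not taken from the DeepSELEX repository.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode a DNA sequence as a len(seq) x 4 matrix.

    Assumption: any base outside A/C/G/T (e.g. 'N') becomes an
    all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def build_training_set(
    reads_by_cycle: Dict[int, List[str]]
) -> List[Tuple[List[List[float]], int]]:
    """Pair each encoded read with a class index for its SELEX cycle.

    Cycles are sorted and mapped to consecutive class indices
    0, 1, 2, ... so the labels can feed a multi-class classifier.
    """
    class_of = {cycle: i for i, cycle in enumerate(sorted(reads_by_cycle))}
    examples = []
    for cycle, reads in reads_by_cycle.items():
        for read in reads:
            examples.append((one_hot(read), class_of[cycle]))
    return examples


if __name__ == "__main__":
    # Toy reads, not real HT-SELEX data.
    toy = {0: ["ACGT"], 4: ["TTTT", "GGGG"]}
    for matrix, label in build_training_set(toy):
        print(label, matrix)
```

A classifier trained on such pairs has to learn which sequence features become more common in later cycles, which is the signal the abstract describes as "the changes in DNA sequences through the experimental cycles".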

Subject(s)

DNA , High-Throughput Nucleotide Sequencing , Binding Sites , DNA/genetics , DNA/metabolism , Protein Binding , Sequence Analysis, DNA
