Results 1 - 6 of 6
1.
J Chem Inf Model ; 64(12): 4897-4911, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38838358

ABSTRACT

The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations such as large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In this study, we aim to elucidate the unique protein features associated with Cas9 and Cas12 families and identify the features distinguishing each family from non-Cas proteins. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,494 features) encoding various physiochemical, topological, constitutional, and coevolutionary information on Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent data sets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 92% and 95% on their respective independent data sets, while the multiclass classifier achieved an F1 score of close to 0.98. We observed that Quasi-Sequence-Order (QSO) descriptors like Schneider.lag and Composition descriptors like charge, volume, and polarizability are predominant in the Cas12 family. Conversely, Amino Acid Composition descriptors, especially Tripeptide Composition (TPC), predominate in the Cas9 family.
Four of the top 10 descriptors identified in Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all Cas9 proteins and located within different catalytically important domains of the Streptococcus pyogenes Cas9 (SpCas9) structure. Among these, DHI and HHA are well-known to be involved in the DNA cleavage activity of the SpCas9 protein. Mutation studies have highlighted the significance of the PWN tripeptide in PAM recognition and DNA cleavage activity of SpCas9, while Y450 from the PYY tripeptide plays a crucial role in reducing off-target effects and improving the specificity in SpCas9. Leveraging our machine learning (ML) pipeline, we identified numerous Cas9 and Cas12 family-specific features. These features offer valuable insights for future experimental and computational studies aiming at designing Cas systems with enhanced gene-editing properties. These features suggest plausible structural modifications that can effectively guide the development of Cas proteins with improved editing capabilities.


Subject(s)
CRISPR-Associated Protein 9 , Machine Learning , CRISPR-Associated Protein 9/chemistry , CRISPR-Associated Protein 9/metabolism , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Proteins/chemistry , CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems
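The Tripeptide Composition (TPC) descriptor mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common definition of TPC as the relative frequency of each of the 8,000 possible tripeptides among the overlapping three-residue windows of a sequence; the abstract does not state which toolkit or normalisation the authors used, and the function name and toy sequence below are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tripeptide_composition(sequence):
    """Return {tripeptide: fraction of valid overlapping 3-residue windows}.

    Assumption: windows containing non-standard residues are skipped and
    frequencies are normalised by the number of remaining windows.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    composition = {}
    for triple in product(AMINO_ACIDS, repeat=3):
        key = "".join(triple)
        # Counter returns 0 for tripeptides that never occur
        composition[key] = counts[key] / total if total else 0.0
    return composition


# Hypothetical toy sequence, not a real Cas9 fragment
toy = "MPWNDHIPWN"
tpc = tripeptide_composition(toy)
print(len(tpc))    # 8000 descriptors (20 ** 3)
print(tpc["PWN"])  # PWN fills 2 of the 8 windows -> 0.25
```

A feature vector of this kind, one value per tripeptide, is the sort of input a Random Forest classifier could rank by importance; the ranking step itself is not shown here.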
2.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38328240

ABSTRACT

The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations like large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In the current study, we aim to elucidate the unique protein attributes associated with Cas9 and Cas12 families and identify the features that distinguish each family from the other. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,495 features) encoding various physiochemical, topological, constitutional, and coevolutionary information of Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and Non-Cas proteins. All the models were evaluated rigorously on the test and independent datasets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 95% and 97% on their respective independent datasets, while the multiclass classifier achieved a high F1 score of 0.97. We observed that Quasi-sequence-order descriptors like Schneider-lag descriptors and Composition descriptors like charge, volume, and polarizability are essential for the Cas12 family. More interestingly, we discovered that Amino Acid Composition descriptors, especially the Tripeptide Composition (TPC) descriptors, are important for the Cas9 family. 
Four of the identified important descriptors of Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all the Cas9 proteins and are located within different catalytically important domains of the Cas9 protein structure. Among these four tripeptides, DHI and HHA are well known to be involved in the DNA cleavage activity of the Cas9 protein. We therefore propose that the other two tripeptides, PWN and PYY, may also be essential for the Cas9 family. Our identified important descriptors enhance the understanding of the catalytic mechanisms of Cas9 and Cas12 proteins and provide valuable insights into the design of novel Cas systems to achieve enhanced gene-editing properties.

3.
ACS Omega ; 8(23): 20379-20388, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37323377

ABSTRACT

The nuclear receptor (NR) superfamily includes phylogenetically related ligand-activated proteins, which play a key role in various cellular activities. NR proteins are subdivided into seven subfamilies based on their function, mechanism, and nature of the interacting ligand. Developing robust tools to identify NR could give insights into their functional relationships and involvement in disease pathways. Existing NR prediction tools only use a few types of sequence-based features and are tested on relatively similar independent datasets; thus, they may suffer from overfitting when extended to new genera of sequences. To address this problem, we developed Nuclear Receptor Prediction Tool (NRPreTo), a two-level NR prediction tool with a unique training approach where in addition to the sequence-based features used by existing NR prediction tools, six additional feature groups depicting various physiochemical, structural, and evolutionary features of proteins were utilized. The first level of NRPreTo allows for the successful prediction of a query protein as NR or non-NR and further subclassifies the protein into one of the seven NR subfamilies in the second level. We developed Random Forest classifiers to test on benchmark datasets, as well as the entire human protein datasets from RefSeq and Human Protein Reference Database (HPRD). We observed that using additional feature groups improved the performance. We also observed that NRPreTo achieved high performance on the external datasets and predicted 59 novel NRs in the human proteome. The source code of NRPreTo is publicly available at https://github.com/bozdaglab/NRPreTo.
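The two-level scheme described in the abstract above (NR versus non-NR first, then one of seven subfamilies) can be sketched as a simple control flow. This is not the NRPreTo code, which is available at the repository linked above; the function names, the threshold rules standing in for the Random Forest classifiers, and the subfamily labels are all hypothetical.

```python
from typing import Callable, Optional, Sequence

Features = Sequence[float]


def two_level_predict(
    features: Features,
    is_nr: Callable[[Features], bool],
    subfamily_of: Callable[[Features], str],
) -> Optional[str]:
    """Level 1 decides NR vs non-NR; level 2 runs only for predicted NRs."""
    if not is_nr(features):
        return None  # non-NR proteins never reach the second level
    return subfamily_of(features)


# Hypothetical stand-ins for trained classifiers (simple threshold rules)
def toy_is_nr(x: Features) -> bool:
    return x[0] > 0.5


def toy_subfamily(x: Features) -> str:
    index = min(int(x[1] * 7), 6)  # map [0, 1] onto seven bins
    return f"subfamily_{index + 1}"


print(two_level_predict([0.2, 0.9], toy_is_nr, toy_subfamily))  # None
print(two_level_predict([0.9, 0.0], toy_is_nr, toy_subfamily))  # subfamily_1
```

Keeping the two levels as separate callables means each could be trained and evaluated independently, which is one plausible reason for a hierarchical design.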

4.
Mol Divers ; 26(3): 1675-1695, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34468898

ABSTRACT

Development of potential antitubercular molecules is a challenging task due to the rapidly emerging drug-resistant strains of Mycobacterium tuberculosis (M.tb). Structure-based approaches hold greater benefit in identifying compounds/drugs with desired polypharmacological profiles. These methods can be employed based on the knowledge of protein binding sites to identify the complementary ligands. In this study, a polypharmacology-guided computational drug repurposing approach was applied to identify potential antitubercular drugs. 20 important druggable protein targets in M.tb were considered from the target library of Molecular Property Diagnostic Suite-Tuberculosis (MPDSTB; http://mpds.neist.res.in:8084) for virtual screening. FDA-approved drugs were collected, preprocessed and docked in the active sites of the 20 M.tb targets. The top 300 drug molecules from each target (20 × 300) were retained and subsequently screened for possible antitubercular and antimycobacterial activity using the PASS tool. Using this approach, 34 drugs with predicted antitubercular and antimycobacterial activity were identified along with good binding affinity against multiple M.tb targets. Interestingly, 21 out of the 34 identified drugs are antibiotics while 4 drug molecules (nitrofural, stavudine, quinine and quinidine) are non-antibiotics showing promising predicted antitubercular activity. Most of these molecules share a similar privileged antimycobacterial drug scaffold. Further, drug-likeness properties were calculated to get deeper insights into M.tb lead molecules. Interestingly, it was also observed that the drugs identified from the study are under different stages of drug discovery (i.e., in vitro, clinical trials) for the effective treatment of various diseases including cancer, degenerative diseases, dengue virus infection, tuberculosis, etc. Krasavin et al. (2017) synthesized nitrofuran analogues with appreciable MICs (22-23 µM) against M.tb H37Rv.
These experiments further add to the credibility of the drugs identified in this study (TB).


Subject(s)
Mycobacterium tuberculosis , Tuberculosis , Antitubercular Agents/chemistry , Drug Repositioning , Humans , Polypharmacology , Tuberculosis/drug therapy
5.
Comput Biol Med ; 138: 104856, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34555571

ABSTRACT

Machine learning and data-driven approaches are currently being widely used in drug discovery and development due to their potential advantages in decision-making based on the data leveraged from existing sources. Applying these approaches to drug repurposing (DR) studies can identify new relationships between drug molecules, therapeutic targets and diseases that will eventually help in generating new insights for developing novel therapeutics. In the current study, a dataset of 1671 approved drugs is analyzed using a combined approach involving unsupervised Machine Learning (ML) techniques (Principal Component Analysis (PCA) followed by k-means clustering) and Structure-Activity Relationships (SAR) predictions for DR. PCA was applied to all the two-dimensional (2D) molecular descriptors of the dataset, and the first five Principal Components (PCs) were subsequently used to cluster the drugs into nine well-separated clusters using the k-means algorithm. We further predicted the biological activities for the drug dataset using the PASS (Predicted Activities Spectra of Substances) tool. These predicted activity values were analyzed systematically to identify repurposable drugs for various diseases. Clustering patterns obtained from k-means showed that every cluster contains subgroups of structurally similar drugs that may or may not have similar therapeutic indications. We hypothesized that such structurally similar but therapeutically different drugs can be repurposed for the native indications of other drugs of the same cluster based on their high predicted biological activities obtained from PASS analysis. In line with this, we identified 66 drugs from the nine clusters which are structurally similar but have different therapeutic uses and can therefore be repurposed for one or more native indications of other drugs of the same cluster.
Some of these drugs not only share a common substructure but also bind to the same target and may have a similar mechanism of action, further supporting our hypothesis. Furthermore, based on the analysis of predicted biological activities, we identified 1423 drugs that can be repurposed for 366 new indications against several diseases. In this study, an integrated approach of unsupervised ML and SAR analysis has been used to identify new indications for approved drugs, and the study provides novel insights into clustering patterns generated through descriptor-level analysis of approved drugs.


Subject(s)
Drug Repositioning , Pharmaceutical Preparations , Cluster Analysis , Machine Learning , Unsupervised Machine Learning
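The clustering step described in the abstract above can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) applied to principal-component scores. The PCA step is omitted and the scores are assumed to be already computed; the study used five components and nine clusters, whereas the toy data here use two of each for readability. The deterministic initialisation (first k points as centroids) is a simplification chosen for reproducibility, and all names and values are hypothetical rather than taken from the study.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scores on two principal components for six drugs
pc_scores = [
    (0.0, 0.1), (5.0, 5.1), (0.2, 0.0),
    (5.2, 4.9), (0.1, 0.2), (4.9, 5.0),
]
labels, centroids = kmeans(pc_scores, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In practice a library implementation with randomised or k-means++ initialisation would normally be preferred; the hand-written loop is shown only to make the assignment and update steps explicit.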
6.
Langmuir ; 32(3): 889-99, 2016 Jan 26.
Article in English | MEDLINE | ID: mdl-26727635

ABSTRACT

The current study reports the one-step synthesis and gelation properties of cyclohexane-based bis(acyl-semicarbazide) gelators with an additional -NH group incorporated into urea moieties and carrying hydrophobic chains of varying length (C8-C18). The gels exhibited thermoreversibility and could be tuned in the presence of anions at different concentrations in addition to their ultrasound-responsive nature, thus making them multi-stimuli-responsive. The combined experimental and computational study on these gels reveals that the balance between two noncovalent interactions, viz., hydrogen bonding between the amide groups in acyl-semicarbazide moieties and van der Waals forces between long hydrocarbon tails, is the determining factor in the process of organogelation. A systematic increase in alkyl chain length leads to equilibrium between these two types of noncovalent forces that is manifested in the spectral and thermal properties of the gels. The H-bonding interactions dominated up to a certain chain length, and further increases in the alkyl chain length led to increased van der Waals interactions as observed by IR, XRD, and thermal studies. Computational calculations were carried out on dimer structures of C8-C18 to understand the variation in noncovalent forces responsible for aggregate formation in the gel state as a function of the alkyl chain length. The results indicate that both intermolecular and intramolecular hydrogen bonding stabilize the aggregate structures. Supramolecular aggregation in the gel state led to the viscoelastic nature of the gels, and the addition of anions led to the disruption of self-assembly, which was studied by rheology.
