ABSTRACT
Peptides have emerged as promising therapeutic agents, but their potential is hindered by hemotoxicity. Understanding peptide hemotoxicity is therefore crucial for developing safe and effective peptide-based therapeutics. Here, we employed chemical space complex networks (CSNs) to map the hemotoxicity landscape of peptides. CSNs are powerful tools for visualizing and analyzing the relationships between peptides based on their physicochemical properties and structural features. We constructed CSNs from the StarPepDB database, encompassing 2004 hemolytic peptides, and explored the impact of seven different (dis)similarity measures on network topology and on the distribution of clusters (communities). Our findings revealed that each CSN extracts orthogonal information, which enhances the motif discovery and enrichment process. We identified 12 consensus hemolytic motifs, whose amino acid composition showed a high abundance of lysine, leucine, and valine residues, while aspartic acid, methionine, histidine, asparagine, and glutamine were depleted. Additionally, physicochemical properties were used to characterize the clusters/communities of hemolytic peptides. To predict hemolytic activity directly from peptide sequences, we constructed multi-query similarity searching models (MQSSMs), which outperformed state-of-the-art machine learning (ML)-based models, demonstrating robust hemotoxicity prediction capabilities. Overall, this novel in silico approach uses complex network science as its central strategy to develop robust classifiers, to characterize the chemical space, and to discover new motifs in hemolytic peptides. This will help to improve the design and selection of peptides with potential therapeutic activity and low toxicity.
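To make the network construction concrete, the following is a minimal sketch of how a CSN can be derived from a pairwise similarity measure and a cutoff. It is not the StarPep implementation: the amino acid composition descriptor, the cosine similarity, the cutoff value, the use of connected components as a stand-in for community detection, and the toy sequences are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Amino acid composition vector (assumed descriptor, for illustration)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_network(peptides, cutoff):
    """Connect two peptides when their similarity reaches the cutoff."""
    vectors = {p: composition(p) for p in peptides}
    graph = {p: set() for p in peptides}
    for a, b in combinations(peptides, 2):
        if cosine(vectors[a], vectors[b]) >= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into connected components (a simple proxy for clusters)."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(members)
    return clusters


# Toy sequences, not taken from StarPepDB.
peptides = ["KLLKKLLK", "KLKLLKKL", "DEDEDDEE", "EEDDEDED"]
network = build_network(peptides, cutoff=0.9)
clusters = connected_components(network)
print(len(clusters), "clusters")
```

Swapping `cosine` for a different (dis)similarity function changes which edges survive the cutoff, which is one way to see why different measures can yield different topologies and cluster distributions.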
ABSTRACT
Antiviral peptides (AVPs) represent a promising strategy for addressing the global challenge of viral infections and their growing resistance to traditional drugs. Lab-based AVP discovery methods are resource-intensive, highlighting the need for efficient computational alternatives. In this study, we developed five non-trained but supervised multi-query similarity search models (MQSSMs) integrated into the StarPep toolbox. Rigorous testing and validation across diverse AVP datasets confirmed the models' robustness and reliability. The top-performing model, M13+, achieved an accuracy of 0.969 and a Matthews correlation coefficient of 0.71. To assess their competitiveness, the top five models were benchmarked against 14 publicly available machine-learning and deep-learning AVP predictors. The MQSSMs outperformed these predictors while remaining efficient in terms of resource demand and publicly accessible. Another significant contribution of this study is the creation of the most comprehensive dataset of antiviral sequences to date. Overall, these results suggest that MQSSMs are promising alignment-based models that can be successfully applied to the screening of large datasets for the discovery of new AVPs.
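Because accuracy and the Matthews correlation coefficient (MCC) are reported side by side, a short sketch of how both are computed from a binary confusion matrix may be useful. The counts below are hypothetical and are not taken from this study; they only illustrate that the two metrics can diverge when the classes are imbalanced.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for an imbalanced test set.
tp, tn, fp, fn = 40, 900, 30, 30
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

The MCC uses all four cells of the confusion matrix, so it is less easily inflated by a large majority class than accuracy is.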
ABSTRACT
Peptide-based drugs are promising anticancer candidates due to their biocompatibility and low toxicity. In particular, tumor-homing peptides (THPs) can bind specifically to cancer cell receptors and tumor vasculature. Despite their potential for antitumor drug development, few prediction tools are available to assist the discovery of new THPs. Two webservers based on machine learning models, TumorHPD and THPep, are currently active, and SCMTHP was introduced more recently. Herein, a novel method for THP discovery is presented, based on network science and similarity searching as implemented in the starPep toolbox. The approach explores the structural space of THPs with Chemical Space Networks (CSNs) and applies centrality measures to identify the most relevant and non-redundant THP sequences within the CSN. These THPs were used as queries (Qs) for multi-query similarity searches that apply a group fusion (MAX-SIM rule) model. The resulting multi-query similarity searching models (SSMs) were validated with three benchmarking datasets of THPs/non-THPs. The predictions achieved accuracies ranging from 92.64% to 99.18% and Matthews correlation coefficients between 0.894 and 0.98, outperforming state-of-the-art predictors. The best model was applied to repurpose antimicrobial peptides (AMPs) from the starPep database as THPs, which were subsequently optimized for tumor-homing activity. Finally, 54 promising THP leads were discovered, and their sequences were analyzed to identify novel motifs. These results demonstrate the potential of CSNs and multi-query similarity searching for the rapid and accurate identification of THPs.
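The group fusion step described above can be sketched as follows: each target is compared against every query, and the MAX-SIM rule keeps the highest of those scores as the fused score. This is an illustrative sketch only. The `similarity` function uses Python's `difflib.SequenceMatcher` as a stand-in and is not the alignment-based score used by the starPep toolbox; the cutoff and the toy sequences are likewise assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the alignment score used by starPep."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def max_sim_score(target: str, queries: list[str]) -> float:
    """Group fusion with the MAX-SIM rule: keep the best score over all queries."""
    return max(similarity(target, q) for q in queries)


def predict(target: str, queries: list[str], cutoff: float = 0.6) -> bool:
    """Label a target as positive when its fused score reaches the cutoff."""
    return max_sim_score(target, queries) >= cutoff


# Toy query set and targets, for illustration only.
queries = ["ACDKLLK", "GGRKRK", "CRGDC"]
for target in ["ACDKLLK", "WWWWW"]:
    print(target, round(max_sim_score(target, queries), 3), predict(target, queries))
```

Under this rule a target only needs to resemble one query closely to be retrieved, which is why selecting representative and non-redundant queries matters for the quality of the model.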