1.
J Comput Chem ; 45(13): 937-952, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38174834

ABSTRACT

Designing new drugs is a challenging process: a candidate molecule must satisfy multiple conditions to act properly with minimal side effects. Ideal candidates attach selectively to their targets and influence only those targets, leaving off-targets intact. The amount of experimental data on various molecular properties grows constantly, which promotes data-driven approaches. However, the applicability of typical predictive machine learning techniques can be substantially limited by a lack of experimental data for a particular target. For example, there are many known Thrombin inhibitors (acting as anticoagulants) but very few known Protein C inhibitors (coagulants). In this study, we present an approach that suggests new inhibitor candidates by building an effective representation of chemical space. To this end, we developed a deep learning model, an autoencoder, trained on a large set of molecules in the SMILES format to map the chemical space. We then applied different sampling strategies to generate novel coagulant candidates. Symmetrically, we tested our approach on anticoagulant candidates, for which we were able to predict inhibition of Thrombin. We also compare our approach with MegaMolBART, another deep learning generative model that exploits similar principles of navigation in chemical space.
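The abstract does not say which sampling strategies were used. As an illustration only, the sketch below shows two common ways of navigating an autoencoder's latent space: Gaussian perturbation around a known point, and linear interpolation between two points. The function names `sample_around` and `interpolate`, the `sigma` value, and the list-of-floats latent vectors are all assumptions, and the trained decoder that would map each vector back to a SMILES string is not shown.

```python
import random


def sample_around(center, sigma=0.1, n=5, seed=0):
    """Draw n latent vectors by adding Gaussian noise to a known point.

    Hypothetical helper: `center` stands for the encoding of a known
    inhibitor; `sigma` controls how far the samples stray from it.
    """
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in center] for _ in range(n)]


def interpolate(start, end, steps=5):
    """Return `steps` evenly spaced latent vectors from start to end, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        [a + (b - a) * t / (steps - 1) for a, b in zip(start, end)]
        for t in range(steps)
    ]


if __name__ == "__main__":
    z_a = [0.0, 0.0]  # placeholder encoding of molecule A (assumed)
    z_b = [1.0, 2.0]  # placeholder encoding of molecule B (assumed)
    print(interpolate(z_a, z_b, steps=3))
    print(sample_around(z_a, sigma=0.05, n=2))
```

In such a setup each sampled vector would be passed to the decoder, and the resulting strings would be checked for chemical validity before any further screening.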


Subject(s)
Machine Learning , Thrombin
2.
Biosemiotics ; 14(2): 253-269, 2021.
Article in English | MEDLINE | ID: mdl-33613787

ABSTRACT

The aim of the study is to analyze viruses using parameters obtained from distributions of nucleotide sequences in the viral RNA. To keep the input data homogeneous, we analyze single-stranded RNA viruses only. Two approaches are used to obtain the nucleotide sequences. In the first, chunks of equal length (four nucleotides) are considered. In the second, the whole RNA genome is divided into parts, with adenine or the most frequent nucleotide serving as a "space". Rank-frequency distributions are studied in both cases. As the nature of their rank-frequency distributions shows, the nucleotide sequences defined in this way are signs comparable, to a certain extent, to syllables or words. Within the first approach, the Pólya and the negative hypergeometric distributions yield the best fit. For the distributions obtained within the second approach, we calculated a set of parameters, including entropy, mean sequence length, and its dispersion. The calculated parameters became the basis for the classification of viruses. We observed that proximity of viruses on planes spanned by various pairs of parameters corresponds to related species. In certain cases, such proximity is also observed for unrelated species, which calls for expanding the set of parameters used in the classification. We also observed that the fifth most frequent nucleotide sequences obtained within the second approach differ in nature among human coronaviruses (different nucleotides for MERS, SARS-CoV, and SARS-CoV-2 versus identical nucleotides for four other coronaviruses). We expect that our findings will be useful as a supplementary tool in the classification of diseases caused by RNA viruses with respect to severity and contagiousness.
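The two segmentation approaches and the derived parameters can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, empty parts between adjacent "space" nucleotides are dropped, entropy is taken as Shannon entropy in bits, and "dispersion" is read as the population variance of sequence length. Each of these choices is an assumption the abstract does not confirm.

```python
from collections import Counter
from math import log2
from statistics import mean, pvariance


def fixed_chunks(genome, size=4):
    """Approach 1: consecutive non-overlapping chunks of equal length.

    A trailing remainder shorter than `size` is discarded (assumption).
    """
    usable = len(genome) - len(genome) % size
    return [genome[i:i + size] for i in range(0, usable, size)]


def split_by_space(genome, space=None):
    """Approach 2: split the genome on a nucleotide treated as a "space".

    If `space` is None, the most frequent nucleotide is used.
    Empty parts between adjacent spaces are dropped (assumption).
    """
    if space is None:
        space = Counter(genome).most_common(1)[0][0]
    return [part for part in genome.split(space) if part]


def rank_frequency(sequences):
    """Return (sequence, count) pairs ordered from most to least frequent."""
    return Counter(sequences).most_common()


def entropy(sequences):
    """Shannon entropy (bits) of the sequence frequency distribution."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def length_stats(sequences):
    """Mean sequence length and its population variance."""
    lengths = [len(s) for s in sequences]
    return mean(lengths), pvariance(lengths)


if __name__ == "__main__":
    toy = "AUGCAUGCAAGGUU"  # toy string, not real viral data
    print(rank_frequency(fixed_chunks(toy)))
    parts = split_by_space(toy, space="A")
    print(rank_frequency(parts), entropy(parts), length_stats(parts))
```

Computing `entropy` and `length_stats` for each genome gives one point per virus; plotting pairs of these values would correspond to the parameter planes mentioned in the abstract.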
