1.
NPJ Syst Biol Appl ; 9(1): 63, 2023 Dec 18.
Article in English | MEDLINE | ID: mdl-38110446

ABSTRACT

Assessing the mutagenicity of chemicals is an essential task in the drug development process. Databases and other structured sources for Ames mutagenicity usually exist, carefully and laboriously curated from scientific publications. As knowledge accumulates over time, keeping these databases up to date is a constant overhead and often impractical. In this paper, we first propose the problem of predicting the mutagenicity of chemicals from textual information in scientific publications. More simply, given a chemical and natural-language evidence from publications where the mutagenicity of the chemical is described, the goal of the model/algorithm is to predict whether it is potentially mutagenic or not. For this, we first construct a gold standard data set and then propose MutaPredBERT, a prediction model fine-tuned on BioLinkBERT based on a question-answering formulation of the problem. We leverage transfer learning and use large transformer-based models to achieve a Macro F1 score of >0.88 even with relatively small data for fine-tuning. Our work establishes the utility of large language models for the construction of structured knowledge bases directly from scientific publications.
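The Macro F1 score reported in this abstract is the unweighted mean of the per-class F1 scores. The sketch below is a minimal illustration of that metric only; the function name `macro_f1` and the example labels are hypothetical and are not taken from the paper or its data set.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["mutagenic", "mutagenic", "non-mutagenic", "non-mutagenic", "non-mutagenic"]
y_pred = ["mutagenic", "non-mutagenic", "non-mutagenic", "non-mutagenic", "mutagenic"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, macro averaging penalizes a model that performs well only on the majority class, which matters when mutagenic and non-mutagenic examples are imbalanced.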


Subject(s)
Mutagens , Mutagens/toxicity , Databases, Factual
2.
Mutagenesis ; 37(3-4): 191-202, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-35554560

ABSTRACT

Assessing a compound's mutagenicity using machine learning is an important activity in the drug discovery and development process. Traditional methods of mutagenicity detection, such as the Ames test, are expensive, time-consuming, and labor-intensive. In this context, in silico methods that predict a compound's mutagenicity with high accuracy are important. Recently, machine-learning (ML) models have increasingly been proposed to improve the accuracy of mutagenicity prediction. While these models are used in practice, there is further scope to improve their accuracy. We hypothesize that choosing the right features to train the model can lead to better accuracy. We systematically consider and evaluate a combination of novel structural and molecular features which have the maximal impact on the accuracy of models. We rigorously evaluate these features against multiple classification models (from classical ML models to deep neural network models). The performance of the models was assessed using 5- and 10-fold cross-validation, and we show that our approach, using the molecule structure, molecular properties, and structural alerts as feature sets, outperforms the state-of-the-art methods for mutagenicity prediction on the Hansen et al. benchmark dataset with an area under the receiver operating characteristic curve of 0.93. More importantly, our framework shows how combining features can improve model accuracy.
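The two evaluation ingredients named in this abstract, k-fold cross-validation and the area under the ROC curve, can be sketched in plain Python. This is an illustrative sketch, not the authors' pipeline: the helper names `roc_auc` and `k_fold_indices` are hypothetical, the folds are contiguous and unshuffled for simplicity, and no model or feature set from the paper is reproduced.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as 0.5 (the rank-statistic formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Hypothetical scores: 1 = mutagenic, 0 = non-mutagenic.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
folds = list(k_fold_indices(10, 5))
```

In practice each fold's training indices would be used to fit a classifier and the held-out indices to compute the AUC, with the per-fold values averaged.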


Subject(s)
Machine Learning , Mutagens , Mutagens/toxicity , Mutagens/chemistry , Neural Networks, Computer , Mutagenesis
3.
Int J Mol Sci ; 23(7), 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35409081

ABSTRACT

VHHs, i.e., the VH domains of camelid heavy-chain antibodies, are very promising therapeutic agents due to their significant physicochemical advantages compared to classical mammalian antibodies. The number of experimentally solved VHH structures has increased significantly in recent years, which is of great help because it makes it possible to work directly on 3D structures to humanise or improve them. Unfortunately, most VHHs do not have solved 3D structures. Thus, it is essential to find alternative ways to obtain structural information, and methods that predict structure from the primary amino acid sequence are needed to bypass this limitation. This review presents the most extensive overview of structure prediction methods applied to the 3D modelling of a given VHH sequence (a total of 21). Besides the historical overview, it aims to show how modelling software programs have shaped the structural prediction of VHHs. A brief explanation of each methodology is supplied, and pertinent examples of their usage are provided. Finally, we present a structure prediction case study of a recently solved VHH structure. According to some recent studies and the present analysis, AlphaFold 2 and NanoNet appear to be the best tools to predict a structural model of a VHH from its sequence.


Subject(s)
Camelids, New World , Immunoglobulin Heavy Chains , Amino Acid Sequence , Animals , Antibodies , Immunoglobulin Heavy Chains/chemistry , Models, Structural
4.
Data Brief ; 29: 105383, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32195305

ABSTRACT

Intrinsically Disordered Proteins (IDPs) have become a hot topic since their characterisation in the 1990s. The data presented in this article are related to our research entitled "A structural entropy index to analyse local conformations in Intrinsically Disordered Proteins" published in Journal of Structural Biology [1]. In that study, we quantified, for the first time, the continuum from rigidity to flexibility and finally disorder. Non-disordered regions were also highlighted in the ensemble of disordered proteins. This work was done using the Protein Ensemble Database (PED), a database that collects series of protein structures considered as IDPs. The data set consists of a collection of cleaned protein files in classical PDB format that can be readily used as input to most automatic analysis software. The accompanying data include the encoding of all structural information in terms of a structural alphabet, namely Protein Blocks (PBs). An entropy index derived from PBs, which captures the continuum from protein rigidity to flexibility to disorder, is included, together with information from secondary structure assignment, protein accessibility, and disorder prediction from the sequences. The data may be used for further structural bioinformatics studies of IDPs. They can also be used as a benchmark for evaluating disorder prediction methods.

5.
Int J Mol Sci ; 21(6), 2020 Mar 24.
Article in English | MEDLINE | ID: mdl-32213914

ABSTRACT

The number of available protein structures in the Protein Data Bank (PDB) has increased considerably in recent years. Thanks to the growing number of structures and complexes, numerous large-scale studies have been carried out in various research areas, e.g., protein-protein interactions, protein-DNA interactions, or drug discovery. While protein redundancy has so far been managed only with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known but have never been quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid the bias found in different studies. The methodology is based on binding site superposition and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations were retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories decreases the number of complexes 3.84-fold and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analyses or virtual screening studies.
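The weighted RMSD mentioned in this abstract can be illustrated with the generic formula sqrt(sum(w_i * d_i^2) / sum(w_i)), where d_i is the distance between matched atoms after superposition. The sketch below assumes the two binding sites are already superposed and atom-matched; the weighting scheme actually used in the paper is not given in the abstract, so the weights here are hypothetical.

```python
import math


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between two matched, already superposed coordinate sets.

    coords_a, coords_b: sequences of (x, y, z) tuples in the same atom order.
    weights: one non-negative weight per atom (hypothetical scheme).
    """
    if not (len(coords_a) == len(coords_b) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq = sum(
        w * sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
        for atom_a, atom_b, w in zip(coords_a, coords_b, weights)
    )
    return math.sqrt(weighted_sq / total_weight)


# Two hypothetical atoms; only the second one has moved (by 2 units along y).
site_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
site_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
uniform = weighted_rmsd(site_a, site_b, [1.0, 1.0])
emphasised = weighted_rmsd(site_a, site_b, [1.0, 3.0])
```

A matrix of such pairwise values between all structures of a complex is the kind of distance input that a hierarchical clustering step would then consume.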


Subject(s)
Databases, Protein , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods , Software , Binding Sites , Humans , Ligands , Protein Binding
6.
PeerJ ; 8: e8408, 2020.
Article in English | MEDLINE | ID: mdl-32185102

ABSTRACT

Antigen binding by antibodies requires precise orientation of the complementarity-determining region (CDR) loops in the variable domain to establish the correct contact surface. Members of the family Camelidae have a modified form of immunoglobulin gamma (IgG) with only heavy chains, called Heavy Chain only Antibodies (HCAbs). Antigen binding in HCAbs is mediated by only three CDR loops from the single variable domain (VHH) at the N-terminus of each heavy chain. This feature of VHHs, along with their other important features, e.g., easy expression, small size, thermostability, and hydrophilicity, makes them promising candidates for therapeutics and diagnostics. Thus, to design better VHH domains, it is important to thoroughly understand their sequence and structure characteristics and the relationship between them. In this study, the sequence characteristics of VHH domains have been analysed in depth, along with their structural features, using innovative approaches, namely a structural alphabet. An elaborate summary of various studies proposing structural models of VHH domains showed diversity in the algorithms used. Finally, a case study elucidating the differences between structural models built from single and multiple templates is presented. In this case study, along with the above-mentioned aspects of VHHs, various factors in VHH structure prediction, such as template framework selection, are also discussed.

7.
J Struct Biol ; 210(1): 107464, 2020 Apr 01.
Article in English | MEDLINE | ID: mdl-31978465

ABSTRACT

The sequence-structure-function paradigm was revolutionized by the discovery of disordered regions and disordered proteins more than two decades ago. While the definition of rigidity is simple with X-ray structures, the notion of flexibility is linked to high experimental B-factors. The definition of disordered regions is more complex: in these same X-ray structures, it is associated with the positions of missing residues. Thus, a continuum seems to exist between rigidity, flexibility, and disorder. However, it had not been precisely described. In this study, we used an ensemble of disordered proteins (or regions) and applied a structural alphabet to analyse their local conformations. This structural alphabet, namely Protein Blocks, has been used efficiently to highlight rigid local domains within flexible regions and so to discriminate between the concepts of deformability and mobility. Using an entropy index derived from this structural alphabet, we underline its value for measuring these local dynamics and quantify, for the first time, the continuum of states from rigidity to flexibility and finally disorder. We also highlight non-disordered regions in the ensemble of disordered proteins in our study.


Subject(s)
Intrinsically Disordered Proteins/chemistry , Entropy , Protein Conformation
8.
J Biomol Struct Dyn ; 38(10): 2988-3002, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31361191

ABSTRACT

Proteins are highly dynamic macromolecules. These dynamics are often analysed through experimental and/or computational methods for only an isolated protein or a limited number of proteins. Here, we explore large-scale protein dynamics simulations to observe the dynamics of local protein conformations from different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underline an interesting specific bias between β-turns and bends, which are considered to belong to the same category although their dynamics differ. Second, we used a structural alphabet that is able to approximate every part of the protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are far more complex than a simple regular/rigid and coil/flexible exchange.
Abbreviations: Neq, number of equivalent; PB, Protein Blocks; PDB, Protein DataBank; RMSf, root mean square fluctuations.
Communicated by Ramaswamy H. Sarma.
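The Neq abbreviation listed in this record refers to an entropy-based "number of equivalent" Protein Blocks at a sequence position. As a sketch, assuming the usual definition Neq = exp(-sum_x p_x ln p_x) over the PB frequencies p_x observed at one position across simulation frames, it can be computed as follows. The function name `neq` and the example strings are hypothetical; PBs are written as single letters.

```python
import math
from collections import Counter


def neq(pb_letters):
    """Equivalent number of Protein Blocks at one position.

    pb_letters: the PB letter assigned to this position in each frame.
    Returns exp(Shannon entropy): 1.0 when a single PB is always observed,
    and up to the number of distinct PBs when they are equally frequent.
    """
    if not pb_letters:
        raise ValueError("at least one observation is required")
    counts = Counter(pb_letters)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Hypothetical positions: one rigid, one switching equally between two PBs.
rigid = neq("mmmmmmmm")
two_state = neq("mmmmdddd")
```

Values close to 1 indicate a locally rigid position, while larger values indicate that several local conformations are sampled during the dynamics.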


Subject(s)
Molecular Dynamics Simulation , Proteins , Entropy , Protein Conformation , Protein Structure, Secondary , Proteins/genetics
9.
J Med Chem ; 62(21): 9341-9356, 2019 Nov 14.
Article in English | MEDLINE | ID: mdl-31117513

ABSTRACT

Halogen atoms have been at the center of many rational medicinal chemistry applications in drug design. While fluorine and chlorine atoms are often added to enhance physicochemical properties, bromine and iodine atoms are generally inserted to improve selectivity. Favorable halogen interactions such as halogen bonds have been thoroughly studied through quantum mechanics and statistical analyses. Although most studies focus on halogen interactions through the σ-hole, hydrogen bonding also has a significant impact. Here, we present an analysis describing the interacting environment of halogen atoms in the protein-ligand context. By taking structural redundancy in the PDB into account, tendencies toward specific molecular interactions have been refined and implications for rational drug design with halogens are further discussed. Finally, we highlight the moderate occurrence of halogen bonding and present the other roles of halogens in protein-ligand complexes, completing the medicinal chemistry guide to rational halogen interactions.


Subject(s)
Drug Design , Halogens/chemistry , Proteins/metabolism , Databases, Protein , Ligands , Protein Binding , Proteins/chemistry
10.
Amino Acids ; 49(4): 705-713, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28185014

ABSTRACT

About half of the residues in globular proteins are in regular secondary structures, α-helices and β-sheets, while the rest are in irregular secondary structures, such as turns or coil conformations. Other regular secondary structures are often ignored, despite their importance in biological processes. Among such structures, the polyproline II helix (PPII) has interesting behaviours. PPIIs are not usually associated with conventional stabilizing interactions, and recent studies have observed that PPIIs are more frequent than anticipated. In addition, it is suggested that they may have an important functional role, particularly in protein-protein or protein-nucleic acid interactions and recognition. Residues associated with PPII conformations represent nearly 5% of the total residues, but the lack of PPII assignment approaches prevents their systematic analysis. This short review presents current knowledge and recent research in the PPII area. In a first step, the different methodologies able to assign PPII are presented. In a second step, recent studies that have opened new perspectives in PPII analysis in terms of structure and function are underlined with three cases: (1) PPII in protein structures, for instance, the first crystal structure of an oligoproline adopting an all-trans PPII helix has been presented; (2) the involvement of PPII in different diseases and in drug design; and (3) an interesting extension of PPII study to protein dynamics. For instance, PPIIs are often linked to disordered region analysis, and the precise analysis of a potential PPII helix in hypogonadism shows unanticipated PPII formation in the patient mutation, while it is not observed in the wild-type form of the KISSR1 protein.
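One simple way to assign PPII, sketched here only as an illustration, is to flag runs of consecutive residues whose backbone dihedral angles lie near the canonical PPII values (phi around -75 degrees, psi around +145 degrees). The tolerance of 29 degrees and the minimum run length of 2 are assumptions chosen for this example, and the function names are hypothetical; the assignment methods surveyed in the review differ in their exact criteria.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def ppii_segments(dihedrals, phi0=-75.0, psi0=145.0, tol=29.0, min_len=2):
    """Return inclusive (start, end) index pairs of PPII-like runs.

    dihedrals: list of (phi, psi) tuples in degrees, one per residue.
    phi0, psi0, tol, min_len: illustrative thresholds (assumptions).
    """
    segments = []
    run_start = None
    for i, (phi, psi) in enumerate(dihedrals):
        in_region = (angle_diff(phi, phi0) <= tol
                     and angle_diff(psi, psi0) <= tol)
        if in_region and run_start is None:
            run_start = i
        elif not in_region and run_start is not None:
            if i - run_start >= min_len:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last residue.
    if run_start is not None and len(dihedrals) - run_start >= min_len:
        segments.append((run_start, len(dihedrals) - 1))
    return segments


# Hypothetical backbone angles: residues 1 to 3 form a PPII-like run.
angles = [(-60, -45), (-75, 145), (-70, 150), (-80, 140), (-120, 130), (-75, 145)]
segments = ppii_segments(angles)
```

The final residue in the example is PPII-like but isolated, so it is not reported under the assumed minimum run length.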


Subject(s)
Peptides/chemistry , Humans , Models, Molecular , Protein Structure, Secondary
11.
Front Mol Biosci ; 2: 20, 2015.
Article in English | MEDLINE | ID: mdl-26075209

ABSTRACT

Protein structures are valuable tools for understanding protein function. Nonetheless, proteins are often considered as rigid macromolecules, while their structures exhibit specific flexibility that is essential for their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. A more precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in the field of structure analysis, from the definition of ligand binding sites to the superimposition of protein structures. SAs are also well suited to analyzing the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SA descriptions. Coupled with various sources of experimental data (e.g., B-factors) and computational methodologies (e.g., molecular dynamics simulation), SAs turn out to be powerful tools for analyzing protein dynamics, e.g., to examine allosteric mechanisms in large sets of structures in complexes or to identify order/disorder transitions. SAs were also shown to be quite efficient at predicting protein flexibility from the amino acid sequence. Finally, in this review, we illustrate the value of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases.
