ABSTRACT
Determining the redox potentials of protein cofactors, and how these potentials are influenced by the cofactors' molecular neighborhoods, is essential for basic research and for many biotechnological applications, from biosensors and biocatalysis to bioremediation and bioelectronics. Because the experimental determination of redox potentials is laborious, computational approaches that can predict them reliably are needed. Current computational approaches based on quantum and molecular mechanics are accurate, but their high computational cost limits their use. In this work, we explored quantitative structure-property relationship (QSPR) models based on machine learning (ML) as a more efficient alternative to these classical approaches for predicting protein redox potentials. As a proof of concept, we focused on flavoproteins, one of the most important families of enzymes directly involved in redox processes. To train and test different ML models, we retrieved a dataset of flavoproteins with known midpoint redox potential (Em) and 3D structure. The features of interest, which account for both short- and long-range effects of the protein matrix on the flavin cofactor, were extracted automatically from each protein's PDB file. Our best ML model (XGB) has a prediction error below 1 kcal/mol (∼36 mV), comparing favorably with more sophisticated computational approaches. We also identified the features that most strongly affect Em and, where possible, rationalized their effects on the basis of previous studies.
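The abstract reports the model error both in kcal/mol and in mV. As an illustration only, the sketch below converts between the two units assuming the standard relation ΔG = −nFΔE, with the number of transferred electrons n left as a parameter, since the abstract does not state which n underlies the ∼36 mV figure. The function and constant names are hypothetical and not part of the authors' code.

```python
# Faraday constant expressed in kcal/(mol*V):
# 96485.33212 C/mol divided by 4184 J/kcal gives about 23.06.
FARADAY_KCAL_PER_MOL_V = 96485.33212 / 4184.0


def mv_to_kcal_per_mol(delta_e_mv: float, n_electrons: int = 1) -> float:
    """Free-energy magnitude |dG| = n * F * |dE| for a potential difference in mV.

    Assumption: the mV <-> kcal/mol correspondence follows dG = -nF dE,
    with n_electrons chosen by the caller (1 by default).
    """
    return n_electrons * FARADAY_KCAL_PER_MOL_V * delta_e_mv / 1000.0


def kcal_per_mol_to_mv(delta_g_kcal: float, n_electrons: int = 1) -> float:
    """Inverse conversion: potential difference (mV) equivalent to a free energy in kcal/mol."""
    return 1000.0 * delta_g_kcal / (n_electrons * FARADAY_KCAL_PER_MOL_V)


if __name__ == "__main__":
    # Compare the one- and two-electron cases for the figures quoted in the abstract.
    for n in (1, 2):
        print(f"n={n}: 1 kcal/mol = {kcal_per_mol_to_mv(1.0, n):.1f} mV; "
              f"36 mV = {mv_to_kcal_per_mol(36.0, n):.2f} kcal/mol")
```

With n = 1 this gives 1 kcal/mol = 43.4 mV and 36 mV = 0.83 kcal/mol, so an error of ∼36 mV lies below 1 kcal/mol under the one-electron assumption; with n = 2 the same 36 mV corresponds to 1.66 kcal/mol.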
Subjects
Flavins, Flavoproteins, Flavins/chemistry, Flavins/metabolism, Flavoproteins/chemistry, Machine Learning, Oxidation-Reduction
ABSTRACT
Metabolism is fine-tuned, directly and indirectly, by a complex web of interacting regulatory mechanisms that fall into two major classes. On the one hand, the expression level of the catalyzing enzyme sets the maximal theoretical flux (i.e., the net rate of the reaction) of each enzyme-controlled reaction. On the other hand, metabolic regulation controls the flux through the interactions of metabolites (substrates, cofactors, allosteric modulators) with the responsible enzyme. High-throughput data, such as metabolomics and transcriptomics data, do not accurately characterize this hierarchical regulation when analyzed separately; they must be integrated to disentangle the interdependence between the regulatory layers that control metabolism. To this aim, we propose INTEGRATE, a computational pipeline that integrates metabolomics and transcriptomics data using constraint-based stoichiometric metabolic models as a scaffold. We compute differential reaction expression from transcriptomics data and use constraint-based modeling to predict whether the differential expression of metabolic enzymes directly gives rise to differences in metabolic fluxes. In parallel, we use metabolomics data to predict how differences in substrate availability translate into differences in metabolic fluxes. By intersecting these two output datasets, we discriminate fluxes regulated at the metabolic level, at the gene expression level, or at both. We demonstrate the pipeline on a set of immortalized normal and cancer breast cell lines. In a clinical setting, knowing the regulatory level at which a given metabolic reaction is controlled will be valuable for informing targeted, truly personalized therapies for cancer patients.
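The abstract states only that fluxes are discriminated by intersecting the transcriptomics-based and metabolomics-based predictions. A minimal sketch of that final intersection step follows, assuming each branch has already been reduced to a set of reaction identifiers whose flux is predicted to differ between conditions. The function name, labels, and reaction IDs are hypothetical and do not reflect INTEGRATE's actual interface.

```python
from typing import Dict, Set


def classify_regulation(expression_driven: Set[str],
                        substrate_driven: Set[str]) -> Dict[str, str]:
    """Label each reaction by the regulatory layer(s) predicted to change its flux.

    expression_driven: reactions whose flux change is predicted from differential
        enzyme expression (transcriptomics branch).
    substrate_driven: reactions whose flux change is predicted from differences in
        substrate availability (metabolomics branch).
    Reactions absent from both sets are not labeled.
    """
    labels: Dict[str, str] = {}
    for rxn in expression_driven | substrate_driven:
        if rxn in expression_driven and rxn in substrate_driven:
            labels[rxn] = "metabolic and gene expression"
        elif rxn in expression_driven:
            labels[rxn] = "gene expression"
        else:
            labels[rxn] = "metabolic"
    return labels


if __name__ == "__main__":
    # Hypothetical reaction identifiers, for illustration only.
    transcriptomics_hits = {"R1", "R2"}
    metabolomics_hits = {"R2", "R3"}
    result = classify_regulation(transcriptomics_hits, metabolomics_hits)
    for rxn, level in sorted(result.items()):
        print(rxn, "->", level)
```

In this toy example, R1 is attributed to gene expression only, R2 to both layers, and R3 to metabolic regulation only. A real pipeline would also need to decide how to handle reactions whose predicted changes disagree in direction, which this sketch does not model.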