1.
PLoS Comput Biol ; 16(9): e1008191, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32970665

ABSTRACT

Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data comprising minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data was used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists. The results of this data-driven study showed that highly accurate classification models can be constructed with a minimum number of features without loss of generality, and that these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals.
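A minimal sketch of the benchmarking step described above, assuming scikit-learn and synthetic stand-in data (the real study uses high-content imaging features; the mean-shifted Gaussians below are purely illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 200, 10
# Hypothetical stand-in for multiplexed imaging features: "agonist" profiles
# shift every feature upward relative to "antagonist" profiles.
X = np.vstack([rng.normal(loc=2.0, size=(n, d)),   # agonist-like profiles
               rng.normal(loc=0.0, size=(n, d))])  # antagonist-like profiles
y = np.array([1] * n + [0] * n)                    # 1 = agonist, 0 = antagonist

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Benchmark a linear and a nonlinear classifier, as in the study.
accs = {}
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    accs[type(model).__name__] = model.fit(X_tr, y_tr).score(X_te, y_te)
```

On well-separated data both models classify held-out compounds near-perfectly; the study's contribution is doing this on real imaging features after feature selection.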


Subject(s)
Algorithms , Estrogens/classification , Machine Learning , Cell Line , Estrogens/metabolism , Humans , Receptors, Estrogen/metabolism
2.
AIChE J ; 66(10), 2020 Oct.
Article in English | MEDLINE | ID: mdl-32921798

ABSTRACT

A Support Vector Machine (SVM)-based optimization framework is presented for the data-driven optimization of numerically infeasible Differential Algebraic Equations (DAEs) without full discretization of the underlying first-principles model. By formulating the stability constraint of the numerical integration of a DAE system as a supervised classification problem, we demonstrate that SVMs can accurately map the boundary of numerical infeasibility. The necessity of this data-driven approach is demonstrated on a 2-dimensional motivating example, where highly accurate SVM models are trained, validated, and tested using data collected from the numerical integration of DAEs. Furthermore, the methodology is extended and tested on a multi-dimensional case study from reaction engineering (i.e., thermal cracking of natural gas liquids). The data-driven optimization of this complex case study is explored by integrating the SVM models with a constrained global grey-box optimization algorithm, namely the ARGONAUT framework.
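The core idea (label sample points by whether integration succeeds, then train an SVM to map the feasibility boundary) can be sketched as follows; the circular feasible region and the RBF-kernel `sklearn.svm.SVC` are illustrative assumptions, not the paper's model:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Stand-in for a 2-D DAE parameter space: pretend numerical integration
# "succeeds" only inside the unit disk.
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)  # 1 = numerically feasible

# Supervised classification of the feasibility boundary.
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

# The trained SVM reproduces the boundary on held-out samples.
X_test = rng.uniform(-2, 2, size=(400, 2))
y_test = (X_test[:, 0] ** 2 + X_test[:, 1] ** 2 < 1.0).astype(int)
acc = clf.score(X_test, y_test)
```

An optimizer can then query `clf.predict` as an inexpensive feasibility constraint instead of attempting (and failing) the integration.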

3.
J Glob Optim ; 78(1): 1-36, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32753792

ABSTRACT

The Data-driven Optimization of bi-level Mixed-Integer NOnlinear problems (DOMINO) framework is presented for addressing bi-level mixed-integer nonlinear programming problems. In this framework, bi-level optimization problems are approximated as single-level problems by collecting samples of the upper-level objective and solving the lower-level problem to global optimality at those sampling points. This is done by integrating the DOMINO framework with a grey-box optimization solver to perform design of experiments on the upper-level objective, and to consecutively approximate and optimize bi-level mixed-integer nonlinear programming problems that are challenging to solve using exact methods. The performance of DOMINO is assessed by solving numerous bi-level benchmark problems and a land allocation problem in the Food-Energy-Water Nexus, and by employing different data-driven optimization methodologies, including both local and global methods. Although this data-driven approach cannot provide a theoretical guarantee of global optimality, we present an algorithmic advancement that can guarantee feasibility for large-scale bi-level optimization problems when the lower-level problem is solved to global optimality at convergence.
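The sample-and-solve idea can be illustrated on a toy bi-level problem whose lower level has a closed-form global optimum (everything here is a simplified stand-in; DOMINO itself couples a grey-box solver to a global lower-level solver):

```python
import numpy as np

# Toy bi-level problem:
#   min_x  F(x, y*(x)) = (x - 1)^2 + y*(x)^2
#   s.t.   y*(x) = argmin_y (y - x)^2 + y
# The inner problem is solved globally in closed form: y*(x) = x - 0.5.

def lower_level(x):
    return x - 0.5  # global optimum of the lower-level problem

def upper_objective(x):
    y = lower_level(x)
    return (x - 1.0) ** 2 + y ** 2

# "Design of experiments" on the upper-level variable: sample x, solve the
# lower level to optimality at each sample, and keep the best upper-level
# value (a dense grid stands in for the grey-box solver's adaptive sampling).
xs = np.linspace(0.0, 2.0, 401)
Fs = np.array([upper_objective(x) for x in xs])
x_best = xs[np.argmin(Fs)]
```

For this instance the analytical optimum is x = 0.75 with F = 0.125, which the sampled approximation recovers; note that every sampled point is bi-level feasible because the lower level is solved exactly at each sample.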

4.
Ind Eng Chem Res ; 59(6): 2291-2306, 2020 Feb 12.
Article in English | MEDLINE | ID: mdl-32549652

ABSTRACT

We propose a novel active fault-tolerant control strategy that combines machine learning-based process monitoring and explicit/multiparametric model predictive control (mp-MPC). The strategy features (i) data-driven fault detection and diagnosis models using the support vector machine (SVM) algorithm, (ii) feature ranking via a nonlinear, kernel-dependent, SVM-based selection algorithm, (iii) data-driven regression models for fault magnitude estimation via the random forest algorithm, and (iv) a parametric optimization and control (PAROC) framework for the design of the explicit/multiparametric model predictive controller. The resulting explicit control strategies correspond to affine functions of the system states and the magnitude of the detected fault. A semibatch process example for penicillin production is presented to demonstrate how the proposed framework ensures smart operation, in which rapid switches between a priori computed explicit control action strategies are enabled by continuous process monitoring information.
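The online half of an explicit control law reduces to region lookup plus affine evaluation, as the following sketch shows; the regions, gains, and the two-element parameter vector (state plus estimated fault magnitude) are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Offline, mp-MPC partitions the parameter space into critical regions, each
# carrying an affine law u = K @ theta + b. Online, control is a table lookup.
# Here theta hypothetically stacks one state with one fault-magnitude estimate.
regions = [
    # (lower bounds, upper bounds, gain K, offset b) -- made-up numbers
    (np.array([-1.0, -1.0]), np.array([0.0, 1.0]),
     np.array([[-0.5, 0.1]]), np.array([0.2])),
    (np.array([0.0, -1.0]), np.array([1.0, 1.0]),
     np.array([[-1.0, 0.3]]), np.array([0.0])),
]

def explicit_control(theta):
    """Evaluate the piecewise-affine law: find the region, apply its gains."""
    for lo, hi, K, b in regions:
        if np.all(theta >= lo) and np.all(theta <= hi):
            return K @ theta + b
    raise ValueError("parameter vector outside the partitioned region")

u = explicit_control(np.array([0.5, 0.2]))  # falls in the second region
```

Because the online step is only comparisons and one matrix-vector product, switching control strategies when the monitoring layer updates the fault estimate is essentially instantaneous.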

5.
ESCAPE ; 46: 967-972, 2019.
Article in English | MEDLINE | ID: mdl-31612156

ABSTRACT

The National Institute of Environmental Health Sciences (NIEHS) Superfund Research Program (SRP) aims to support university-based multidisciplinary research on human health and environmental issues related to hazardous substances and pollutants. The Texas A&M Superfund Research Program comprehensively evaluates the complexities of hazardous chemical mixtures and their potential adverse health impacts due to exposure through a number of multi-disciplinary projects and cores. One of the essential components of the Texas A&M Superfund Research Center is the Data Science Core, which serves as the basis for translating the data produced by the multi-disciplinary research projects into useful knowledge for the community via data collection, quality control, analysis, and model generation. In this work, we demonstrate the Texas A&M Superfund Research Program computational platform, which houses and integrates large-scale, diverse datasets generated across the Center, provides basic visualization services to facilitate interpretation, monitors data quality, and implements a variety of state-of-the-art statistical analyses for model/tool development. The platform aims to facilitate effective integration and collaboration across the Center and acts as an enabler for the dissemination of comprehensive ad hoc tools and models developed to address the environmental and health effects of chemical mixture exposure during environmental emergency-related contamination events.

6.
PLoS One ; 14(10): e0223517, 2019.
Article in English | MEDLINE | ID: mdl-31600275

ABSTRACT

A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the choices for their detailed analytical characterization. Yet, in lieu of exact chemical composition of complex substances, evaluation of the degree of similarity is a sensible path toward decision-making in environmental health regulations. Grouping of similar complex substances is a challenge that can be addressed via advanced analytical methods and streamlined data analysis and visualization techniques. Here, we propose a framework with unsupervised and supervised analyses to optimally group complex substances based on their analytical features. We test the framework on two data sets of complex oil-derived substances. The first data set is from gas chromatography-mass spectrometry (GC-MS) analysis of 20 Standard Reference Materials representing crude oils and oil refining products. The second data set consists of 15 samples of various gas oils analyzed using three analytical techniques: GC-MS, GC×GC-flame ionization detection (FID), and ion mobility spectrometry-mass spectrometry (IM-MS). For the unsupervised analysis, we apply hierarchical clustering with Pearson correlation as the similarity metric; for the supervised analysis, we build classification models using the Random Forest algorithm. We present a quantitative comparative assessment of clustering results via the Fowlkes-Mallows index, and of classification results via model accuracies in predicting the group of an unknown complex substance. We demonstrate the effect of (i) different grouping methodologies, (ii) data set size, and (iii) dimensionality reduction on the grouping quality, and (iv) different analytical techniques on the characterization of the complex substances. While the complexity and variability in chemical composition are an inherent feature of complex substances, we demonstrate how the choices of data analysis and visualization methods can impact the communication of their characteristics to delineate sufficient similarity.
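The unsupervised branch of such a framework can be sketched as below, assuming SciPy/scikit-learn and synthetic stand-in "analytical profiles" (two groups of noisy copies of uncorrelated templates, in place of real GC-MS features):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import fowlkes_mallows_score

rng = np.random.default_rng(2)
# Stand-in analytical profiles: two groups of "substances" whose feature
# vectors are noisy copies of two uncorrelated templates.
template_a = rng.normal(size=50)
template_b = rng.normal(size=50)
X = np.vstack(
    [template_a + 0.1 * rng.normal(size=50) for _ in range(8)]
    + [template_b + 0.1 * rng.normal(size=50) for _ in range(8)]
)
true_groups = np.array([0] * 8 + [1] * 8)

# Hierarchical clustering with Pearson-correlation distance (1 - r).
Z = linkage(X, method="average", metric="correlation")
pred_groups = fcluster(Z, t=2, criterion="maxclust")

# Quantitative assessment of the grouping against the known labels.
fm = fowlkes_mallows_score(true_groups, pred_groups)
```

For cleanly separated groups the Fowlkes-Mallows index approaches 1; on real complex substances it quantifies how well a chosen grouping methodology recovers a reference grouping.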


Subject(s)
Chemistry Techniques, Analytical/methods , Gas Chromatography-Mass Spectrometry , Petroleum/analysis , Principal Component Analysis , Reference Standards , Sample Size
7.
AIChE J ; 65(3): 992-1005, 2019 Mar.
Article in English | MEDLINE | ID: mdl-32377021

ABSTRACT

In this article, we present (1) a feature selection algorithm based on nonlinear support vector machines (SVMs) for fault detection and diagnosis in continuous processes and (2) results for the Tennessee Eastman benchmark process. The presented feature selection algorithm is derived from a sensitivity analysis of the dual C-SVM objective function. This enables simultaneous modeling and feature selection, paving the way for simultaneous fault detection and diagnosis, where feature ranking guides fault diagnosis. We train fault-specific two-class SVM models to detect faulty operation, while using the feature selection algorithm to improve accuracy and perform fault diagnosis. Our results show that the developed SVM models outperform those available in the literature in both detection accuracy and latency. Moreover, we show that the loss of information is minimized with the use of feature selection techniques compared with feature extraction techniques such as principal component analysis (PCA), which further facilitates a more accurate interpretation of the results.
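A much-simplified linear analogue of SVM-based feature ranking is sketched below: for a linear kernel, the magnitude of each weight ranks feature relevance, whereas the paper derives an analogous ranking for nonlinear kernels from the sensitivity of the dual objective. The synthetic "process measurements" are an assumption for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 200
# Synthetic process measurements: only measurement 0 carries the fault signature.
X = rng.normal(size=(n, 5))
y = (rng.random(n) < 0.5).astype(int)  # 1 = faulty operation, 0 = normal
X[y == 1, 0] += 3.0                    # the fault shifts measurement 0

clf = SVC(kernel="linear").fit(X, y)

# Rank measurements by |w_j|; the top-ranked ones point to the fault's
# signature, which is what guides diagnosis alongside detection.
ranking = np.argsort(-np.abs(clf.coef_[0]))
```

Here the ranking correctly places measurement 0 first, mirroring how feature ranking localizes which process variables a detected fault affects.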

8.
ESCAPE ; 43: 421-426, 2018.
Article in English | MEDLINE | ID: mdl-30534632

ABSTRACT

The ultimate goal of the Texas A&M Superfund program is to develop comprehensive tools and models for addressing exposure to chemical mixtures during environmental emergency-related contamination events. With that goal, we aim to design a framework for optimal grouping of chemical mixtures based on their chemical characteristics and bioactivity properties, and to facilitate comparative assessment of their human health impacts through read-across. The optimal clustering of the chemical mixtures guides the selection of sorption material in such a way that the adverse health effects of each group are mitigated. Here, we perform (i) hierarchical clustering of complex substances using chemical and biological data, and (ii) predictive modeling of the sorption activity of broad-acting materials via regression techniques. Dimensionality reduction techniques are also incorporated to further improve the results. We adopt several recent examples of chemical substances of Unknown or Variable composition, Complex reaction products, or Biological materials (UVCB) as benchmark complex substances, where their grouping is optimized by maximizing the Fowlkes-Mallows (FM) index. The choices of clustering method and visualization technique are shown to influence the communication of the groupings for read-across.

9.
Int Symp Process Syst Eng ; 44: 2077-2082, 2018.
Article in English | MEDLINE | ID: mdl-30534633

ABSTRACT

Rapid detection and identification of process faults in industrial applications is crucial to sustain safe and profitable operation. Today, advances in sensor technologies have facilitated the collection of large amounts of chemical process data in real time, which has subsequently broadened the use of data-driven process monitoring techniques via machine learning and multivariate statistical analysis. One of the best-known machine learning techniques is the Support Vector Machine (SVM), which allows the use of high-dimensional feature sets for learning problems such as classification and regression. In this paper, we present the application of a novel nonlinear (kernel-dependent) SVM-based feature selection algorithm to process monitoring and fault detection of continuous processes. The developed methodology is derived from a sensitivity analysis of the dual SVM objective and utilizes existing and novel greedy algorithms to rank features, which in turn guides fault diagnosis. Specifically, we train fault-specific two-class SVM models to detect faulty operations, while using the feature selection algorithm to improve the accuracy of the fault detection models and perform fault diagnosis. We present results for the Tennessee Eastman process as a case study and compare our approach to existing approaches for fault detection, diagnosis, and identification.

10.
Comput Chem Eng ; 115: 46-63, 2018 Jul 12.
Article in English | MEDLINE | ID: mdl-30386002

ABSTRACT

This paper presents a novel data-driven framework for process monitoring in batch processes, a critical task in industry to attain safe operability and minimize loss of productivity and profit. We exploit high-dimensional process data with a nonlinear Support Vector Machine-based feature selection algorithm, aiming to retrieve the most informative process measurements for accurate and simultaneous fault detection and diagnosis. The proposed framework is applied to an extensive benchmark dataset which includes process data describing 22,200 batches with 15 faults. We train fault- and time-specific models on the prealigned batch data trajectories via three distinct time horizon approaches, which vary the amount of data incorporated during modeling: one-step rolling, two-step rolling, and evolving. The results show that the two-step rolling and evolving time horizon approaches outperform the one-step rolling approach. Regardless of the approach, the proposed framework provides a promising decision support tool for online simultaneous fault detection and diagnosis for batch processes.
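The three time-horizon schemes can be sketched as index windows over a batch trajectory; the exact window semantics below are a plausible reading of the scheme names, not a specification from the paper:

```python
# Index windows into a batch trajectory of sampling instants 0..T-1,
# selecting which data a time-specific model at instant t is trained on.

def one_step_rolling(t):
    # Only the current sampling instant.
    return list(range(t, t + 1))

def two_step_rolling(t):
    # The current instant plus the previous one.
    return list(range(max(0, t - 1), t + 1))

def evolving(t):
    # All data from the start of the batch up to and including t.
    return list(range(0, t + 1))

windows = {name: fn(3) for name, fn in
           [("one-step", one_step_rolling),
            ("two-step", two_step_rolling),
            ("evolving", evolving)]}
```

The schemes trade model freshness against data volume: rolling windows stay small and local, while the evolving horizon accumulates the whole trajectory, which matches the observation that incorporating more history improves detection.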

11.
Sci Rep ; 8(1): 9939, 2018 Jul 2.
Article in English | MEDLINE | ID: mdl-29967418

ABSTRACT

Every two years, groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field, but many hurdles still remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website, providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines, showing improvements according to the CASP assessment as well as areas that require further adjustments and research.


Subject(s)
Caspase 12/metabolism , Caspases/metabolism , Computational Biology/methods , Models, Molecular , Software , Caspase 12/chemistry , Caspases/chemistry , Humans , Protein Conformation
12.
Biochem J ; 474(5): 781-795, 2017 Feb 20.
Article in English | MEDLINE | ID: mdl-28082425

ABSTRACT

Conjugation of Nedd8 (neddylation) to Cullins (Cul) in Cul-RING E3 ligases (CRLs) stimulates ubiquitination and polyubiquitination of protein substrates. CRL is made up of two Cul-flanked arms: one consists of the substrate-binding and adaptor proteins and the other consists of E2 and Ring-box protein (Rbx). Polyubiquitin chain length and topology determine the substrate fate. Here, we ask how polyubiquitin chains are accommodated in the limited space available between the two arms and what determines the polyubiquitin linkage topology. We focus on Cul5 and Rbx1 in three states: before Cul5 neddylation (closed state), after neddylation (open state), and after deneddylation, exploiting molecular dynamics simulations and the Gaussian Network Model. We observe that regulation of substrate ubiquitination and polyubiquitination takes place through Rbx1 rotations, which are controlled by Nedd8-Rbx1 allosteric communication. Allosteric propagation proceeds from Nedd8 via Cul5 dynamic hinges and hydrogen bonds between the C-terminal domain of Cul5 (Cul5CTD) and Rbx1 (Cul5CTD residues R538/R569 and Rbx1 residue E67, or Cul5CTD E474/E478/N491 and Rbx1 K105). Importantly, at each ubiquitination step (homogeneous or heterogeneous, linear or branched), the polyubiquitin linkages fit into the distances between the two arms, and these match the inherent CRL conformational tendencies. Hinge sites may constitute drug targets.
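The Gaussian Network Model step can be sketched in a few lines: build the Kirchhoff (connectivity) matrix from C-alpha coordinates with a distance cutoff, then read mode shapes off its eigendecomposition. The straight toy chain and 7 Å cutoff below are illustrative assumptions, not the Cul5-Rbx1 system:

```python
import numpy as np

def kirchhoff(coords, cutoff=7.0):
    """Kirchhoff matrix: -1 for residue pairs within the cutoff, degree on
    the diagonal (a graph Laplacian over the contact network)."""
    n = len(coords)
    gamma = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(coords[i] - coords[j]) <= cutoff:
                gamma[i, j] = gamma[j, i] = -1.0
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma

# Made-up straight chain with ~3.8 A C-alpha spacing (toy stand-in).
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
G = kirchhoff(coords)

# Eigendecomposition: the zero mode is rigid-body motion; the slowest
# nonzero modes describe the dominant fluctuations used to locate hinges.
eigvals, eigvecs = np.linalg.eigh(G)
```

Residues where the slow-mode eigenvectors change sign or stay near zero are candidate hinge sites, which is how hinge analyses of this kind flag potential drug-target regions.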


Subject(s)
Carrier Proteins/metabolism , Cullin Proteins/metabolism , Protein Processing, Post-Translational , Ubiquitins/metabolism , Allosteric Regulation , Allosteric Site , Carrier Proteins/chemistry , Cullin Proteins/chemistry , Humans , Hydrogen Bonding , Kinetics , Molecular Dynamics Simulation , NEDD8 Protein , Polyubiquitin , Protein Binding , Protein Interaction Domains and Motifs , Protein Structure, Secondary , Substrate Specificity , Ubiquitination , Ubiquitins/chemistry
13.
PLoS One ; 11(2): e0148974, 2016.
Article in English | MEDLINE | ID: mdl-26859389

ABSTRACT

HIV-1 entry into host cells is mediated by interactions between the V3-loop of viral glycoprotein gp120 and chemokine receptor CCR5 or CXCR4, collectively known as HIV-1 coreceptors. Accurate genotypic prediction of coreceptor usage is of significant clinical interest, and determination of the factors driving tropism has been the focus of extensive study. We have developed a method based on nonlinear support vector machines to elucidate the interacting residue pairs driving coreceptor usage and provide highly accurate coreceptor usage predictions. Our models utilize centroid-centroid interaction energies from computationally derived structures of the V3-loop:coreceptor complexes as primary features, while additional features based on established rules regarding V3-loop sequences are also investigated. We tested our method on 2455 V3-loop sequences of various lengths and subtypes, and produce a median area under the receiver operating characteristic (ROC) curve of 0.977 based on 500 runs of 10-fold cross validation. Our study is the first to elucidate a small set of specific interacting residue pairs between the V3-loop and coreceptors capable of predicting coreceptor usage with high accuracy across major HIV-1 subtypes. The developed method has been implemented as a web tool named CRUSH, CoReceptor USage prediction for HIV-1, which is available at http://ares.tamu.edu/CRUSH/.
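The evaluation protocol (repeated stratified 10-fold cross-validation scored by ROC AUC) can be sketched as follows, assuming scikit-learn; the Gaussian features below stand in for the study's computed interaction energies, and the class labels are toy placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n = 300
# Stand-in for centroid-centroid interaction-energy features (the study
# derives these from modeled V3-loop:coreceptor complexes).
X = rng.normal(size=(n, 6))
y = (rng.random(n) < 0.5).astype(int)  # toy labels: 1 = CXCR4-using, 0 = CCR5-using
X[y == 1, :2] += 2.5                   # separable signal in two features

# One run of stratified 10-fold cross-validation scored by ROC AUC
# (the paper repeats this 500 times and reports the median AUC).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv, scoring="roc_auc")
mean_auc = aucs.mean()
```

Stratified folds keep the CCR5/CXCR4 class balance constant across splits, which matters when the two tropism classes are unevenly represented.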


Subject(s)
HIV-1/physiology , Receptors, CCR5/physiology , Receptors, CXCR4/physiology , Energy Metabolism , HIV Envelope Protein gp120/physiology , HIV-1/growth & development , Humans , Models, Biological , Protein Interaction Domains and Motifs/physiology , Structure-Activity Relationship , Tropism