Results 1 - 11 of 11
1.
Ann N Y Acad Sci ; 1387(1): 112-123, 2017 01.
Article in English | MEDLINE | ID: mdl-27801987

ABSTRACT

Big Data is no longer solely the purview of big organizations with big resources. Today's routine tools and experimental methods can generate large slices of data. For example, high-throughput sequencing can quickly interrogate biological systems for the expression levels of thousands of different RNAs, examine epigenetic marks throughout the genome, and detect differences in the genomes of individuals. Multichannel electrophysiology platforms produce gigabytes of data in just a few minutes of recording. Imaging systems generate videos capturing biological behaviors over the course of days. Thus, any researcher now has access to a veritable wealth of data. However, the ability of any given researcher to utilize that data is limited by her/his own resources and skills for downloading, storing, and analyzing the data. In this paper, we examine the resources required to engage Big Data, survey the state of modern data analysis pipelines, present a few data repository case studies, and touch on current institutions and programs supporting the work that relies on Big Data.


Subject(s)
Biomedical Research/methods , Cloud Computing , Computer Communication Networks , Systems Biology/methods , Access to Information , Animals , Biomedical Research/trends , Cloud Computing/trends , Computer Communication Networks/instrumentation , Computer Communication Networks/trends , Data Mining/methods , Data Mining/trends , Decision Making, Computer-Assisted , Genomics/methods , Genomics/trends , Humans , Image Processing, Computer-Assisted , Internet , Software , Systems Biology/instrumentation , Systems Biology/trends
2.
J Athl Train ; 51(1): 74-81, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26794628

ABSTRACT

CONTEXT: Analysis of injury and illness data collected at large international competitions provides the US Olympic Committee and the national governing bodies for each sport with information to best prepare for future competitions. Research in which authors have evaluated medical contacts to provide the expected level of medical care and sports medicine services at international competitions is limited. OBJECTIVE: To analyze the medical-contact data for athletes, staff, and coaches who participated in the 2011 Pan American Games in Guadalajara, Mexico, using unsupervised modeling techniques to identify underlying treatment patterns. DESIGN: Descriptive epidemiology study. SETTING: Pan American Games. PATIENTS OR OTHER PARTICIPANTS: A total of 618 U.S. athletes (337 males, 281 females) participated in the 2011 Pan American Games. MAIN OUTCOME MEASURE(S): Medical data were recorded from the injury-evaluation and injury-treatment forms used by clinicians assigned to the central US Olympic Committee Sport Medicine Clinic and satellite locations during the operational 17-day period of the 2011 Pan American Games. We used principal components analysis and agglomerative clustering algorithms to identify and define grouped modalities. Lift statistics were calculated for within-cluster subgroups. RESULTS: Principal component analyses identified 3 components, accounting for 72.3% of the variability in datasets. Plots of the principal components showed that individual contacts focused on 4 treatment clusters: massage, paired manipulation and mobilization, soft tissue therapy, and general medical. CONCLUSIONS: Unsupervised modeling techniques were useful for visualizing complex treatment data and provided insights for improved treatment modeling in athletes. Given its ability to detect clinically relevant treatment pairings in large datasets, unsupervised modeling should be considered a feasible option for future analyses of medical-contact data from international competitions.


Subject(s)
Athletic Injuries/therapy , Sports/physiology , Athletes/statistics & numerical data , Athletic Injuries/epidemiology , Athletic Injuries/ethnology , Female , Forecasting , Humans , Male , Mexico/epidemiology , Patient Acceptance of Health Care/statistics & numerical data , Physical Therapy Modalities/statistics & numerical data , Sports Medicine/statistics & numerical data , United States/ethnology
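As a rough illustration of the lift statistic mentioned in the abstract above, the sketch below computes lift for pairs of treatments that co-occur in the same medical contact. It assumes the standard association-rule definition, lift(A, B) = P(A and B) / (P(A) P(B)); the abstract does not state the exact formula the authors used. The function name `pairwise_lift` and the toy contacts are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(contacts):
    """Return the lift of every treatment pair seen together in a contact.

    contacts: list of sets, each holding the treatment labels of one contact.
    Assumed definition: lift(A, B) = P(A and B) / (P(A) * P(B)).
    A value above 1 means the pair co-occurs more often than expected
    if the two treatments were independent.
    """
    n = len(contacts)
    single = Counter()
    pair = Counter()
    for treatments in contacts:
        single.update(treatments)
        # Sort so each unordered pair is always stored under the same key.
        pair.update(combinations(sorted(treatments), 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


# Hypothetical toy data for illustration only.
contacts = [
    {"manipulation", "mobilization"},
    {"manipulation", "mobilization"},
    {"massage"},
    {"massage"},
]
lifts = pairwise_lift(contacts)
print(lifts)
```

In this toy set, manipulation and mobilization always appear together, so their lift exceeds 1, which is the kind of treatment pairing the abstract describes.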
3.
J Chem Inf Model ; 53(12): 3352-66, 2013 Dec 23.
Article in English | MEDLINE | ID: mdl-24261543

ABSTRACT

Computational methods that can identify CYP-mediated sites of metabolism (SOMs) of drug-like compounds have become required tools for early stage lead optimization. In recent years, methods that combine CYP binding site features with CYP/ligand binding information have been sought in order to increase the prediction accuracy of such hybrid models over those that use only one representation. Two challenges that any hybrid ligand/structure-based method must overcome are (1) identification of the best binding pose for a specific ligand with a given CYP and (2) appropriately incorporating the results of docking with ligand reactivity. To address these challenges we have created Docking-Regioselectivity-Predictor (DR-Predictor)--a method that incorporates flexible docking-derived information with specialized electronic reactivity and multiple-instance-learning methods to predict CYP-mediated SOMs. In this study, the hybrid ligand-structure-based DR-Predictor method was tested on substrate sets for CYP 1A2 and CYP 2A6. For these data, the DR-Predictor model was found to identify the experimentally observed SOM within the top two predicted rank-positions for 86% of the 261 1A2 substrates and 83% of the 100 2A6 substrates. Given the accuracy and extendibility of the DR-Predictor method, we anticipate that it will further facilitate the prediction of CYP metabolism liabilities and aid in in-silico ADMET assessment of novel structures.


Subject(s)
Artificial Intelligence , Aryl Hydrocarbon Hydroxylases/chemistry , Cytochrome P-450 CYP1A2/chemistry , Molecular Docking Simulation , Small Molecule Libraries/chemistry , Aryl Hydrocarbon Hydroxylases/metabolism , Biotransformation , Catalytic Domain , Cytochrome P-450 CYP1A2/metabolism , Cytochrome P-450 CYP2A6 , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Protein Binding , Small Molecule Libraries/metabolism , Structure-Activity Relationship , Substrate Specificity , Thermodynamics
4.
Bioinformatics ; 29(4): 497-8, 2013 Feb 15.
Article in English | MEDLINE | ID: mdl-23242264

ABSTRACT

SUMMARY: Regioselectivity-WebPredictor (RS-WebPredictor) is a server that predicts isozyme-specific cytochrome P450 (CYP)-mediated sites of metabolism (SOMs) on drug-like molecules. Predictions may be made for the promiscuous 2C9, 2D6 and 3A4 CYP isozymes, as well as CYPs 1A2, 2A6, 2B6, 2C8, 2C19 and 2E1. RS-WebPredictor is the first freely accessible server that predicts the regioselectivity of the last six isozymes. Server execution time is fast, taking on average 2s to encode a submitted molecule and 1s to apply a given model, allowing for high-throughput use in lead optimization projects. AVAILABILITY: RS-WebPredictor is accessible for free use at http://reccr.chem.rpi.edu/Software/RS-WebPredictor/


Subject(s)
Cytochrome P-450 Enzyme System/metabolism , Software , Algorithms , Cinnarizine/chemistry , Cinnarizine/metabolism , Isoenzymes/metabolism
5.
J Chem Inf Model ; 52(6): 1637-59, 2012 Jun 25.
Article in English | MEDLINE | ID: mdl-22524152

ABSTRACT

RS-Predictor is a tool for creating pathway-independent, isozyme-specific, site of metabolism (SOM) prediction models using any set of known cytochrome P450 (CYP) substrates and metabolites. Until now, the RS-Predictor method was only trained and validated on CYP 3A4 data, but in the present study, we report on the versatility of the RS-Predictor modeling paradigm by creating and testing regioselectivity models for substrates of the nine most important CYP isozymes. Through curation of source literature, we have assembled 680 substrates distributed among CYPs 1A2, 2A6, 2B6, 2C19, 2C8, 2C9, 2D6, 2E1, and 3A4, the largest publicly accessible collection of P450 ligands and metabolites released to date. A comprehensive investigation into the importance of different descriptor classes for identifying the regioselectivity mediated by each isozyme is made through the generation of multiple independent RS-Predictor models for each set of isozyme substrates. Two of these models include a density functional theory (DFT) reactivity descriptor derived from SMARTCyp. Optimal combinations of RS-Predictor and SMARTCyp are shown to have stronger performance than either method alone, while also exceeding the accuracy of the commercial regioselectivity prediction methods distributed by Optibrium and Schrödinger, correctly identifying a large proportion of the metabolites in each substrate set within the top two rank-positions: 1A2 (83.0%), 2A6 (85.7%), 2B6 (82.1%), 2C19 (86.2%), 2C8 (83.8%), 2C9 (84.5%), 2D6 (85.9%), 2E1 (82.8%), 3A4 (82.3%), and merged (86.0%). Comprehensive datamining of each substrate set and careful statistical analyses of the predictions made by the different models revealed new insights into molecular features that control metabolic regioselectivity and enable accurate prospective prediction of likely SOMs.


Subject(s)
Cytochrome P-450 Enzyme System/metabolism , Isoenzymes/metabolism , Substrate Specificity
6.
J Med Chem ; 55(5): 2035-47, 2012 Mar 08.
Article in English | MEDLINE | ID: mdl-22280316

ABSTRACT

Treatment of ionization and tautomerism of ligands and receptors is one of the unresolved issues in structure-based prediction of binding affinities. Our solution utilizes the thermodynamic master equation, expressing the experimentally observed association constant as the sum of products, each valid for a specific ligand-receptor species pair, consisting of the association microconstant and the fractions of the involved ligand and receptor species. The microconstants are characterized by structure-based simulations, which are run for individual species pairs. Here we incorporated the multispecies approach into the QM/MM linear response method and used it for structural correlation of published inhibition data on mitogen-activated protein kinase (MAPK)-activated protein kinase (MK2) by 66 benzothiophene and pyrrolopyridine analogues, forming up to five tautomers and seven ionization species under experimental conditions. Extensive cross-validation showed that the resulting models were stable and predictive. Inclusion of all tautomers and ionization ligand species was essential: the explained variance increased to 90% from 66% for the single-species model.


Subject(s)
Intracellular Signaling Peptides and Proteins/antagonists & inhibitors , Models, Molecular , Protein Serine-Threonine Kinases/antagonists & inhibitors , Pyridines/chemistry , Pyrroles/chemistry , Quantitative Structure-Activity Relationship , Thiophenes/chemistry , Computer Simulation , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Intracellular Signaling Peptides and Proteins/chemistry , Ions , Isomerism , Ligands , Protein Binding , Protein Serine-Threonine Kinases/chemistry , Quantum Theory , Solutions , Thermodynamics , Water
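The thermodynamic master equation described in the abstract above can be written compactly as follows. This is a sketch of the verbal description only; the symbols are assumed notation, not the authors': $K$ is the experimentally observed association constant, $K_{ij}$ the association microconstant for ligand species $i$ and receptor species $j$, and $f_{L,i}$, $f_{R,j}$ the fractions of those ligand and receptor species under the experimental conditions.

```latex
K \;=\; \sum_{i} \sum_{j} K_{ij}\, f_{L,i}\, f_{R,j},
\qquad
\sum_{i} f_{L,i} = 1,
\qquad
\sum_{j} f_{R,j} = 1
```

With a single ligand species and a single receptor species, both fractions equal 1 and the sum reduces to one microconstant, which corresponds to the single-species model the abstract compares against.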
7.
IEEE Trans Pattern Anal Mach Intell ; 34(6): 1068-79, 2012 Jun.
Article in English | MEDLINE | ID: mdl-21987558

ABSTRACT

We present a bundle algorithm for multiple-instance classification and ranking. These frameworks yield improved models on many problems possessing special structure. Multiple-instance loss functions are typically nonsmooth and nonconvex, and current algorithms convert these to smooth nonconvex optimization problems that are solved iteratively. Inspired by the latest linear-time subgradient-based methods for support vector machines, we optimize the objective directly using a nonconvex bundle method. Computational results show this method is linearly scalable, while not sacrificing generalization accuracy, permitting modeling on new and larger data sets in computational chemistry and other applications. This new implementation facilitates modeling with kernels.


Subject(s)
Algorithms , Pattern Recognition, Automated/methods , Artificial Intelligence , Humans , Neural Networks, Computer , Support Vector Machine
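To make concrete why multiple-instance losses are nonsmooth and nonconvex, as the abstract above notes, here is a minimal sketch of one common multiple-instance classification loss. It assumes a linear model in which a bag is scored by its best-scoring instance and a hinge loss is applied at the bag level; this is an illustrative formulation, not necessarily the objective or the bundle algorithm used in the paper. All function names are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def bag_score(w, bag):
    """Score a bag by its highest-scoring instance (the max makes this nonsmooth)."""
    return max(dot(w, x) for x in bag)


def mi_hinge_loss(w, bags, labels):
    """Sum of bag-level hinge losses; labels are +1 or -1.

    For positive bags the term max(0, 1 - max_i w.x_i) is not convex in w,
    which is why the overall objective is nonconvex.
    """
    return sum(
        max(0.0, 1.0 - y * bag_score(w, bag))
        for bag, y in zip(bags, labels)
    )


# Hypothetical toy data: two bags of 2-D instances.
bags = [
    [[2.0, 0.0], [-1.0, 0.0]],   # positive bag: at least one instance scores high
    [[-2.0, 0.0], [-3.0, 0.0]],  # negative bag: every instance scores low
]
labels = [1, -1]

loss_good = mi_hinge_loss([1.0, 0.0], bags, labels)
loss_zero = mi_hinge_loss([0.0, 0.0], bags, labels)
print(loss_good, loss_zero)
```

A subgradient or bundle method would minimize such a loss directly rather than replacing it with a smooth surrogate, which is the strategy the abstract describes.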
8.
J Chem Inf Model ; 51(11): 2808-20, 2011 Nov 28.
Article in English | MEDLINE | ID: mdl-21999408

ABSTRACT

Least-squares fitting of the Hill equation to quantitative high-throughput screening (qHTS) assays results in frequent unsatisfactory fits. We learn and exploit prior knowledge to improve the Hill fitting in a nonlinear regression method called domain knowledge fitter (DK-fitter). This paper formulates and solves DK-fitter for 44 public qHTS data sets. This new Hill parameter estimation technique is validated using three unbiased approaches, including a novel method that involves generating simulated samples. This paper fosters the extraction of higher quality information from screens for improved potency evaluation.


Subject(s)
Computational Biology/methods , High-Throughput Screening Assays , Models, Chemical , Computational Biology/statistics & numerical data , Drug Design , Enzyme Inhibitors/pharmacology , Pyruvate Kinase/antagonists & inhibitors , Pyruvate Kinase/metabolism , Quantitative Structure-Activity Relationship , Regression Analysis
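For readers unfamiliar with the Hill equation referred to in the abstract above, the sketch below evaluates a common four-parameter form and the least-squares objective that an ordinary fit would minimize. The parameterization (bottom, top, AC50, slope) is an assumption, since the abstract does not give one, and nothing here reproduces DK-fitter or its use of prior knowledge. The function names and data are hypothetical.

```python
def hill(conc, bottom, top, ac50, slope):
    """Four-parameter Hill curve: response at concentration conc (conc > 0)."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)


def sum_squared_error(params, concs, responses):
    """Least-squares objective for a candidate (bottom, top, ac50, slope)."""
    bottom, top, ac50, slope = params
    return sum(
        (hill(c, bottom, top, ac50, slope) - r) ** 2
        for c, r in zip(concs, responses)
    )


# Hypothetical noise-free titration generated from known parameters.
true_params = (0.0, 100.0, 10.0, 1.0)
concs = [0.1, 1.0, 10.0, 100.0, 1000.0]
responses = [hill(c, *true_params) for c in concs]

sse_true = sum_squared_error(true_params, concs, responses)
sse_wrong = sum_squared_error((0.0, 100.0, 50.0, 1.0), concs, responses)
print(sse_true, sse_wrong)
```

With noisy or incomplete titration curves, several parameter sets can give similarly small errors, which is one way unconstrained least-squares fits become unsatisfactory.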
9.
J Chem Inf Model ; 51(7): 1667-89, 2011 Jul 25.
Article in English | MEDLINE | ID: mdl-21528931

ABSTRACT

This article describes RegioSelectivity-Predictor (RS-Predictor), a new in silico method for generating predictive models of P450-mediated metabolism for drug-like compounds. Within this method, potential sites of metabolism (SOMs) are represented as "metabolophores", a concept that describes the hierarchical combination of topological and quantum chemical descriptors needed to represent the reactivity of potential metabolic reaction sites. RS-Predictor modeling involves the use of metabolophore descriptors together with multiple-instance ranking (MIRank) to generate an optimized descriptor weight vector that encodes regioselectivity trends across all cases in a training set. The resulting pathway-independent (O-dealkylation vs N-oxidation vs Csp(3) hydroxylation, etc.), isozyme-specific regioselectivity model may be used to predict potential metabolic liabilities. In the present work, cross-validated RS-Predictor models were generated for a set of 394 substrates of CYP 3A4 as a proof-of-principle for the method. Rank aggregation was then employed to merge independently generated predictions for each substrate into a single consensus prediction. The resulting consensus RS-Predictor models were shown to reliably identify at least one observed site of metabolism in the top two rank-positions on 78% of the substrates. Comparisons between RS-Predictor and previously described regioselectivity prediction methods reveal new insights into how in silico metabolite prediction methods should be compared.


Subject(s)
Cytochrome P-450 CYP3A , Models, Molecular , Acetaminophen/chemistry , Acetaminophen/metabolism , Binding Sites , Cytochrome P-450 CYP3A/chemistry , Cytochrome P-450 CYP3A/metabolism , Molecular Structure , Stereoisomerism , Warfarin/chemistry , Warfarin/metabolism
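The performance figure quoted in the abstract above (at least one observed site of metabolism within the top two rank-positions) can be sketched as a simple hit-rate calculation. This is an illustrative reading of the metric as worded in the abstract, not the authors' evaluation code; the function name, substrate names, and atom labels are hypothetical.

```python
def top_k_hit_rate(predictions, observed, k=2):
    """Fraction of substrates with an observed SOM among the top-k predictions.

    predictions: dict mapping substrate name -> list of site labels, best first.
    observed: dict mapping substrate name -> set of experimentally observed sites.
    """
    hits = 0
    for substrate, ranked_sites in predictions.items():
        # A substrate counts as a hit if any top-k site was observed.
        if set(ranked_sites[:k]) & observed[substrate]:
            hits += 1
    return hits / len(predictions)


# Hypothetical toy data for illustration only.
predictions = {
    "mol_a": ["C7", "N1", "C3"],
    "mol_b": ["C2", "C5", "O1"],
}
observed = {
    "mol_a": {"N1"},  # ranked second, so it counts as a hit
    "mol_b": {"O1"},  # ranked third, so it is a miss for k=2
}
rate = top_k_hit_rate(predictions, observed, k=2)
print(rate)
```

Because a substrate counts as a hit when any one observed site is ranked highly, substrates with many observed sites are easier to score on, which is worth keeping in mind when comparing methods on different substrate sets.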
10.
Mol Inform ; 30(9): 765-77, 2011 Sep.
Article in English | MEDLINE | ID: mdl-27467409

ABSTRACT

Making suitable modeling choices is crucial for successful in silico drug design, and one of the most important of these is the proper extraction and curation of data from qHTS screens, and the use of optimized statistical learning methods to obtain valid models. More specifically, we aim to learn the top 1% most potent compounds against a variety of targets in a procedure we call virtual screening hit identification (VISHID). To do so, we exploit quantitative high-throughput screens (qHTS) obtained from PubChem, descriptors derived from molecular structures, and support vector machines (SVM) for model generation. Our results illustrate how an appreciation of subtle issues underlying qHTS data extraction and the resulting SVM models created using these data can enhance the effectiveness of solutions and, in doing so, accelerate drug discovery.

11.
BMC Genomics ; 8: 380, 2007 Oct 19.
Article in English | MEDLINE | ID: mdl-17949499

ABSTRACT

BACKGROUND: Recently, we demonstrated that human mesenchymal stem cells (hMSC) stimulated with dexamethasone undergo gene focusing during osteogenic differentiation (Stem Cells Dev 14(6): 1608-20, 2005). Here, we examine the protein expression profiles of three additional populations of hMSC stimulated to undergo osteogenic differentiation via either contact with pro-osteogenic extracellular matrix (ECM) proteins (collagen I, vitronectin, or laminin-5) or osteogenic media supplements (OS media). Specifically, we annotate these four protein expression profiles, as well as profiles from naïve hMSC and differentiated human osteoblasts (hOST), with known gene ontologies and analyze them as a tensor with modes for the expressed proteins, gene ontologies, and stimulants. RESULTS: Direct component analysis in the gene ontology space identifies three components that account for 90% of the variance between hMSC, osteoblasts, and the four stimulated hMSC populations. The directed component maps the differentiation stages of the stimulated stem cell populations along the differentiation axis created by the difference in the expression profiles of hMSC and hOST. Surprisingly, hMSC treated with ECM proteins lie closer to osteoblasts than do hMSC treated with OS media. Additionally, the second component demonstrates that proteomic profiles of collagen I- and vitronectin-stimulated hMSC are distinct from those of OS-stimulated cells. A three-mode tensor analysis reveals additional focus proteins critical for characterizing the phenotypic variations between naïve hMSC, partially differentiated hMSC, and hOST. CONCLUSION: The differences between the proteomic profiles of OS-stimulated hMSC and ECM-hMSC characterize different transitional phenotypes en route to becoming osteoblasts. This conclusion is arrived at via a three-mode tensor analysis validated using hMSC plated on laminin-5.


Subject(s)
Bone Development , Mesenchymal Stem Cells/metabolism , Osteoblasts/metabolism , Proteomics , Cell Differentiation , Humans , Mesenchymal Stem Cells/cytology , Osteoblasts/cytology