1.
Sci Rep ; 9(1): 8469, 2019 06 11.
Article in English | MEDLINE | ID: mdl-31186508

ABSTRACT

Mass spectrometry is a valued method to evaluate the metabolomics content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has rendered high-throughput mass spectrometry possible. It is used for large-scale comparative analysis of populations of samples. In practice, many factors resulting from the environment, the protocol, and even the instrument itself, can lead to minor discrepancies between spectra, rendering automated comparative analysis difficult. In this work, a pipeline of algorithms to correct variations between spectra is proposed. The algorithms correct multiple spectra by identifying peaks that are common to all and, from those, compute a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analysis, such as machine learning.
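
The correction idea in this abstract (find peaks shared by all spectra, then derive one correction per spectrum) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the tolerance `tol`, and the choice of a single additive m/z shift per spectrum are assumptions made for illustration, and each spectrum is assumed to be a non-empty list of peak m/z values.

```python
from statistics import mean, median

def find_common_peaks(spectra, tol=0.01):
    """For each peak of the first spectrum that has a match within `tol`
    in every other spectrum, return the matched m/z values (one per spectrum)."""
    common = []
    for mz in spectra[0]:
        matches = [mz]
        for other in spectra[1:]:
            closest = min(other, key=lambda x: abs(x - mz))
            if abs(closest - mz) > tol:
                break  # this peak is missing from at least one spectrum
            matches.append(closest)
        else:
            common.append(matches)
    return common

def correct_spectra(spectra, tol=0.01):
    """Shift each spectrum so its common peaks line up with their cross-spectrum mean.
    Assumption: the distortion is a constant additive offset per spectrum."""
    common = find_common_peaks(spectra, tol)
    if not common:
        return [list(s) for s in spectra]  # nothing to anchor a correction on
    references = [mean(m) for m in common]
    corrected = []
    for i, spectrum in enumerate(spectra):
        # Spectrum-specific correction: median offset to the reference positions.
        shift = median(ref - m[i] for ref, m in zip(references, common))
        corrected.append([mz + shift for mz in spectrum])
    return corrected
```

For example, `correct_spectra([[100.0, 200.0], [100.004, 200.004]])` moves both spectra toward peaks near 100.002 and 200.002. A real instrument drift is often mass-dependent, so a practical correction would likely fit a function of m/z rather than a constant.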

2.
Anal Chem ; 91(8): 5191-5199, 2019 04 16.
Article in English | MEDLINE | ID: mdl-30932474

ABSTRACT

Untargeted metabolomic measurements using mass spectrometry are a powerful tool for uncovering new small molecules with environmental and biological importance. The small molecule identification step, however, still remains an enormous challenge due to fragmentation difficulties or unspecific fragment ion information. Current methods to address this challenge are often dependent on databases or require the use of nuclear magnetic resonance (NMR), which have their own difficulties. The use of the gas-phase collision cross section (CCS) values obtained from ion mobility spectrometry (IMS) measurements was recently demonstrated to reduce the number of false positive metabolite identifications. While promising, the amount of empirical CCS information currently available is limited, thus predictive CCS methods need to be developed. In this article, we expand upon current experimental IMS capabilities by predicting the CCS values using a deep learning algorithm. We successfully developed and trained a prediction model for CCS values requiring only information about a compound's SMILES notation and ion type. The use of data from five different laboratories using different instruments allowed the algorithm to be trained and tested on more than 2400 molecules. The resulting CCS predictions were found to achieve a coefficient of determination of 0.97 and median relative error of 2.7% for a wide range of molecules. Furthermore, the method requires only a small amount of processing power to predict CCS values. Considering the performance, time, and resources necessary, as well as its applicability to a variety of molecules, this model was able to outperform all currently available CCS prediction algorithms.


Subject(s)
Deep Learning , Neural Networks, Computer , Algorithms , Ion Mobility Spectrometry , Magnetic Resonance Spectroscopy , Mass Spectrometry , Metabolomics
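
The two figures of merit quoted in the abstract above, the coefficient of determination and the median relative error, can be computed as in the sketch below. The function names and the toy values in the usage note are illustrative assumptions, not data from the paper.

```python
from statistics import mean, median

def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def median_relative_error(y_true, y_pred):
    """Median of |prediction - truth| / |truth| (assumes no true value is zero)."""
    return median(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
```

With hypothetical measured values `[100.0, 200.0, 300.0]` and predictions `[102.0, 198.0, 303.0]`, the relative errors are 2%, 1%, and 1%, so the median relative error is 1%.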
3.
Sci Rep ; 9(1): 4071, 2019 03 11.
Article in English | MEDLINE | ID: mdl-30858411

ABSTRACT

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potentially new ones. An open-source disk-based implementation that is both memory and computationally efficient is provided with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.


Subject(s)
Genetic Association Studies , Genome/genetics , Machine Learning , Precision Medicine , Algorithms , Artificial Intelligence , Genomics , Humans , Software
4.
BMC Genomics ; 17(1): 754, 2016 Sep 26.
Article in English | MEDLINE | ID: mdl-27671088

ABSTRACT

BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. RESULTS: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery. CONCLUSIONS: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes ( http://github.com/aldro61/kover/ ).
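
The k-mer representation mentioned in this abstract can be sketched as follows: each genome is reduced to the set of length-k substrings it contains, and a presence/absence matrix over all observed k-mers serves as input to a learning algorithm. This is an illustrative sketch only, not the Kover implementation; the function names and the toy sequences in the usage note are assumptions.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def presence_matrix(genomes, k):
    """Build a binary genome-by-k-mer matrix.
    Rows follow the order of `genomes`; columns follow the sorted k-mer vocabulary."""
    profiles = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*profiles))
    matrix = [[int(km in profile) for km in vocabulary] for profile in profiles]
    return vocabulary, matrix
```

Calling `presence_matrix(["ACGT", "CGTA"], 3)` yields the vocabulary `["ACG", "CGT", "GTA"]` and the rows `[1, 1, 0]` and `[0, 1, 1]`. Because no reference genome or alignment is involved, the representation is reference-free in the sense used by the abstract; for whole genomes, a practical implementation would need a more compact storage scheme than nested Python lists.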

5.
PLoS Comput Biol ; 11(4): e1004074, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25849257

ABSTRACT

The discovery of peptides possessing high biological activity is very challenging due to the enormous diversity of peptides, of which only a minority have the desired properties. To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.


Subject(s)
Antimicrobial Cationic Peptides/chemistry , Antimicrobial Cationic Peptides/pharmacokinetics , Bacterial Physiological Phenomena/drug effects , Drug Discovery/methods , Machine Learning , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Pattern Recognition, Automated/methods , Peptides , Protein Interaction Mapping/methods , Structure-Activity Relationship
6.
J Immunol Methods ; 400-401: 30-6, 2013 Dec 31.
Article in English | MEDLINE | ID: mdl-24144535

ABSTRACT

We present MHC-NP, a tool for predicting peptides naturally processed by the MHC pathway. The method was part of the 2nd Machine Learning Competition in Immunology and yielded state-of-the-art accuracy for the prediction of peptides eluted from human HLA-A*02:01, HLA-B*07:02, HLA-B*35:01, HLA-B*44:03, HLA-B*53:01, HLA-B*57:01 and mouse H2-D(b) and H2-K(b) MHC molecules. We briefly explain the theory and motivations that have led to developing this tool. General applicability in the field of immunology, and specifically to epitope-based vaccines, is expected. Our tool is freely available online and hosted by the Immune Epitope Database at http://tools.immuneepitope.org/mhcnp/.


Subject(s)
Artificial Intelligence , Epitope Mapping/methods , Major Histocompatibility Complex/immunology , Peptides/chemistry , Software , Algorithms , Animals , Antigen Presentation , H-2 Antigens/chemistry , H-2 Antigens/immunology , HLA-A2 Antigen/chemistry , HLA-A2 Antigen/immunology , HLA-B Antigens/chemistry , HLA-B Antigens/immunology , Histocompatibility Antigen H-2D/chemistry , Histocompatibility Antigen H-2D/immunology , Humans , Mice , Peptides/immunology , Protein Binding , Vaccines
7.
BMC Bioinformatics ; 14: 82, 2013 Mar 05.
Article in English | MEDLINE | ID: mdl-23497081

ABSTRACT

BACKGROUND: The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. RESULTS: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant predictions with good accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
CONCLUSION: On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.


Subject(s)
Artificial Intelligence , Peptides/chemistry , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Algorithms , Alleles , Binding Sites , Computer Simulation , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class II/metabolism , Peptides/immunology , Peptides/metabolism
8.
Retrovirology ; 5: 110, 2008 Dec 04.
Article in English | MEDLINE | ID: mdl-19055831

ABSTRACT

BACKGROUND: Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonetheless, within this extensive variability, certain features must be conserved to define functions and phenotypes. The determination of coreceptor usage of HIV-1, from its protein envelope sequence, falls into a well-studied machine learning problem known as classification. The support vector machine (SVM), with string kernels, has proven to be very efficient for dealing with a wide class of classification problems ranging from text categorization to protein homology detection. In this paper, we investigate how the SVM can predict HIV-1 coreceptor usage when it is equipped with an appropriate string kernel. RESULTS: Three string kernels were compared. Accuracies of 96.35% (CCR5), 94.80% (CXCR4), and 95.15% (CCR5 and CXCR4) were achieved with the SVM equipped with the distant segments kernel on a test set of 1425 examples with a classifier built on a training set of 1425 examples. Our datasets are built with Los Alamos National Laboratory HIV Databases sequences. A web server is available at http://genome.ulaval.ca/hiv-dskernel. CONCLUSION: We examined string kernels that have been used successfully for protein homology detection and propose a new one that we call the distant segments kernel. We also show how to extract the most relevant features for HIV-1 coreceptor usage. The SVM with the distant segments kernel is currently the best method described.


Subject(s)
Computational Biology/methods , Receptors, CCR5/chemistry , Receptors, CXCR4/chemistry , Receptors, CXCR4/genetics , Receptors, HIV/chemistry , Algorithms , HIV Infections/genetics , HIV Infections/metabolism , Humans , Internet , Receptors, CCR5/genetics , Receptors, CCR5/metabolism , Receptors, CXCR4/metabolism , Receptors, HIV/genetics , Receptors, HIV/metabolism , Sequence Homology, Amino Acid , Software , User-Computer Interface
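
To make the notion of a string kernel in the abstract above concrete, the sketch below implements the spectrum kernel, a simple and widely used string kernel that compares two sequences through their shared k-mer counts. It is not the distant segments kernel proposed in the paper, whose definition is not given in the abstract; the function names and the default `k=2` are assumptions for illustration.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every length-k substring of a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(a, b, k=2):
    """Spectrum kernel: dot product of the k-mer count vectors of a and b."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())

def gram_matrix(sequences, k=2):
    """Pairwise kernel values, suitable for a learner that accepts a precomputed kernel."""
    return [[spectrum_kernel(a, b, k) for b in sequences] for a in sequences]
```

A Gram matrix built this way can be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, which is one way to pair an SVM with a custom string kernel.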
9.
Environ Health Perspect ; 115(10): 1429-34, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17938731

ABSTRACT

BACKGROUND: Brominated flame retardants, especially polybrominated diphenyl ethers (PBDEs), have been widely used in North America, but little is known about the level of exposure of human populations to these compounds. OBJECTIVES: We set out to assess the internal exposure of postmenopausal Canadian women to selected organobromine compounds and to investigate factors associated with this exposure. METHODS: We measured concentrations of four PBDEs, one polybrominated biphenyl, and for comparative purposes, 41 polychlorinated biphenyl (PCB) congeners in plasma samples from 110 healthy postmenopausal women who were recruited at a mammography clinic in 2003-2004. RESULTS: PBDE-47 was the major PBDE congener, with a mean (geometric) concentration of 8.1 ng/g lipids and extreme values reaching 1,780 ng/g. By comparison, the mean concentration of the major PCB congener (PCB-153) was 41.7 ng/g and the highest value was 177 ng/g. PBDEs 47, 99, and 100 were strongly intercorrelated, but weaker correlations were noted with PBDE-153. As the sum of PBDEs (ΣPBDEs) increased, the relative contribution of PBDE-47 to the ΣPBDEs increased, whereas that of PBDE-153 decreased. PBDE-153 was the only brominated compound correlated to PCB-153. PBDE levels were not linked to any sociodemographic, anthropometric, reproductive, or lifestyle variables documented in the present study. Age and body mass index gain since the age of 18 years were significant predictors of PCB-153 plasma levels. CONCLUSION: Our results suggest that exposure to PBDE-47 likely occurs through direct contact with the penta-PBDE formulation, whereas exposure to PBDE-153 may originate in part from the food chain.


Subject(s)
Bromine Compounds/blood , Environmental Exposure/adverse effects , Hydrocarbons, Brominated/blood , Phenyl Ethers/blood , Polybrominated Biphenyls/blood , Environmental Monitoring , Epidemiological Monitoring , Female , Food Chain , Halogenated Diphenyl Ethers , Humans , Middle Aged , Polychlorinated Biphenyls/blood , Postmenopause , Quebec/epidemiology