Results 1-20 of 23
1.
Sci Rep ; 12(1): 9101, 2022 06 01.
Article in English | MEDLINE | ID: mdl-35650262

ABSTRACT

Identification of proteins is one of the most computationally intensive steps in genomics studies. It usually relies on aligners that do not accommodate rich information on proteins and require additional pipelining steps for protein identification. We introduce kAAmer, a protein database engine based on amino-acid k-mers that provides efficient identification of proteins while supporting the incorporation of flexible annotations on these proteins. Moreover, the database is built to be used as a microservice, to be hosted and queried remotely.
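The k-mer idea in this abstract can be illustrated with a toy in-memory index. This is a minimal Python sketch, not kAAmer's implementation. It maps each amino-acid k-mer to the proteins that contain it, then ranks database proteins by how many distinct query k-mers they share. The functions `build_index` and `query`, the value k = 3, and the example sequences are hypothetical. kAAmer's actual k-mer size, storage engine, and scoring are not described in this abstract.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of an amino-acid sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(proteins: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each amino-acid k-mer to the IDs of the proteins containing it (hypothetical helper)."""
    index: dict[str, set[str]] = defaultdict(set)
    for pid, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(pid)
    return index


def query(index, seq: str, k: int, min_shared: int = 1):
    """Rank database proteins by the number of distinct query k-mers they share."""
    hits = Counter()
    for km in set(kmers(seq, k)):
        for pid in index.get(km, ()):
            hits[pid] += 1
    return [(pid, n) for pid, n in hits.most_common() if n >= min_shared]
```

For example, indexing `{"P1": "MKTAYIAKQR", "P2": "MKTAYLLLQR"}` with k = 3 and querying `"MKTAYIAK"` ranks P1 (6 shared 3-mers) ahead of P2 (3 shared 3-mers).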


Subjects
Amino Acids, Software, Algorithms, Protein Databases, DNA Sequence Analysis
2.
Front Nutr ; 9: 740898, 2022.
Article in English | MEDLINE | ID: mdl-35252288

ABSTRACT

Machine learning (ML) algorithms may help better understand the complex interactions among factors that influence dietary choices and behaviors. The aim of this study was to explore whether ML algorithms are more accurate than traditional statistical models in predicting vegetable and fruit (VF) consumption. A large array of features (2,452 features from 525 variables) encompassing individual and environmental information related to dietary habits and food choices in a sample of 1,147 French-speaking adult men and women was used for the purpose of this study. Adequate VF consumption, which was defined as 5 servings/d or more, was measured by averaging data from three web-based 24 h recalls and used as the outcome to predict. Nine classification ML algorithms were compared to two traditional statistical predictive models, logistic regression and penalized regression (Lasso). The performance of the predictive ML algorithms was tested after the implementation of adjustments, including normalizing the data, as well as in a series of sensitivity analyses such as using VF consumption obtained from a web-based food frequency questionnaire (wFFQ) and applying a feature selection algorithm in an attempt to reduce overfitting. Logistic regression and Lasso predicted adequate VF consumption with an accuracy of 0.64 (95% confidence interval [CI]: 0.58-0.70) and 0.64 (95%CI: 0.60-0.68) respectively. Among the ML algorithms tested, the most accurate algorithms to predict adequate VF consumption were the support vector machine (SVM) with either a radial basis kernel or a sigmoid kernel, both with an accuracy of 0.65 (95%CI: 0.59-0.71). The least accurate ML algorithm was the SVM with a linear kernel with an accuracy of 0.55 (95%CI: 0.49-0.61). Using dietary intake data from the wFFQ and applying a feature selection algorithm had little to no impact on the performance of the algorithms. 
In summary, ML algorithms and traditional statistical models predicted adequate VF consumption with similar accuracies among adults. These results suggest that additional research is needed to explore further the true potential of ML in predicting dietary behaviours that are determined by complex interactions among several individual, social and environmental factors.
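The abstract reports accuracies with 95% confidence intervals but does not state how they were computed. One common option is a normal-approximation (Wald) interval for a proportion. It is shown below as an assumption, not necessarily the authors' method. The function name and the counts in the usage note are hypothetical.

```python
import math


def accuracy_with_ci(correct: int, total: int, z: float = 1.96):
    """Accuracy with a normal-approximation (Wald) confidence interval.

    z = 1.96 gives an approximate 95% interval; bounds are clipped to [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

With 64 correct predictions out of 100, this gives an accuracy of 0.64 with a 95% interval of about 0.55-0.73. The half-width shrinks in proportion to 1/sqrt(n), so larger test sets give narrower intervals.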

3.
BMC Bioinformatics ; 22(1): 477, 2021 Oct 04.
Article in English | MEDLINE | ID: mdl-34607569

ABSTRACT

BACKGROUND: Deep learning methods are a proven commodity in many fields and endeavors. One of these endeavors is predicting the presence of adverse drug-drug interactions (DDIs). The models generated can predict, with reasonable accuracy, the phenotypes arising from drug interactions using the drugs' molecular structures. Nevertheless, this task requires improvement to be truly useful. Given the complexity of the predictive task, an extensive benchmark of structure-based models for DDI prediction was performed to evaluate their drawbacks and advantages. RESULTS: We rigorously tested various structure-based models that predict drug interactions, using different splitting strategies to simulate different real-world scenarios. In addition to the effects of different training and testing setups on the robustness and generalizability of the models, we also explored the contribution of traditional approaches such as multitask learning and data augmentation. CONCLUSION: Structure-based models tend to generalize poorly to unseen drugs, despite their ability to accurately identify new DDIs among drugs seen during training. Indeed, they efficiently propagate information between known drugs and could be valuable for discovering new DDIs in a database. However, these models will most probably fail when exposed to unknown drugs. While multitask learning does not help solve the problem in our case, data augmentation does at least mitigate it. Therefore, researchers must be cautious of the bias of the random evaluation scheme, especially if their goal is to discover new DDIs.
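The contrast between random splits and splits that hold out entire drugs can be made concrete. The abstract does not give the paper's exact splitting strategies, so the function `split_pairs` and its two scheme names are hypothetical. Under `"random"`, most test drugs also appear in training. Under `"unseen_drugs"`, every test pair involves at least one drug never seen during training, which is the setting where the abstract reports poor generalization.

```python
import random


def split_pairs(pairs, test_fraction=0.2, scheme="random", seed=0):
    """Split (drug_a, drug_b, label) triples into train and test lists.

    "random": shuffle the pairs and cut; test drugs usually also occur in training.
    "unseen_drugs": hold out a subset of drugs; any pair containing a held-out drug
    goes to test, so test pairs always involve at least one drug unseen in training.
    """
    rng = random.Random(seed)
    if scheme == "random":
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        cut = len(shuffled) - int(len(shuffled) * test_fraction)
        return shuffled[:cut], shuffled[cut:]
    if scheme == "unseen_drugs":
        drugs = sorted({d for a, b, _ in pairs for d in (a, b)})
        rng.shuffle(drugs)
        held_out = set(drugs[: max(1, int(len(drugs) * test_fraction))])
        train = [p for p in pairs if p[0] not in held_out and p[1] not in held_out]
        test = [p for p in pairs if p[0] in held_out or p[1] in held_out]
        return train, test
    raise ValueError(f"unknown scheme: {scheme}")
```

The drug-disjoint split usually yields a smaller and harder test set. That is the point: it measures generalization to new drugs rather than to new combinations of known drugs.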


Subjects
Pharmaceutical Preparations, Factual Databases, Drug Interactions
4.
BMC Med Inform Decis Mak ; 21(1): 219, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34284765

ABSTRACT

BACKGROUND: Polypharmacy is common among older adults and represents a public health concern, due to the negative health impacts potentially associated with the use of several medications. However, the large number of medication combinations and sequences of use makes it difficult for traditional statistical methods to predict which therapy is genuinely associated with health outcomes. The project aims to use artificial intelligence (AI) to determine the quality of polypharmacy among older adults with chronic diseases in the province of Québec, Canada. METHODS: We will use data from the Quebec Integrated Chronic Disease Surveillance System (QICDSS). QICDSS contains information about prescribed medications in older adults in Quebec collected over 20 years. It also includes diagnostic codes and procedures, and sociodemographic data linked through a unique identification number for each individual. Our research will be structured around three interconnected research axes: AI, Health, and Law&Ethics. The AI research axis will develop algorithms for finding frequent patterns of medication use that correlate with health events, considering data locality and temporality (explainable AI or XAI). The Health research axis will translate these patterns into polypharmacy indicators relevant to public health surveillance and clinicians. The Law&Ethics axis will assess the social acceptability of the algorithms developed using AI tools and of the indicators developed by the Health axis, and will ensure that the developed indicators neither discriminate against any population group nor increase the disparities already present in the use of medications.
DISCUSSION: The multi-disciplinary research team consists of specialists in AI, health data, statistics, pharmacy, public health, law, and ethics, which will allow investigation of polypharmacy from different points of view and will contribute to a deeper understanding of the clinical, social, and ethical issues surrounding polypharmacy and its surveillance, as well as the use of AI for health record data. The project results will be disseminated to the scientific community, healthcare professionals, and public health decision-makers in peer-reviewed publications, scientific meetings, and reports. The diffusion of the results will ensure the confidentiality of individual data.


Subjects
Artificial Intelligence, Polypharmacy, Aged, Chronic Disease, Data Analysis, Humans, Quebec
5.
Article in English | MEDLINE | ID: mdl-33591919

ABSTRACT

Within the field of electromyography-based (EMG) gesture recognition, disparities exist between the offline accuracy reported in the literature and the real-time usability of a classifier. This gap mainly stems from two factors: 1) the absence of a controller, which makes the data collected dissimilar to actual control; and 2) the difficulty of including the four main dynamic factors (gesture intensity, limb position, electrode shift, and transient changes in the signal), as including their permutations drastically increases the amount of data to be recorded. Conversely, online datasets are limited to the exact EMG-based controller used to record them, necessitating the recording of a new dataset for each control method or variant to be tested. Consequently, this paper proposes a new type of dataset to serve as an intermediate between offline and online datasets, by recording the data using a real-time experimental protocol. The protocol, performed in virtual reality, includes the four main dynamic factors and uses an EMG-independent controller to guide movements. This EMG-independent feedback ensures that the user is in-the-loop during recording, while enabling the resulting dynamic dataset to be used as an EMG-based benchmark. The dataset comprises 20 able-bodied participants completing three to four sessions over a period of 14 to 21 days. The ability of the dynamic dataset to serve as a benchmark is leveraged to evaluate the impact of different recalibration techniques for long-term (across-day) gesture recognition, including a novel algorithm named TADANN. TADANN consistently and significantly outperforms fine-tuning as the recalibration technique.


Subjects
Gestures, Virtual Reality, Algorithms, Electromyography, Humans, Neural Networks (Computer), Automated Pattern Recognition
6.
Sci Rep ; 10(1): 10464, 2020 06 26.
Article in English | MEDLINE | ID: mdl-32591639

ABSTRACT

Triple negative breast cancer (TNBC) is one of the most aggressive forms of breast cancer (BC), with the highest mortality due to a high rate of relapse, resistance, and the lack of an effective treatment. Various molecular approaches have been used to target TNBC, but with little success. Here, using machine learning algorithms, we analyzed the available BC data from the Cancer Genome Atlas Network (TCGA) and identified two potential genes, TBC1D9 (TBC1 domain family member 9) and MFGE8 (Milk Fat Globule-EGF Factor 8 Protein), that could successfully differentiate TNBC from non-TNBC, irrespective of their heterogeneity. TBC1D9 is under-expressed in TNBC compared with non-TNBC patients, while MFGE8 is over-expressed. Overexpression of TBC1D9 is associated with a better prognosis, whereas overexpression of MFGE8 correlates with a poor prognosis. Protein-protein interaction analysis by affinity purification mass spectrometry (AP-MS) and proximity biotinylation (BioID) experiments identified a role for TBC1D9 in maintaining cellular integrity, whereas MFGE8 may be involved in various tumor survival processes. These promising genes could serve as biomarkers for TNBC and deserve further investigation, as they have the potential to be developed as therapeutic targets for TNBC.


Subjects
Triple Negative Breast Neoplasms/genetics, Surface Antigens/genetics, Tumor Biomarkers/genetics, Calcium-Binding Proteins/genetics, Female, Neoplastic Gene Expression Regulation/genetics, HEK293 Cells, Humans, Machine Learning, Local Neoplasm Recurrence/genetics, Prognosis, Transcriptome/genetics, Triple Negative Breast Neoplasms/pathology
7.
Article in English | MEDLINE | ID: mdl-32195238

ABSTRACT

Existing research on myoelectric control systems primarily focuses on extracting discriminative characteristics of the electromyographic (EMG) signal by designing handcrafted features. Recently, however, deep learning techniques have been applied to the challenging task of EMG-based gesture recognition. The adoption of these techniques slowly shifts the focus from feature engineering to feature learning. Nevertheless, the black-box nature of deep learning makes it hard to understand the type of information learned by the network and how it relates to handcrafted features. Additionally, due to the high variability in EMG recordings between participants, deep features tend to generalize poorly across subjects using standard training methods. Consequently, this work introduces a new multi-domain learning algorithm, named ADANN (Adaptive Domain Adversarial Neural Network), which significantly enhances (p = 0.00004) inter-subject classification accuracy by an average of 19.40% compared to standard training. Using ADANN-generated features, this work provides the first topological data analysis of EMG-based gesture recognition for the characterization of the information encoded within a deep network, using handcrafted features as landmarks. This analysis reveals that handcrafted features and the learned features (in the earlier layers) both try to discriminate between all gestures, but do not encode the same information to do so. In the later layers, the learned features are inclined to instead adopt a one-vs.-all strategy for a given class. Furthermore, by using convolutional network visualization techniques, it is revealed that learned features actually tend to ignore the most activated channel during contraction, which is in stark contrast with the prevalence of handcrafted features designed to capture amplitude information. 
Overall, this work paves the way for hybrid feature sets by providing a clear guideline of complementary information encoded within learned and handcrafted features.
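The handcrafted features discussed above are typically simple time-domain statistics computed over a window of the EMG signal. The abstract does not list which features were used. This sketch assumes the classic Hudgins set for a single channel: mean absolute value, waveform length, zero crossings, and slope sign changes. The function name and the `threshold` parameter are illustrative.

```python
def hudgins_features(x, threshold=0.0):
    """Classic time-domain EMG features for one window of one channel.

    MAV: mean absolute value; WL: waveform length (cumulative absolute change);
    ZC: zero crossings; SSC: slope sign changes.
    `threshold` suppresses ZC/SSC counts caused by small noise fluctuations.
    """
    n = len(x)
    mav = sum(abs(v) for v in x) / n
    wl = sum(abs(x[i] - x[i - 1]) for i in range(1, n))
    zc = sum(
        1 for i in range(1, n)
        if x[i - 1] * x[i] < 0 and abs(x[i] - x[i - 1]) >= threshold
    )
    ssc = sum(
        1 for i in range(1, n - 1)
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > threshold
    )
    return {"MAV": mav, "WL": wl, "ZC": zc, "SSC": ssc}
```

MAV and WL are dominated by amplitude. The abstract reports that learned features tend to ignore the most activated channel, in contrast with such amplitude-driven handcrafted features.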

8.
Sci Rep ; 9(1): 8469, 2019 06 11.
Article in English | MEDLINE | ID: mdl-31186508

ABSTRACT

Mass spectrometry is a valued method for evaluating the metabolomic content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has made high-throughput mass spectrometry possible. It is used for large-scale comparative analysis of populations of samples. In practice, many factors arising from the environment, the protocol, and even the instrument itself can lead to minor discrepancies between spectra, making automated comparative analysis difficult. In this work, a sequence/pipeline of algorithms to correct variations between spectra is proposed. The algorithms correct multiple spectra by identifying peaks that are common to all and, from those, compute a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analysis such as machine learning.
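The pipeline described above has two steps. First, find peaks shared by all spectra. Second, compute a per-spectrum correction that maps each spectrum's copies of those peaks onto a consensus position. The abstract does not specify the correction model. This sketch therefore assumes a linear m/z recalibration fitted by ordinary least squares, with peaks matched to the first spectrum within a tolerance `tol`. All names and the tolerance value are hypothetical, and peak lists are assumed to be non-empty.

```python
def match(peaks, target, tol):
    """Return the peak in `peaks` closest to `target` if within `tol`, else None."""
    best = min(peaks, key=lambda p: abs(p - target))
    return best if abs(best - target) <= tol else None


def common_peaks(spectra, tol):
    """Peaks of the first spectrum that have a match in every spectrum."""
    groups = []
    for ref in spectra[0]:
        matched = [match(s, ref, tol) for s in spectra]
        if all(m is not None for m in matched):
            groups.append(matched)  # one matched m/z value per spectrum
    return groups


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (pure shift if x has no spread)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 1.0, my - mx
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def correct_spectra(spectra, tol=0.01):
    """Map every spectrum onto the mean positions of the peaks shared by all spectra."""
    groups = common_peaks(spectra, tol)
    if not groups:
        return spectra  # nothing to anchor a correction on
    targets = [sum(g) / len(g) for g in groups]
    corrected = []
    for k, spectrum in enumerate(spectra):
        a, b = fit_line([g[k] for g in groups], targets)
        corrected.append([a * mz + b for mz in spectrum])
    return corrected
```

Using the mean of the shared peaks as the target, rather than one reference spectrum, avoids privileging any single acquisition. This is only one plausible design choice.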

9.
Sensors (Basel) ; 19(12), 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31238529

ABSTRACT

Wearable technology can be employed to elevate the abilities of humans to perform demanding and complex tasks more efficiently. Armbands capable of surface electromyography (sEMG) are attractive and noninvasive devices from which human intent can be derived by leveraging machine learning. However, the sEMG acquisition systems currently available tend to be prohibitively costly for personal use, or sacrifice wearability or signal quality to be more affordable. This work introduces the 3DC Armband, designed by the Biomedical Microsystems Laboratory at Laval University: a wireless, 10-channel, 1000 sps, dry-electrode, low-cost (∼150 USD) myoelectric armband that also includes a 9-axis inertial measurement unit. The proposed system is compared with the Myo Armband by Thalmic Labs, one of the most popular sEMG acquisition systems. The comparison is made by employing a new offline dataset featuring 22 able-bodied participants performing eleven hand/wrist gestures while wearing the two armbands simultaneously. The 3DC Armband systematically and significantly (p < 0.05) outperforms the Myo Armband with three different classifiers employing three different input modalities when using ten seconds or more of training data per gesture. This new dataset, along with the source code, Altium project, and 3-D models, is made readily available for download from a GitHub repository.


Subjects
Electromyography/methods, Machine Learning, Wearable Electronic Devices, Gestures, Humans, Computer-Assisted Signal Processing
10.
Can Assoc Radiol J ; 70(2): 107-118, 2019 May.
Article in English | MEDLINE | ID: mdl-30962048

ABSTRACT

Artificial intelligence (AI) software that analyzes medical images is becoming increasingly prevalent. Unlike earlier generations of AI software, which relied on expert knowledge to identify imaging features, machine learning approaches automatically learn to recognize these features. However, the promise of accurate personalized medicine can only be fulfilled with access to large quantities of medical data from patients. This data could be used for purposes such as predicting disease, diagnosis, treatment optimization, and prognostication. Radiology is positioned to lead development and implementation of AI algorithms and to manage the associated ethical and legal challenges. This white paper from the Canadian Association of Radiologists provides a framework for study of the legal and ethical issues related to AI in medical imaging, related to patient data (privacy, confidentiality, ownership, and sharing); algorithms (levels of autonomy, liability, and jurisprudence); practice (best practices and current legal framework); and finally, opportunities in AI from the perspective of a universal health care system.


Subjects
Artificial Intelligence/ethics, Artificial Intelligence/legislation & jurisprudence, Radiology/ethics, Radiology/legislation & jurisprudence, Canada, Humans, Practice Guidelines as Topic, Radiologists/ethics, Radiologists/legislation & jurisprudence, Medical Societies
11.
Anal Chem ; 91(8): 5191-5199, 2019 04 16.
Article in English | MEDLINE | ID: mdl-30932474

ABSTRACT

Untargeted metabolomic measurements using mass spectrometry are a powerful tool for uncovering new small molecules with environmental and biological importance. The small molecule identification step, however, remains an enormous challenge due to fragmentation difficulties or unspecific fragment ion information. Current methods to address this challenge often depend on databases or require the use of nuclear magnetic resonance (NMR), which have their own difficulties. The use of gas-phase collision cross section (CCS) values obtained from ion mobility spectrometry (IMS) measurements was recently demonstrated to reduce the number of false positive metabolite identifications. While promising, the amount of empirical CCS information currently available is limited, so predictive CCS methods need to be developed. In this article, we expand upon current experimental IMS capabilities by predicting CCS values using a deep learning algorithm. We successfully developed and trained a prediction model for CCS values requiring only a compound's SMILES notation and ion type. The use of data from five different laboratories using different instruments allowed the algorithm to be trained and tested on more than 2400 molecules. The resulting CCS predictions achieved a coefficient of determination of 0.97 and a median relative error of 2.7% for a wide range of molecules. Furthermore, the method requires only a small amount of processing power to predict CCS values. Considering the performance, time, and resources necessary, as well as its applicability to a variety of molecules, this model was able to outperform all currently available CCS prediction algorithms.
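The two performance figures reported above have standard definitions. The first is the coefficient of determination, R² = 1 - SS_res / SS_tot. The second is the median of the relative errors |predicted - measured| / measured. A minimal sketch follows. The function names are illustrative, and the values in the usage note are made up rather than CCS data from the study.

```python
from statistics import median


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def median_relative_error(y_true, y_pred):
    """Median of |pred - true| / true, expressed as a percentage."""
    return 100 * median([abs(p - t) / t for t, p in zip(y_true, y_pred)])
```

For measured values [100, 200, 300] and predictions [110, 190, 300], R² is 0.99 and the median relative error is 5%. The median is robust to a few badly predicted molecules, while R² is sensitive to them.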


Subjects
Deep Learning, Neural Networks (Computer), Algorithms, Ion Mobility Spectrometry, Magnetic Resonance Spectroscopy, Mass Spectrometry, Metabolomics
12.
Sci Rep ; 9(1): 4071, 2019 03 11.
Article in English | MEDLINE | ID: mdl-30858411

ABSTRACT

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potentially new ones. An open-source disk-based implementation that is both memory and computationally efficient is provided with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.


Subjects
Genetic Association Studies, Genome/genetics, Machine Learning, Precision Medicine, Algorithms, Artificial Intelligence, Genomics, Humans, Software
13.
IEEE Trans Neural Syst Rehabil Eng ; 27(4): 760-771, 2019 04.
Article in English | MEDLINE | ID: mdl-30714928

ABSTRACT

In recent years, deep learning algorithms have become increasingly prominent for their unparalleled ability to automatically learn discriminant features from large amounts of data. However, within the field of electromyography-based gesture recognition, deep learning algorithms are seldom employed, as they require an unreasonable amount of effort from a single person to generate tens of thousands of examples. This paper's hypothesis is that general, informative features can be learned from the large amounts of data generated by aggregating the signals of multiple users, thus reducing the recording burden while enhancing gesture recognition. Consequently, this paper proposes applying transfer learning on aggregated data from multiple users while leveraging the capacity of deep learning algorithms to learn discriminant features from large datasets. Two datasets, comprising 19 and 17 able-bodied participants respectively (the first is employed for pre-training), were recorded for this work using the Myo armband. A third Myo armband dataset was taken from the NinaPro database and comprises ten able-bodied participants. Three different deep learning networks employing three different input modalities (raw EMG, spectrograms, and continuous wavelet transform (CWT)) are tested on the second and third datasets. The proposed transfer learning scheme is shown to systematically and significantly enhance the performance of all three networks on the two datasets, achieving an offline accuracy of 98.31% for 7 gestures over 17 participants for the CWT-based ConvNet and 68.98% for 18 gestures over 10 participants for the raw EMG-based ConvNet. Finally, a use-case study employing eight able-bodied participants suggests that real-time feedback allows users to adapt their muscle activation strategy, which reduces the degradation in accuracy normally experienced over time.


Subjects
Deep Learning, Electromyography/methods, Gestures, Algorithms, Factual Databases, Humans, Neural Networks (Computer), Transfer (Psychology), Wavelet Analysis, Wearable Electronic Devices
14.
Mol Biol Evol ; 34(10): 2716-2729, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28957508

ABSTRACT

Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, strain clustering based on a subset of k-mers allows determination of its similarity with the whole-genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former shows increased correlation of its population structure with antibiotic resistance genes. The global view of the bacterial pangenome also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.
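Alignment-free comparison by k-mer content can be illustrated with the Jaccard distance between genome k-mer sets. This is one simple measure and is not necessarily the one Ray Surveyor uses. The sketch also makes simplifying assumptions: forward strand only, with no reverse-complement canonicalization, and hypothetical function names. A phenetic tree could then be built from the resulting distance matrix, for example with neighbor joining (not shown).

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct k-mers of a DNA sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance_matrix(genomes: dict[str, str], k: int):
    """Pairwise distances 1 - |A & B| / |A | B| between genome k-mer sets."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    dist = {name: {name: 0.0} for name in genomes}
    for a, b in combinations(genomes, 2):
        union = sets[a] | sets[b]
        d = 1 - len(sets[a] & sets[b]) / len(union) if union else 0.0
        dist[a][b] = dist[b][a] = d
    return dist
```

Restricting `kmer_set` to k-mers drawn from one functional category of genes (for example, resistance genes) and comparing the resulting clustering with the whole-genome one mirrors, in spirit, the subset comparison described in the abstract.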


Subjects
Computational Biology/methods, Bacterial Genome/genetics, DNA Sequence Analysis/methods, Bacteria/genetics, Biological Evolution, Cluster Analysis, Computer Simulation, Molecular Evolution, Genomics/methods, Metagenomics, Phylogeny, Prokaryotic Cells, Software
15.
BMC Genomics ; 17(1): 754, 2016 Sep 26.
Article in English | MEDLINE | ID: mdl-27671088

ABSTRACT

BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. RESULTS: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery. CONCLUSIONS: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes ( http://github.com/aldro61/kover/ ).
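The abstract does not name the learning algorithm. Kover is generally described as based on the Set Covering Machine (SCM), which learns a short conjunction of k-mer presence/absence rules. Below is a minimal in-memory sketch of a greedy SCM on k-mer sets, for illustration only. The names, the trade-off parameter `p`, and the toy data are assumptions. Kover itself is a disk-based implementation with its own model-selection machinery, which this does not reproduce.

```python
def scm_conjunction(X, y, features, p=1.0, max_rules=3):
    """Greedy Set Covering Machine (conjunction) on k-mer presence/absence rules.

    X: list of sets (k-mers present in each genome); y: 1 = positive (e.g. resistant).
    A rule is (kmer, present); it "holds" for a genome if the k-mer's presence matches.
    Utility = (#remaining negatives the rule rejects) - p * (#remaining positives it rejects).
    """
    rules = []
    neg = [i for i, label in enumerate(y) if label == 0]
    pos = [i for i, label in enumerate(y) if label == 1]
    candidates = [(f, True) for f in features] + [(f, False) for f in features]

    def holds(rule, i):
        kmer, present = rule
        return (kmer in X[i]) == present

    while neg and len(rules) < max_rules:
        def utility(rule):
            covered = sum(1 for i in neg if not holds(rule, i))
            errors = sum(1 for i in pos if not holds(rule, i))
            return covered - p * errors

        best = max(candidates, key=utility)
        if utility(best) <= 0:
            break
        rules.append(best)
        # Only examples on which every selected rule holds stay "in play".
        neg = [i for i in neg if holds(best, i)]
        pos = [i for i in pos if holds(best, i)]
    return rules


def predict(rules, sample):
    """The conjunction predicts positive only if every rule holds."""
    return int(all((kmer in sample) == present for kmer, present in rules))
```

Take genomes {AAA, CCC} and {AAA} labelled positive, and {CCC} and {GGG} labelled negative. The learner returns the single rule "AAA present", a model that can be read directly as a candidate biomarker.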

16.
PLoS Comput Biol ; 11(4): e1004074, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25849257

ABSTRACT

The discovery of peptides possessing high biological activity is very challenging due to the enormous diversity for which only a minority have the desired properties. To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory, that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code freely available at http://graal.ift.ulaval.ca/peptide-design/.
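Why a graph algorithm can find the best peptide exactly can be seen on a simplified model. Assume, hypothetically, that the predictor scores a peptide as a sum of position-independent k-mer weights. The paper's actual model is a kernel-based predictor, and its algorithm is not reproduced here. Under that assumption, the highest-scoring peptide of length n is a longest path in a layered directed acyclic graph whose nodes are (k-1)-mers, which dynamic programming solves exactly. The function name and weights are illustrative.

```python
from itertools import product


def best_peptide(weights, alphabet, k, n):
    """Highest-scoring length-n peptide when score(x) = sum of weights[x[i:i+k]].

    Unlisted k-mers score 0. Each DP layer extends every (k-1)-mer state by one
    residue, i.e. a longest-path search in a DAG. Requires n >= k - 1.
    """
    # best[s] = (score, peptide) of the best prefix ending in (k-1)-mer s
    best = {"".join(s): (0.0, "".join(s)) for s in product(alphabet, repeat=k - 1)}
    for _ in range(n - (k - 1)):
        nxt = {}
        for state, (score, pep) in best.items():
            for aa in alphabet:
                kmer = state + aa
                cand = (score + weights.get(kmer, 0.0), pep + aa)
                new_state = kmer[1:]
                if new_state not in nxt or cand[0] > nxt[new_state][0]:
                    nxt[new_state] = cand
        best = nxt
    return max(best.values(), key=lambda t: t[0])
```

With the 20 amino acids and k = 2, each layer has only 20 states, so the search is linear in n. Enumerating all 20^n peptides is infeasible by comparison. That gap is the combinatorial problem the abstract refers to.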


Subjects
Antimicrobial Cationic Peptides/chemistry, Antimicrobial Cationic Peptides/pharmacokinetics, Bacterial Physiological Phenomena/drug effects, Drug Discovery/methods, Machine Learning, Protein Sequence Analysis/methods, Amino Acid Sequence, Molecular Sequence Data, Automated Pattern Recognition/methods, Peptides, Protein Interaction Mapping/methods, Structure-Activity Relationship
17.
J Immunol Methods ; 400-401: 30-6, 2013 Dec 31.
Article in English | MEDLINE | ID: mdl-24144535

ABSTRACT

We present MHC-NP, a tool for predicting peptides naturally processed by the MHC pathway. The method was part of the 2nd Machine Learning Competition in Immunology and yielded state-of-the-art accuracy for the prediction of peptides eluted from human HLA-A*02:01, HLA-B*07:02, HLA-B*35:01, HLA-B*44:03, HLA-B*53:01, HLA-B*57:01 and mouse H2-D(b) and H2-K(b) MHC molecules. We briefly explain the theory and motivations that led to the development of this tool. We expect it to be broadly applicable in immunology, and specifically in epitope-based vaccine development. Our tool is freely available online and hosted by the Immune Epitope Database at http://tools.immuneepitope.org/mhcnp/.


Subjects
Artificial Intelligence, Epitope Mapping/methods, Major Histocompatibility Complex/immunology, Peptides/chemistry, Software, Algorithms, Animals, Antigen Presentation, H-2 Antigens/chemistry, H-2 Antigens/immunology, HLA-A2 Antigen/chemistry, HLA-A2 Antigen/immunology, HLA-B Antigens/chemistry, HLA-B Antigens/immunology, Histocompatibility Antigen H-2D/chemistry, Histocompatibility Antigen H-2D/immunology, Humans, Mice, Peptides/immunology, Protein Binding, Vaccines
18.
Gigascience ; 2(1): 10, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly, and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
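Contiguity statistics such as N50 are among the metrics commonly used to compare assemblies. NG50 is the variant that uses an estimated genome size instead of the total assembly length. The abstract does not say which ten measures were retained, so this sketch is only an illustration of one such metric. The function name and the optional `genome_size` parameter are assumptions.

```python
def n50(contig_lengths, genome_size=None):
    """N50 (or NG50 if genome_size is given) of a list of contig/scaffold lengths.

    Returns the length L such that sequences of length >= L cover at least half
    of the assembly (or of the estimated genome); 0 if that is never reached.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = (genome_size if genome_size is not None else sum(lengths)) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0
```

For contig lengths 2 to 10 (total 54), N50 is 8. With an assumed genome size of 100, NG50 is 3. NG50 makes assemblies of different total sizes comparable, which matters when, as here, teams submit very different assemblies for the same species.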

19.
BMC Bioinformatics ; 14: 82, 2013 Mar 05.
Article in English | MEDLINE | ID: mdl-23497081

ABSTRACT

BACKGROUND: The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to a specific MHC allele would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. RESULTS: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function kernels. We provide a low-complexity dynamic programming algorithm for the exact computation of the kernel and a linear-time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant results and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. 
CONCLUSION: On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community, with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.


Subjects
Artificial Intelligence , Peptides/chemistry , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Algorithms , Alleles , Binding Sites , Computer Simulation , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class II/metabolism , Peptides/immunology , Peptides/metabolism
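The abstract above names the Blended Spectrum kernel among those the proposed kernel generalizes, and pairs the kernel with kernel ridge regression. The pure-Python sketch below shows only those two standard pieces. It is not the authors' GS kernel (which, per the abstract, also incorporates physico-chemical properties of amino acids) and not the code distributed at the URL above; the function names, the unweighted form of the kernel, the default p = 3, and the toy peptides and target values are all assumptions for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def blended_spectrum(x, y, p=3):
    """Unweighted blended spectrum kernel: for k = 1..p, add the inner
    product of the k-mer count vectors of x and y."""
    total = 0
    for k in range(1, p + 1):
        cx, cy = kmer_counts(x, k), kmer_counts(y, k)
        total += sum(n * cy[u] for u, n in cx.items())
    return total


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_krr(seqs, targets, lam=1.0, p=3):
    """Kernel ridge regression: dual weights alpha = (K + lam*I)^-1 y."""
    gram = [[blended_spectrum(s, t, p) + (lam if i == j else 0.0)
             for j, t in enumerate(seqs)]
            for i, s in enumerate(seqs)]
    return solve(gram, targets)


def predict(seqs, alpha, query, p=3):
    """Prediction f(query) = sum_i alpha_i * K(seqs[i], query)."""
    return sum(a * blended_spectrum(s, query, p) for a, s in zip(alpha, seqs))


# Toy peptides and made-up affinity values, for illustration only.
train = ["ACDE", "KLMN", "ACDK"]
y = [1.0, -1.0, 0.5]
alpha = fit_krr(train, y, lam=1e-6)
print([round(predict(train, alpha, s), 3) for s in train])
```

Because the blended spectrum kernel is an inner product of count vectors, its Gram matrix is positive semi-definite, so adding lam > 0 on the diagonal keeps the linear system solvable. With a tiny lam the model nearly interpolates the training targets; a larger lam trades training fit for smoother predictions.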
20.
Big Data ; 1(4): 227-36, 2013 Dec.
Article in English | MEDLINE | ID: mdl-27447255

ABSTRACT

As analysts are expected to process a greater amount of information in a shorter amount of time, creators of big data software are challenged with the need for improved efficiency. Ray, our group's usable, scalable genome assembler, addresses big data problems by using optimal resources and producing a single correct, conservative, and timely solution. Only by abstracting the size of the data away from both the computers and the humans can the real scientific question, often complex in itself, eventually be solved. To hide the specific computational machinery of big data, we developed RayPlatform, a programming framework that allows users to concentrate on their domain-specific problems. RayPlatform is a parallel message-passing software framework that runs on clouds, supercomputers, and desktops alike. Using established technologies such as C++ and MPI (Message Passing Interface), we handle the genomes of hundreds of species, from viruses to plants, using machines ranging from desktop computers to supercomputers. From this experience, we present insights on making computer time more useful and user time much more valuable.
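The abstract describes RayPlatform only at the architectural level. As a hedged illustration of the message-passing style it names, the single-process toy below simulates ranks with in-memory inboxes and assigns each key to an owning rank by hashing. It does not use MPI, and the names `owner` and `distributed_count` and the hash-based ownership scheme are assumptions, not RayPlatform's API or design.

```python
import zlib
from collections import Counter, deque


def owner(key, n_ranks):
    """Deterministically map a key to the rank that owns it.
    (crc32 is used because Python's built-in str hash is randomized.)"""
    return zlib.crc32(key.encode()) % n_ranks


def distributed_count(shards):
    """Simulate ranks that count keys by message passing.

    shards[r] is the local input of rank r. Each rank sends every key
    to its owner's inbox; each owner then counts what it received.
    """
    n = len(shards)
    inboxes = [deque() for _ in range(n)]
    # Phase 1: every rank routes its local keys as messages to the owners.
    for shard in shards:
        for key in shard:
            inboxes[owner(key, n)].append(key)
    # Phase 2: every rank drains its inbox and updates its local table.
    tables = [Counter() for _ in range(n)]
    for rank in range(n):
        while inboxes[rank]:
            tables[rank][inboxes[rank].popleft()] += 1
    return tables


tables = distributed_count([["a", "b", "a"], ["b", "c"], ["a"]])
print(sum(tables, Counter()))
```

In a real MPI program the two phases would be separate processes exchanging messages, but the key idea carries over: because every key has exactly one owner, no rank needs a global view of the data.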
