Results 1 - 3 of 3
1.
Nat Methods ; 20(1): 104-111, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36522501

ABSTRACT

Protein sequence alignment is a key component of most bioinformatics pipelines to study the structures and functions of proteins. Aligning highly divergent sequences remains, however, a difficult task that current algorithms often fail to perform accurately, leaving many proteins or open reading frames poorly annotated. Here we leverage recent advances in deep learning for language modeling and differentiable programming to propose DEDAL (deep embedding and differentiable alignment), a flexible model to align protein sequences and detect homologs. DEDAL is a machine learning-based model that learns to align sequences by observing large datasets of raw protein sequences and of correct alignments. We show that, once trained, DEDAL improves alignment correctness on remote homologs by up to two- or threefold over existing methods and better discriminates remote homologs from evolutionarily unrelated sequences, paving the way to improvements on many downstream tasks relying on sequence alignment in structural and functional genomics.
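
For readers unfamiliar with the alignment task itself, the following is a minimal sketch of classic Smith-Waterman local alignment scoring with fixed, hand-picked scores. It is not DEDAL's learned model; the function name `smith_waterman_score` and the `match`, `mismatch`, and `gap` values are illustrative assumptions, not taken from the article.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scores are fixed, illustrative constants (an assumption); a learned
    model would instead predict substitution and gap scores from data.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

With these assumed scores, two sequences sharing a four-residue identical segment score at least 8, while sequences with no residue in common score 0.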


Subject(s)
Algorithms , Proteins , Amino Acid Sequence , Proteins/genetics , Proteins/chemistry , Sequence Alignment , Genomics
2.
PLoS One ; 10(6): e0128570, 2015.
Article in English | MEDLINE | ID: mdl-26068103

ABSTRACT

BACKGROUND: Genomic selection (GS) is a recent selective breeding method that uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. CONTRIBUTIONS: In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking that had not previously been considered in the GS literature. To assess the ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG more strongly rewards models that assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. RESULTS: We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found to be less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study carries two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
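
As an illustration of the NDCG measure described above, here is a minimal sketch under stated assumptions: the gain is the (non-negative) true breeding value itself and the discount is 1/log2(rank + 1), which is one common convention and may differ from the exact definition used in the article. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(values: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k values, in the given order."""
    # Rank 1 has discount 1/log2(2) = 1; later ranks count progressively less.
    return sum(v / math.log2(rank + 2) for rank, v in enumerate(values[:k]))


def ndcg_at_k(true_values: Sequence[float],
              predicted_scores: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the predicted ranking divided by the ideal DCG.

    Assumes true_values are non-negative (an assumption of this sketch).
    """
    # Order individuals by predicted score, best first.
    order = sorted(range(len(true_values)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_truth = [true_values[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(true_values, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_truth, k) / ideal_dcg
```

A model that ranks individuals in exactly the order of their true breeding values reaches an NDCG of 1; placing high-value individuals lower in the ranking reduces the score, with errors near the top penalized most.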


Subject(s)
Genomics , Models, Genetic , Selection, Genetic , Algorithms , Datasets as Topic , Genomics/methods , Models, Statistical , Plants/genetics , Quantitative Trait, Heritable
3.
J Med Internet Res ; 17(1): e2, 2015 Jan 28.
Article in English | MEDLINE | ID: mdl-25630348

ABSTRACT

BACKGROUND: The prevalence of non-communicable diseases is increasing throughout the world, including developing countries. OBJECTIVE: The intent was to conduct a study of a preventive medical service in a developing country, combining eHealth checkups and teleconsultation, as well as to assess stratification rules and the short-term effects of intervention. METHODS: We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Individual health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects and teleprescription for these subjects as required. RESULTS: The first checkup was provided to 16,741 subjects. After one year, 2361 subjects participated in the second checkup, and the systolic blood pressure of these subjects had decreased significantly from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method that applies a machine learning technique (random forest method) with the medical interview, subject profiles, and checkup results as predictors, to avoid costly measurements of blood sugar and to ensure sustainability of the program in developing countries. CONCLUSIONS: The results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.


Subject(s)
Chronic Disease/prevention & control , Developing Countries , Preventive Medicine/methods , Remote Consultation , Adolescent , Adult , Aged , Aged, 80 and over , Child , Delivery of Health Care , Electronic Prescribing , Female , Humans , Male , Middle Aged , Remote Consultation/instrumentation , Risk Factors , Telemedicine , Young Adult