1.
Sci Rep ; 9(1): 20353, 2019 12 30.
Article in English | MEDLINE | ID: mdl-31889137

ABSTRACT

In many research areas, scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsupervised clustering problems and show its effectiveness in the field of single-cell RNA sequencing (scRNA-Seq). The goal of scRNA-Seq experiments is often to define and catalogue cell types from the transcriptional output of individual cells. To improve the clustering of small disease- or tissue-specific datasets, in which rare cell types are often hard to identify, we propose a transfer learning method that uses large and well-annotated reference datasets, such as those produced by the Human Cell Atlas. Our approach modifies the dataset of interest while incorporating key information from the larger reference dataset via Non-negative Matrix Factorization (NMF). The modified dataset is subsequently provided to a clustering algorithm. We empirically evaluate the benefits of our approach on simulated scRNA-Seq data as well as on publicly available datasets. Finally, we present results for the analysis of a recently published small dataset and find improved clustering when transferring knowledge from a large reference dataset. Implementations of the method are available at https://github.com/nicococo/scRNA.
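
The idea of incorporating reference information via NMF before clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names, the multiplicative-update solver, the number of components `k`, and the mixing weight `mix` are all hypothetical choices made here for the example.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix X (genes x cells) as W @ H
    using standard multiplicative updates (illustrative solver)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(X, W, n_iter=200, seed=0, eps=1e-9):
    """Fit only the coefficients H for new data X, keeping the
    gene dictionary W (learned on the reference data) fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return H

def transfer(X_target, X_source, k=3, mix=0.5):
    """Blend the small target dataset with its reconstruction from the
    reference dictionary. `mix` is an assumed mixing weight in [0, 1]."""
    W, _ = nmf(X_source, k)       # dictionary from the large reference set
    H_t = project(X_target, W)    # target cells expressed in that dictionary
    return mix * (W @ H_t) + (1.0 - mix) * X_target
```

The matrix returned by `transfer` has the same shape as the target data and could then be handed to any clustering algorithm; setting `mix=0` leaves the target data unchanged.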


Subject(s)
Cluster Analysis , Computational Biology , Gene Expression Profiling , Machine Learning , Sequence Analysis, RNA , Single-Cell Analysis , Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Humans , ROC Curve , Reproducibility of Results , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Transcriptome
2.
PLoS One ; 12(3): e0174392, 2017.
Article in English | MEDLINE | ID: mdl-28346487

ABSTRACT

High prediction accuracy is not the only objective to consider when solving problems using machine learning. Particular scientific applications also require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decisions of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method was devised and has shown promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.
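
A sampling-based importance measure for an arbitrary scoring function can be illustrated roughly as below. This is a simplified sketch, not the gPOIM definition from the paper: it assumes uniformly sampled DNA sequences encoded as integers, and it measures importance as the change in mean score when a k-mer is planted at a given position. The names `positional_importance` and `score_fn` are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"

def positional_importance(score_fn, kmer, pos, length, n_samples=2000, seed=0):
    """Monte Carlo estimate of how much planting `kmer` at `pos` changes
    the mean output of `score_fn` over uniformly random sequences.

    score_fn: any model; maps an (n, length) integer array to n scores.
    """
    rng = np.random.default_rng(seed)
    seqs = rng.integers(0, len(ALPHABET), size=(n_samples, length))
    baseline = score_fn(seqs).mean()
    planted = seqs.copy()
    # Overwrite the chosen window with the encoded k-mer in every sample.
    planted[:, pos:pos + len(kmer)] = [ALPHABET.index(c) for c in kmer]
    return score_fn(planted).mean() - baseline
```

Because the estimate only calls `score_fn`, the same routine applies to an SVM, a CNN, or any other model that returns one score per sequence.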


Subject(s)
Computational Biology/methods , Machine Learning , Support Vector Machine , Algorithms
3.
IEEE Trans Neural Syst Rehabil Eng ; 24(9): 961-970, 2016 09.
Article in English | MEDLINE | ID: mdl-26513794

ABSTRACT

Fundamental changes over time in surface EMG signal characteristics are a challenge for myocontrol algorithms controlling prosthetic devices. These changes are generally caused by electrode shifts after donning and doffing, sweating, additional weight or varying arm positions, which result in a change of the signal distribution, a scenario often referred to as covariate shift. A substantial decrease in classification accuracy due to these factors hinders the direct translation of EMG signals into accurate myoelectric control patterns outside laboratory conditions. To overcome this limitation, we propose the use of supervised adaptation methods. The approach is based on adapting a trained classifier using only a small calibration set, which incorporates the relevant aspects of the nonstationarities but requires less than 1 min of data recording. The method was first tested through an offline analysis on signals acquired across 5 days from seven able-bodied individuals and four amputees. Moreover, we conducted a three-day online experiment on eight able-bodied individuals and one amputee, assessing user performance and user ratings of the controllability. Across different testing days, both offline and online performance improved significantly when shrinking the training model parameters by a given estimator towards the calibration set parameters. In the offline data analysis, the classification accuracy remained above 92% over five days with the proposed approach, whereas it decreased to 75% without adaptation. Similarly, in the online study, the performance with the proposed approach increased by 25% compared to a test without adaptation. These results indicate that the proposed methodology can help improve the robustness of myoelectric pattern recognition methods in daily life applications.
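
Shrinking training parameters towards calibration parameters can be sketched as below. The abstract does not specify the classifier or the estimator, so this example makes its own assumptions: a Gaussian classifier with a shared covariance (LDA-style, equal class priors) and a fixed, hand-chosen shrinkage weight `lam`. All function names are hypothetical.

```python
import numpy as np

def fit_gaussian(X, y, n_classes):
    """Estimate class means and a pooled covariance from labelled data."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    centered = X - means[y]
    cov = centered.T @ centered / len(X)
    return means, cov

def adapt(train_params, calib_params, lam=0.5):
    """Shrink training parameters towards calibration parameters.
    lam=0 keeps the training model; lam=1 uses the calibration set only."""
    (mu_t, cov_t), (mu_c, cov_c) = train_params, calib_params
    return (1 - lam) * mu_t + lam * mu_c, (1 - lam) * cov_t + lam * cov_c

def predict(X, means, cov, ridge=1e-6):
    """Linear discriminant rule with equal priors; a small ridge term
    keeps the covariance invertible."""
    prec = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    W = means @ prec                        # one weight vector per class
    b = -0.5 * np.sum(W * means, axis=1)    # per-class bias
    return np.argmax(X @ W.T + b, axis=1)
```

In use, `fit_gaussian` would be run once on the full training session and once on the short calibration recording of a new day, and the output of `adapt` passed to `predict`.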


Subject(s)
Amputation Stumps/physiopathology , Artificial Limbs , Electromyography/methods , Muscle Contraction/physiology , Muscle, Skeletal/physiology , Pattern Recognition, Automated/methods , Adult , Algorithms , Amputees/rehabilitation , Data Interpretation, Statistical , Humans , Male , Middle Aged , Radius/surgery , Reproducibility of Results , Sensitivity and Specificity , Young Adult
4.
PLoS One ; 10(12): e0144782, 2015.
Article in English | MEDLINE | ID: mdl-26690911

ABSTRACT

Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, owing to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although POIMs are a major step towards the explanation of trained SVM models, their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set.
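
The exponential growth mentioned above can be made concrete with a short calculation. This sketch assumes a four-letter DNA alphabet and one matrix entry per k-mer and start position, with `seq_len - k + 1` valid start positions; the function name and the sequence length used in the example are hypothetical.

```python
def poim_entries(k, seq_len, alphabet_size=4):
    """Number of entries in a positional importance matrix for k-mers:
    one row per possible k-mer, one column per valid start position."""
    return alphabet_size ** k * (seq_len - k + 1)

# Compare matrix sizes for an assumed sequence length of 100 nucleotides.
for k in (2, 5, 10):
    print(k, poim_entries(k, seq_len=100))
```

Each additional nucleotide multiplies the number of rows by four, which is why direct inspection quickly becomes impractical beyond small k.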


Subject(s)
Machine Learning , Models, Genetic , Nucleotide Motifs , Sequence Analysis, DNA/methods , Humans
5.
Article in English | MEDLINE | ID: mdl-25570960

ABSTRACT

Ensuring robustness of myocontrol algorithms for prosthetic devices is an important challenge. Robustness needs to be maintained under nonstationarities, e.g. due to electrode shifts after donning and doffing, sweating, additional weight or varying arm positions. Such nonstationary behavior changes the signal distributions, a scenario often referred to as covariate shift. This circumstance causes a significant decrease in classification accuracy in daily life applications. Re-training is possible, but it is time-consuming since it requires a large number of trials. In this paper, we propose to adapt the EMG classifier using only a small calibration set, which is able to capture the relevant aspects of the nonstationarities but requires re-training data of only very short duration. We tested this strategy on signals acquired across 5 days in able-bodied individuals. The results showed that an estimator that shrinks the training model parameters towards the calibration set parameters significantly increased the classifier performance across different testing days. Even when using only one trial per class as re-training data for each day, the classification accuracy remained > 92% over five days. These results indicate that the proposed methodology can be a practical means for improving robustness in pattern recognition methods for myocontrol.


Subject(s)
Electromyography/methods , Prostheses and Implants , Adult , Algorithms , Discriminant Analysis , Electromyography/instrumentation , Female , Hand/physiology , Humans , Male , Movement , Pattern Recognition, Automated , Time Factors , Young Adult