Results 1 - 10 of 10
1.
Bioinformatics ; 38(10): 2749-2756, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561207

ABSTRACT

MOTIVATION: Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Due to the inherent noise of the data, computational techniques often rely on dimensionality reduction (DR) as both a pre-processing step and an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if the DR is to be used directly to gain biological insight, it must also be interpretable; that is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability necessitates making assumptions about the data structures, and the choice of the model is critical. RESULTS: We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of our NIFA model is that it simultaneously models uni- and multi-modal latent factors, and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) in terms of disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA is able to reproduce and refine previous findings in a single analysis framework and enables the discovery of new clinically relevant cell states. AVAILABILITY AND IMPLEMENTATION: NIFA is an R package that is freely available on GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Factor Analysis, Statistical , Sequence Analysis, RNA , Software
2.
IEEE J Biomed Health Inform ; 24(7): 1899-1906, 2020 07.
Article in English | MEDLINE | ID: mdl-31940570

ABSTRACT

OBJECTIVE: Left ventricular assist devices (LVADs) fail in up to 10% of patients due to the development of pump thrombosis. Remote monitoring of patients with LVADs can enable early detection and, subsequently, treatment and prevention of pump thrombosis. We assessed whether acoustical signals measured on the chest of patients with LVADs, combined with machine learning algorithms, can be used for detecting pump thrombosis. METHODS: 13 centrifugal pump (HVAD) recipients were enrolled in the study. When hospitalized for suspected pump thrombosis, clinical data and acoustical recordings were obtained at admission, prior to and after administration of thrombolytic therapy, and every 24 hours until laboratory and pump parameters normalized. First, we selected the most important features among our feature set using LDH-based correlation analysis. Then using these features, we trained a logistic regression model and determined our decision threshold to differentiate between thrombosis and non-thrombosis episodes. RESULTS: Accuracy, sensitivity and precision were calculated to be 88.9%, 90.9% and 83.3%, respectively. When tested on the post-thrombolysis data, our algorithm suggested possible pump abnormalities that were not identified by the reference pump power or biomarker abnormalities. SIGNIFICANCE: We showed that the acoustical signatures of LVADs can be an index of mechanical deterioration and, when combined with machine learning algorithms, provide clinical decision support regarding the presence of pump thrombosis.
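
As a rough illustration of the thresholding and evaluation step described in the abstract above, the following Python sketch applies a decision threshold to predicted probabilities and computes accuracy, sensitivity and precision. The labels, probabilities, threshold value and function names are all hypothetical and are not taken from the study.

```python
def apply_threshold(probabilities, threshold):
    """Turn predicted probabilities into 0/1 decisions (1 = thrombosis)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def evaluate(y_true, y_pred):
    """Return (accuracy, sensitivity, precision) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Sensitivity: fraction of true thrombosis episodes that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of flagged episodes that were true thrombosis.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, sensitivity, precision


# Hypothetical example data, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.3, 0.7]
y_pred = apply_threshold(y_prob, threshold=0.5)
accuracy, sensitivity, precision = evaluate(y_true, y_pred)
```

Moving the threshold trades sensitivity against precision, which is why the abstract treats the choice of decision threshold as a separate step after model training.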


Subject(s)
Heart Sounds/physiology , Heart-Assist Devices/adverse effects , Signal Processing, Computer-Assisted , Thrombosis/diagnosis , Acoustics , Aged , Algorithms , Female , Humans , Male , Middle Aged , Sound Spectrography , Stethoscopes
3.
Bioinformatics ; 34(13): i79-i88, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29950006

ABSTRACT

Motivation: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell-cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, the abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal. Results: Here, we present RAFSIL, a random-forest-based approach to learn cell-cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. Availability and implementation: The RAFSIL R package is available at www.kostkalab.net/software.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Cluster Analysis
4.
IEEE Trans Neural Syst Rehabil Eng ; 26(3): 594-601, 2018 03.
Article in English | MEDLINE | ID: mdl-29522403

ABSTRACT

In this paper, we investigate the effects of increasing mechanical stress on the knee joints by recording knee acoustical emissions and analyzing them using an unsupervised graph mining algorithm. We placed miniature contact microphones on four different locations: on the lateral and medial sides of the patella and superficial to the lateral and medial meniscus. We extracted audio features in both time and frequency domains from the acoustical signals and calculated the graph community factor (GCF): an index of heterogeneity (variation) in the sounds due to different loading conditions enforced on the knee. To determine the GCF, a k-nearest neighbor graph was constructed and an Infomap community detection algorithm was used to extract all potential clusters within the graph; the number of detected communities was then quantified with the GCF. Measurements from 12 healthy subjects showed that the GCF increased monotonically and significantly with vertical loading forces (mean GCF for no load = 30 and mean GCF for maximum load [body weight] = 39). This suggests that the increased complexity of the emitted sounds is related to the increased forces on the joint. In addition, microphones placed on the medial side of the patella and superficial to the lateral meniscus produced the most variation in the joint sounds. This information can be used to determine the optimal location for the microphones to obtain acoustical emissions with the greatest sensitivity to loading. In future work, joint loading quantification based on acoustical emissions and derived GCF can be used for assessing cumulative knee usage and loading during activities, for example, for patients rehabilitating knee injuries.
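
The graph-construction step mentioned in the abstract above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the feature vectors are made up, the value of k is arbitrary, and connected components are counted as a simple stand-in for Infomap community detection, which is not implemented here. The resulting count is therefore not the GCF as defined by the authors.

```python
from math import dist


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbor graph as a dict of neighbor sets."""
    n = len(points)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)  # symmetrize the edge
    return graph


def count_components(graph):
    """Count connected components (a crude stand-in for community detection)."""
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


# Hypothetical 2-D audio feature vectors forming two well-separated groups.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = knn_graph(features, k=2)
n_groups = count_components(graph)
```

A true community detection method such as Infomap can split a single connected component into several communities, so it would generally report more structure than this component count does.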


Subject(s)
Biomechanical Phenomena/physiology , Knee/physiology , Sound , Stress, Mechanical , Acoustic Stimulation , Adult , Algorithms , Female , Healthy Volunteers , Humans , Knee Joint/physiology , Male , Patella/physiology , Reproducibility of Results , Signal Processing, Computer-Assisted , Walking/physiology , Weight-Bearing , Young Adult
5.
J Appl Physiol (1985) ; 124(3): 537-547, 2018 03 01.
Article in English | MEDLINE | ID: mdl-28751371

ABSTRACT

Knee injuries and chronic disorders, such as arthritis, affect millions of Americans, leading to missed workdays and reduced quality of life. Currently, after an initial diagnosis, there are few quantitative technologies available to provide sensitive subclinical feedback to patients regarding improvements or setbacks to their knee health status; instead, most assessments are qualitative, relying on patient-reported symptoms, performance during functional tests, and physical examinations. Recent advances have been made with wearable technologies for assessing the health status of the knee (and potentially other joints) with the goal of facilitating personalized rehabilitation of injuries and care for chronic conditions. This review describes our progress in developing wearable sensing technologies that enable quantitative physiological measurements and interpretation of knee health status. Our sensing system enables longitudinal quantitative measurements of knee sounds, swelling, and activity context during clinical and field situations. Importantly, we leverage machine-learning algorithms to fuse the low-level signal and feature data of the measured time series waveforms into higher-level metrics of joint health. This paper summarizes the engineering validation, baseline physiological experiments, and human subject studies (both cross-sectional and longitudinal) that demonstrate the efficacy of using such systems for robust knee joint health assessment. We envision our sensor system complementing and advancing present-day practices to reduce joint reinjury risk, to optimize rehabilitation recovery time for a quicker return to activity, and to reduce health care costs.


Subject(s)
Knee Joint/physiology , Monitoring, Physiologic/instrumentation , Wearable Electronic Devices , Biomarkers , Clinical Trials as Topic , Humans
6.
IEEE J Biomed Health Inform ; 21(4): 1172-1181, 2017 07.
Article in English | MEDLINE | ID: mdl-28113735

ABSTRACT

Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue of which those cells are a part. In this paper, we propose the use of random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique utilizes random forests to capture cell marker dependences and model the cellular populations using the cell network concept. This cellular network helps us discover what cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Animals , Bone Marrow Cells/cytology , Cluster Analysis , Databases, Factual , Decision Trees , Humans , Machine Learning , Mice
7.
Article in English | MEDLINE | ID: mdl-27076456

ABSTRACT

Single-cell flow cytometry is a technology that measures the expression of several cellular markers simultaneously for a large number of cells. Identification of homogeneous cell populations, currently done by manual biaxial gating, is highly subjective and time consuming. To overcome the shortcomings of manual gating, automatic algorithms have been proposed. However, the performance of these methods depends heavily on the shape of the populations and the dimension of the data. In this paper, we have developed a time-efficient method that accurately identifies cellular populations. This is done based on a novel technique that estimates the initial number of clusters in high dimension and identifies the final clusters by merging clusters using their phenotypic signatures in low dimension. The proposed method is called SigClust. We have applied SigClust to four public datasets and compared it with five well-known methods in the field. The results are promising and indicate higher performance and accuracy compared to similar approaches reported in the literature.


Subject(s)
Biomarkers/analysis , Cells , Computational Biology/methods , Flow Cytometry/methods , Phenotype , Algorithms , Animals , Cells/classification , Cells/cytology , Cluster Analysis , Databases, Factual , Humans , Mice , Software
8.
IEEE Trans Biomed Circuits Syst ; 10(5): 1012-1022, 2016 10.
Article in English | MEDLINE | ID: mdl-27654975

ABSTRACT

Single-cell technologies like flow cytometry (FCM) provide valuable biological data for knowledge discovery in complex cellular systems like tissues and organs. FCM data contains multi-dimensional information about the cellular heterogeneity of intricate cellular systems. It is possible to correlate single-cell markers with phenotypic properties of those systems. Cell population identification and clinical outcome prediction from single-cell measurements are challenging problems in the field of single-cell analysis. In this paper, we propose a hybrid learning approach to predict clinical outcome using samples' single-cell FCM data. The proposed method is efficient in both i) identification of cellular clusters in each sample's FCM data and ii) prediction of clinical outcome (healthy versus unhealthy) for each subject. Our method is robust and the experimental results indicate promising performance.


Subject(s)
Biomarkers/metabolism , Cells, Cultured/metabolism , Cells, Cultured/pathology , Decision Support Systems, Clinical , Diagnosis, Computer-Assisted/methods , Flow Cytometry/methods , Tissue Array Analysis/methods , Humans , Outcome Assessment, Health Care/methods , Reproducibility of Results , Sensitivity and Specificity
9.
BMC Med Genomics ; 9 Suppl 2: 41, 2016 08 10.
Article in English | MEDLINE | ID: mdl-27510222

ABSTRACT

BACKGROUND: Measurement of various markers of single cells using flow cytometry has several biological applications. These applications include improving our understanding of the behavior of cellular systems, identifying rare cell populations and personalized medication. A common critical issue in the existing methods is identification of the number of cellular populations, which heavily affects the accuracy of results. Furthermore, anomaly detection is crucial in flow cytometry experiments. In this work, we propose a two-stage clustering technique for cell type identification in single-subject flow cytometry data and extend it for anomaly detection among multiple subjects. RESULTS: Our experimentation on 42 flow cytometry datasets indicates high performance and accurate clustering (F-measure > 91%) in identifying main cellular populations. Furthermore, our anomaly detection technique, evaluated on an Acute Myeloid Leukemia dataset, results in fewer than 2% false positives.


Subject(s)
Cells/classification , Flow Cytometry/methods , Biomarkers/analysis , Cells/cytology , Cluster Analysis , Fuzzy Logic , Humans , Leukemia, Myeloid, Acute/pathology , Markov Chains
10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2016: 816-819, 2016 Aug.
Article in English | MEDLINE | ID: mdl-28268450

ABSTRACT

In this work, EEG spectral features of different subjects are uniquely mapped into a 2D feature space. Such distinctive 2D features pave the way to identify subjects from their EEG spectral characteristics in an unsupervised manner without any prior knowledge. First, we extract the power spectral density of EEG signals in different frequency bands. Next, we use t-distributed stochastic neighbor embedding to map data points from a high-dimensional space into a visible 2D space. This non-linear data embedding method visualizes different subjects' data points as well-separated islands in two dimensions. We use a fuzzy c-means clustering technique to identify different subjects without any prior knowledge. The experimental results show that our proposed method efficiently (precision greater than 90%) discriminates 10 subjects using only the spectral information within their EEG signals.
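
The final clustering step named in the abstract above, fuzzy c-means, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2D points stand in for an embedding that has already been computed, and the fuzzifier m = 2, the fixed iteration count and the deterministic initialization are arbitrary choices made for the example.

```python
from math import dist


def _memberships(points, centers, m):
    """Compute the fuzzy membership of every point in every cluster."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        d = [dist(x, c) for c in centers]
        if min(d) == 0.0:
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((dk / dj) ** exponent for dj in d)
                         for dk in d])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50):
    """Return (centers, memberships) after a fixed number of iterations."""
    # Assumption: deterministic start, taking evenly spaced input points.
    step = len(points) // n_clusters
    centers = [points[i * step] for i in range(n_clusters)]
    dim = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(n_clusters):
            # Each center is the mean of all points weighted by membership^m.
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[a] for w, x in zip(weights, points)) / total
                for a in range(dim)))
        centers = new_centers
    return centers, _memberships(points, centers, m)


# Hypothetical 2-D embedding coordinates for two "subjects".
embedded = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (4.0, 4.0), (4.2, 3.9), (3.9, 4.1)]
centers, memberships = fuzzy_c_means(embedded, n_clusters=2)
# Hard labels: the cluster with the highest membership for each point.
labels = [row.index(max(row)) for row in memberships]
```

Unlike k-means, each point keeps a graded membership in every cluster, so points lying between two islands of the embedding can be identified by their low maximum membership.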


Subject(s)
Electroencephalography , Unsupervised Machine Learning , Cluster Analysis , Humans