Results 1 - 20 of 40
1.
Sci Rep ; 14(1): 13558, 2024 06 12.
Article in English | MEDLINE | ID: mdl-38866809

ABSTRACT

Longitudinal studies that continuously generate data enable the capture of temporal variations in experimentally observed parameters, facilitating the interpretation of results in a time-aware manner. We propose IL-VIS (incrementally learned visualizer), a new machine learning pipeline that incrementally learns and visualizes a progression trajectory representing the longitudinal changes in longitudinal studies. At each sampling time point in an experiment, IL-VIS generates a snapshot of the longitudinal process on the data observed thus far, a new feature that is beyond the reach of classical static models. We first verify the utility and correctness of IL-VIS using simulated data, for which the true progression trajectories are known. We find that it accurately captures and visualizes the trends and (dis)similarities between high-dimensional progression trajectories. We then apply IL-VIS to longitudinal multi-electrode array data from brain cortical organoids when exposed to different levels of quinolinic acid, a metabolite contributing to many neuroinflammatory diseases including Alzheimer's disease, and its blocking antibody. We uncover valuable insights into the organoids' electrophysiological maturation and response patterns over time under these conditions.


Subject(s)
Machine Learning , Longitudinal Studies , Humans , Organoids , Alzheimer Disease/metabolism , Brain/physiology
2.
Biosystems ; 220: 104749, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35917953

ABSTRACT

High throughput technologies used in experimental biological sciences produce data with a vast number of variables at a rapid pace, making large volumes of high-dimensional data available. The exploratory analysis of such high-dimensional data can be aided by human interpretable low-dimensional visualizations. This work investigates how both discrete and continuous structures in biological data can be captured using the recently proposed dimensionality reduction method SONG, and compares the results with commonly used methods UMAP and PHATE. Using simulated and real-world datasets, we observe that SONG produces insightful visualizations by preserving various patterns, including discrete clusters, continuums, and branching structures in all considered datasets. More importantly, for datasets containing both discrete and continuous structures, SONG performs better at preserving both the structures compared to UMAP and PHATE. Furthermore, our quantitative evaluation of the three methods using downstream analysis validates the on par quality of the SONG's low-dimensional embeddings compared to the other methods.

3.
Epilepsia ; 63(7): 1693-1703, 2022 07.
Article in English | MEDLINE | ID: mdl-35460272

ABSTRACT

OBJECTIVE: Antiseizure drugs (ASDs) modulate synaptic and ion channel function to prevent abnormal hypersynchronous or excitatory activity arising in neuronal networks, but the relationship between ASDs with respect to their impact on network activity is poorly defined. In this study, we first investigated whether different ASD classes exert differential impact upon network activity, and we then sought to classify ASDs according to their impact on network activity. METHODS: We used multielectrode arrays (MEAs) to record the network activity of cultured cortical neurons after applying ASDs from two classes: sodium channel blockers (SCBs) and γ-aminobutyric acid type A receptor-positive allosteric modulators (GABA PAMs). A two-dimensional representation of changes in network features was then derived, and the ability of this low-dimensional representation to classify ASDs with different molecular targets was assessed. RESULTS: A two-dimensional representation of network features revealed a separation between the SCB and GABA PAM drug classes, and could classify several test compounds known to act through these molecular targets. Interestingly, several ASDs with novel targets, such as cannabidiol and retigabine, had closer similarity to the SCB class with respect to their impact upon network activity. SIGNIFICANCE: These results demonstrate that the molecular target of two common classes of ASDs is reflected through characteristic changes in network activity of cultured neurons. Furthermore, a low-dimensional representation of network features can be used to infer an ASDs molecular target. This approach may allow for drug screening to be performed based on features extracted from MEA recordings.


Subject(s)
Neurons , Unsupervised Machine Learning , Neurons/physiology , Receptors, GABA , Sodium Channel Blockers , gamma-Aminobutyric Acid
4.
J Biomech ; 125: 110552, 2021 08 26.
Article in English | MEDLINE | ID: mdl-34237661

ABSTRACT

Joint angle quantification from inertial measurement units (IMUs) is commonly performed using kinematic modelling, which depends on anatomical sensor placement and/or functional joint calibration; however, accurate three-dimensional joint motion measurement remains challenging to achieve. The aims of this study were firstly to employ deep neural networks to convert IMU data to ankle joint angles that are indistinguishable from those derived from motion capture-based inverse kinematics (IK) - the reference standard; and secondly, to validate the robustness of this approach across contrasting walking speeds in healthy individuals. Kinematics data were simultaneously calculated using IMUs and IK from 9 subjects during walking on a treadmill at 0.5 m/s, 1.0 m/s and 1.5 m/s. A generative adversarial network was trained using gait data at two of the walking speeds to predict ankle kinematics from IMU data alone for the third walking speed. There were significant differences between IK and IMU joint angle predictions for ankle eversion and internal rotation during walking at 0.5 m/s and 1.0 m/s (p < 0.001); however, no significant differences in joint angles were observed between the generative adversarial network prediction and IK at any speed or plane of joint motion (p < 0.05). The RMS difference in ankle joint kinematics between the generative adversarial network and IK for walking at 1.0 m/s was 3.8°, 2.1° and 3.5° for dorsiflexion, inversion and axial rotation, respectively. The modeling approach presented for real-time IMU to ankle joint angle conversion, which can be readily expanded to other joints, may provide enhanced IMU capability in applications such as telemedicine, remote monitoring and rehabilitation.
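
The RMS difference reported in the abstract above is a standard way to compare a predicted joint-angle trace with a reference trace. The following is a minimal sketch of that metric only, not the study's pipeline; the function name and the sample values are hypothetical and chosen for illustration.

```python
import math


def rms_difference(predicted_deg, reference_deg):
    """Root-mean-square difference between two equally sampled angle traces (degrees)."""
    if len(predicted_deg) != len(reference_deg) or not predicted_deg:
        raise ValueError("traces must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted_deg, reference_deg)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical samples (made-up values, not data from the study).
ik_angles = [0.0, 2.0, 4.0, 6.0]    # reference trace, e.g. from inverse kinematics
gan_angles = [1.0, 1.0, 5.0, 5.0]   # predicted trace, e.g. from a trained model
print(rms_difference(gan_angles, ik_angles))
```

In practice the traces would first be time-normalized to the gait cycle so that samples line up one to one; that step is assumed here and not shown.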


Subject(s)
Ankle Joint , Walking , Biomechanical Phenomena , Gait , Humans , Neural Networks, Computer
5.
IEEE Trans Neural Netw Learn Syst ; 32(10): 4588-4602, 2021 Oct.
Article in English | MEDLINE | ID: mdl-32997635

ABSTRACT

Nonparametric dimensionality reduction techniques, such as t-distributed Stochastic Neighbor Embedding (t-SNE) and uniform manifold approximation and projection (UMAP), are proficient in providing visualizations for data sets of fixed sizes. However, they cannot incrementally map and insert new data points into an already provided data visualization. We present self-organizing nebulous growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data visualization, i.e., incremental addition of new data while preserving the structure of the existing visualization. In addition, SONG is capable of handling new data increments, no matter whether they are similar or heterogeneous to the already observed data distribution. We test SONG on a variety of real and simulated data sets. The results show that SONG is superior to Parametric t-SNE, t-SNE, and UMAP in incremental data visualization. Especially, for heterogeneous increments, SONG improves over Parametric t-SNE by 14.98% on the Fashion MNIST data set and 49.73% on the MNIST data set regarding the cluster quality measured by the adjusted mutual information scores. On similar or homogeneous increments, the improvements are 8.36% and 42.26%, respectively. Furthermore, even when the abovementioned data sets are presented all at once, SONG performs better or comparable to UMAP and superior to t-SNE. We also demonstrate that the algorithmic foundations of SONG render it more tolerant to noise compared with UMAP and t-SNE, thus providing greater utility for data with high variance, high mixing of clusters, or noise.

6.
BMC Mol Cell Biol ; 21(Suppl 1): 34, 2020 Aug 19.
Article in English | MEDLINE | ID: mdl-32814564

ABSTRACT

BACKGROUND: Microbial Interaction Networks (MINs) provide important information for understanding bacterial communities. MINs can be inferred by examining microbial abundance profiles. Abundance profiles are often interpreted with the Lotka-Volterra model in research. However, existing research fails to consider a biologically meaningful underlying mathematical model for MINs or to address the possibility of multiple solutions. RESULTS: In this paper we present IMPARO, a method for inferring microbial interactions through parameter optimisation. We use biologically meaningful models for both the abundance profile and the MIN. We show how multiple MINs could be inferred with similar reconstructed abundance profile accuracy, and argue that a unique solution is not always satisfactory. Using our method, we successfully inferred clear interactions in the gut microbiome which have been previously observed in in vitro experiments. CONCLUSIONS: IMPARO was used to successfully infer microbial interactions in human microbiome samples as well as in a varied set of simulated data. The work also highlights the importance of considering multiple solutions for MINs.
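
For readers unfamiliar with the Lotka-Volterra model mentioned in the abstract above, the generalized form can be simulated in a few lines. This is a hedged sketch of the textbook model with forward-Euler integration, not of IMPARO itself, whose models and optimisation procedure are not specified in this record. The function name, step size, and parameter values are all assumptions for illustration.

```python
def simulate_glv(x0, growth, interactions, dt=0.01, steps=1000):
    """Forward-Euler simulation of dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    x = list(x0)
    trajectory = [tuple(x)]
    for _ in range(steps):
        rates = [
            x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(len(x))))
            for i in range(len(x))
        ]
        # Clamp at zero so a coarse step cannot produce negative abundances.
        x = [max(0.0, xi + dt * ri) for xi, ri in zip(x, rates)]
        trajectory.append(tuple(x))
    return trajectory


# Hypothetical two-species community: each species limits itself,
# and species 1 mildly inhibits species 0.
profile = simulate_glv(
    x0=[0.1, 0.1],
    growth=[1.0, 0.8],
    interactions=[[-1.0, -0.3],
                  [0.0, -1.0]],
)
print(profile[-1])
```

Inferring a MIN is the inverse problem: finding an interaction matrix whose simulated profile matches the observed abundances. Different matrices can reproduce the data about equally well, which is the multiple-solutions issue the abstract raises.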


Subject(s)
Bacteria/metabolism , Gastrointestinal Microbiome , Microbial Interactions/physiology , Models, Biological , Algorithms , Data Accuracy , Feces/microbiology , Female , Healthy Volunteers , Humans , Male
7.
BMC Bioinformatics ; 19(Suppl 13): 377, 2019 Feb 04.
Article in English | MEDLINE | ID: mdl-30717665

ABSTRACT

BACKGROUND: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. RESULTS: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by existing methods named PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. CONCLUSIONS: These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations.


Subject(s)
Algorithms , Ecological and Environmental Phenomena , Metagenomics , Computer Simulation , Fermented Foods , Gastrointestinal Microbiome , Genome, Viral , Humans , Lakes/virology , Time Factors , Viruses/genetics
8.
Front Microbiol ; 9: 1763, 2018.
Article in English | MEDLINE | ID: mdl-30177915

ABSTRACT

Microorganisms are critical to maintaining stratified biogeochemical characteristics in meromictic lakes; however, their community composition and potential roles in nutrient cycling are not thoroughly described. Both metagenomics and metaviromics were used to determine the composition and capacity of archaea, bacteria, and viruses along the water column in the landlocked meromictic Lake Shunet in Siberia. Deep sequencing of 265 Gb and high-quality assembly revealed a near-complete genome corresponding to Nonlabens sp. sh3vir. in a viral sample and 38 bacterial bins (0.2-5.3 Mb each). The mixolimnion (3.0 m) had the most diverse archaeal, bacterial, and viral communities, followed by the monimolimnion (5.5 m) and chemocline (5.0 m). The bacterial and archaeal communities were dominated by Thiocapsa and Methanococcoides, respectively, whereas the viral community was dominated by Siphoviridae. The archaeal and bacterial assemblages and the associated energy metabolism were significantly related to the various depths, in accordance with the stratification of physicochemical parameters. Reconstructed elemental nutrient cycles of the three layers were interconnected, including co-occurrence of denitrification and nitrogen fixation in each layer and involved unique processes due to specific biogeochemical properties at the respective depths. According to the gene annotation, several pre-dominant yet unknown and uncultured bacteria also play potentially important roles in nutrient cycling. Reciprocal BLAST analysis revealed that the viruses were specific to the host archaea and bacteria in the mixolimnion. This study provides insights into the bacterial, archaeal, and viral assemblages and the corresponding capacity potentials in Lake Shunet, one of the three meromictic lakes in central Asia. 
Lake Shunet was determined to harbor specific and diverse viral, bacterial, and archaeal communities that intimately interacted, revealing patterns shaped by indigenous physicochemical parameters.

9.
BMC Bioinformatics ; 19(1): 129, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29642848

ABSTRACT

BACKGROUND: Drug repositioning is the process of identifying new uses for existing drugs. Computational drug repositioning methods can reduce the time, costs and risks of drug development by automating the analysis of the relationships in pharmacology networks. Pharmacology networks are large and heterogeneous. Clustering drugs into small groups can simplify large pharmacology networks; these subgroups can also be used as a starting point for repositioning drugs. In this paper, we propose a two-tiered drug-centric unsupervised clustering approach for drug repositioning, integrating heterogeneous drug data profiles: drug-chemical, drug-disease, drug-gene, drug-protein and drug-side effect relationships. RESULTS: The proposed drug repositioning approach is threefold: (i) clustering drugs based on their homogeneous profiles using the Growing Self Organizing Map (GSOM); (ii) clustering drugs based on drug-drug relation matrices based on the previous step, considering three state-of-the-art graph clustering methods; and (iii) inferring drug repositioning candidates and assigning a confidence value for each identified candidate. In this paper, we compare our two-tiered clustering approach against two existing heterogeneous data integration approaches with reference to the Anatomical Therapeutic Chemical (ATC) classification, using GSOM. Our approach yields Normalized Mutual Information (NMI) and Standardized Mutual Information (SMI) of 0.66 and 36.11, respectively, while the two existing methods yield NMI of 0.60 and 0.64 and SMI of 22.26 and 33.59. Moreover, the two existing approaches failed to produce useful cluster separations when using graph clustering algorithms, while our approach is able to identify useful clusters for drug repositioning. Furthermore, we provide clinical evidence for four predicted results (Chlorthalidone, Indomethacin, Metformin and Thioridazine) to support that our proposed approach can be reliably used to infer ATC code and drug repositioning.
CONCLUSION: The proposed two-tiered unsupervised clustering approach is suitable for drug clustering and enables heterogeneous data integration. It also enables identifying reliable repositioning drug candidates with reference to ATC therapeutic classification. The repositioning drug candidates identified consistently by multiple clustering algorithms and with high confidence have a higher possibility of being effective repositioning candidates.
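
The NMI figures quoted in the abstract above compare a clustering against a reference classification. The sketch below shows one common way to compute NMI from two label lists. It assumes normalization by the arithmetic mean of the two entropies, which may differ from the variant used in the paper. The function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI of two labelings, normalized by the mean of their entropies (an assumption)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from the joint and marginal label frequencies.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    mean_entropy = (h_a + h_b) / 2
    # Convention: two single-cluster labelings are treated as identical.
    return mi / mean_entropy if mean_entropy > 0 else 1.0


# Hypothetical cluster assignments and reference classes for six drugs.
clusters = [0, 0, 1, 1, 2, 2]
reference = ["A", "A", "B", "B", "B", "C"]
print(normalized_mutual_information(clusters, reference))
```

A value of 1 means the clustering reproduces the reference classes exactly, and 0 means the two labelings are statistically independent.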


Subject(s)
Drug Repositioning , Statistics as Topic , Algorithms , Cluster Analysis , Computational Biology , Humans , Pharmaceutical Preparations/classification
10.
IEEE J Biomed Health Inform ; 22(1): 205-214, 2018 01.
Article in English | MEDLINE | ID: mdl-28371786

ABSTRACT

Quantitative gait analysis is an important tool in objective assessment and management of total knee arthroplasty (TKA) patients. Studies evaluating gait patterns in TKA patients have tended to focus on discrete data such as spatiotemporal information, joint range of motion and peak values of kinematics and kinetics, or consider selected principal components of gait waveforms for analysis. These strategies may not have the capacity to capture small variations in gait patterns associated with each joint across an entire gait cycle, and may ultimately limit the accuracy of gait classification. The aim of this study was to develop an automatic feature extraction method to analyse patterns from high-dimensional autocorrelated gait waveforms. A general linear feature extraction framework was proposed and a hierarchical partial least squares method derived for discriminant analysis of multiple gait waveforms. The effectiveness of this strategy was verified using a dataset of joint angle and ground reaction force waveforms from 43 patients after TKA surgery and 31 healthy control subjects. Compared with principal component analysis and partial least squares methods, the hierarchical partial least squares method achieved generally better classification performance on all possible combinations of waveforms, with the highest classification accuracy . The novel hierarchical partial least squares method proposed is capable of capturing virtually all significant differences between TKA patients and the controls, and provides new insights into data visualization. The proposed framework presents a foundation for more rigorous classification of gait, and may ultimately be used to evaluate the effects of interventions such as surgery and rehabilitation.


Subject(s)
Arthroplasty, Replacement, Knee , Gait/physiology , Pattern Recognition, Automated/methods , Range of Motion, Articular/physiology , Aged , Female , Humans , Image Processing, Computer-Assisted , Least-Squares Analysis , Lower Extremity/physiology , Male , Middle Aged , Principal Component Analysis , Walking/physiology
11.
Comput Struct Biotechnol J ; 15: 447-455, 2017.
Article in English | MEDLINE | ID: mdl-29085573

ABSTRACT

Assessing biodiversity is an important step in the study of microbial ecology associated with a given environment. Multiple indices have been used to quantify species diversity, which is a key biodiversity measure. Measuring species diversity of viruses in different environments remains a challenge relative to measuring the diversity of other microbial communities. Metagenomics has played an important role in elucidating viral diversity by conducting metavirome studies; however, metavirome data are of high complexity requiring robust data preprocessing and analysis methods. In this review, existing bioinformatics methods for measuring species diversity using metavirome data are categorised broadly as either sequence similarity-dependent methods or sequence similarity-independent methods. The former includes a comparison of DNA fragments or assemblies generated in the experiment against reference databases for quantifying species diversity, whereas estimates from the latter are independent of the knowledge of existing sequence data. Current methods and tools are discussed in detail, including their applications and limitations. Drawbacks of the state-of-the-art method are demonstrated through results from a simulation. In addition, alternative approaches are proposed to overcome the challenges in estimating species diversity measures using metavirome data.

12.
BMC Bioinformatics ; 18(1): 140, 2017 Mar 01.
Article in English | MEDLINE | ID: mdl-28249566

ABSTRACT

BACKGROUND: Investigating and understanding drug-drug interactions (DDIs) is important in improving the effectiveness of clinical care. DDIs can occur when two or more drugs are administered together. Experimentally based DDI detection methods are costly and time-consuming. Hence, there is a great interest in developing efficient and useful computational methods for inferring potential DDIs. Standard binary classifiers require both positives and negatives for training. In a DDI context, drug pairs that are known to interact can serve as positives for predictive methods. However, negatives, i.e., drug pairs that have been confirmed to have no interaction, are scarce. To address this lack of negatives, we introduce a Positive-Unlabeled Learning method for inferring potential DDIs. RESULTS: The proposed method consists of three steps: i) application of Growing Self Organizing Maps to infer negatives from the unlabeled dataset; ii) using a pairwise similarity function to quantify the overlap between individual features of drugs; and iii) using a support vector machine classifier for inferring DDIs. We obtained 6036 DDIs from the DrugBank database. Using the proposed approach, we inferred 589 drug pairs that are likely to not interact with each other; these drug pairs are used as representative data for the negative class in binary classification for DDI prediction. Moreover, we classify the predicted DDIs as Cytochrome P450 (CYP) enzyme-Dependent and CYP-Independent interactions invoking their locations on the Growing Self Organizing Map, due to the particular importance of these enzymes in clinically significant interaction effects. Further, we provide a case study on three predicted CYP-Dependent DDIs to evaluate the clinical relevance of this study. CONCLUSION: Our proposed approach showed an absolute improvement in F1-score of 14 and 38% in comparison to the method that randomly selects unlabeled data points as likely negatives, depending on the choice of similarity function.
We inferred 5300 possible CYP-Dependent DDIs and 592 CYP-Independent DDIs with the highest posterior probabilities. Our discoveries can be used to improve clinical care as well as the research outcomes of drug development.


Subject(s)
Drug Interactions/physiology , Pharmaceutical Preparations/metabolism , Support Vector Machine , Cluster Analysis , Cytochrome P-450 Enzyme System/metabolism , Databases, Factual , Humans , Pharmaceutical Preparations/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism
13.
BMC Bioinformatics ; 18(Suppl 16): 551, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297291

ABSTRACT

BACKGROUND: Cancer constitutes a momentous health burden in our society. Critical information on cancer may be hidden in its signaling pathways. However, even though a large amount of money has been spent on cancer research, some critical information on cancer-related signaling pathways still remains elusive. Hence, new work towards a complete understanding of cancer-related signaling pathways will greatly benefit the prevention, diagnosis, and treatment of cancer. RESULTS: We propose the node-weighted Steiner tree approach to identify important elements of cancer-related signaling pathways at the level of proteins. This new approach has advantages over previous approaches since it is fast in processing large protein-protein interaction networks. We apply this new approach to identify important elements of two well-known cancer-related signaling pathways: PI3K/Akt and MAPK. First, we generate a node-weighted protein-protein interaction network using protein and signaling pathway data. Second, we modify and use two preprocessing techniques and a state-of-the-art Steiner tree algorithm to identify a subnetwork in the generated network. Third, we propose two new metrics to select important elements from this subnetwork. On a commonly used personal computer, this new approach takes less than 2 s to identify the important elements of PI3K/Akt and MAPK signaling pathways in a large node-weighted protein-protein interaction network with 16,843 vertices and 1,736,922 edges. We further analyze and demonstrate the significance of these identified elements to cancer signal transduction by exploring previously reported experimental evidence. CONCLUSIONS: Our node-weighted Steiner tree approach is shown to be both fast and effective in identifying important elements of cancer-related signaling pathways. Furthermore, it may provide new perspectives into the identification of signaling pathways for other human diseases.


Subject(s)
Computational Biology/methods , Neoplasms/genetics , Protein Interaction Maps/genetics , Algorithms , Humans , Signal Transduction
14.
BMC Bioinformatics ; 18(Suppl 16): 571, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297295

ABSTRACT

BACKGROUND: In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequencies space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performances of CoMet were compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage) using multiple datasets including a sample comprised of multiple strains. RESULTS: Binning methods based on both compositional features and coverages of contigs had higher performances than the method which is based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performances of CoMet were higher than MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18 - 39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. 
CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. CONCLUSIONS: The approach proposed with CoMet for binning contigs, improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
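
The two compositional features named in the abstract above, GC content and tetranucleotide frequencies, are straightforward to compute per contig. The following is a minimal sketch of those features only, not of the CoMet workflow. The function names are hypothetical, and the sketch counts each 4-mer on the given strand without merging reverse complements, which binning tools often do.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible 4-mers, using overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Windows containing ambiguous bases such as N are ignored.
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Hypothetical short contig, for illustration only.
contig = "ACGTACGTGGCC"
print(gc_content(contig))
print(tetranucleotide_frequencies(contig)["ACGT"])
```

Each contig then becomes a point in GC-coverage space and a 256-dimensional frequency vector, which a density-based clustering algorithm such as DBSCAN can group into bins.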


Subject(s)
Algorithms , Contig Mapping , Metagenomics/methods , Workflow , Base Sequence , Cluster Analysis , Databases, Genetic , Genome , Humans , Metagenome
15.
Clin Biomech (Bristol, Avon) ; 32: 92-101, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26874198

ABSTRACT

BACKGROUND: Gait features characteristic of a cohort may be difficult to evaluate due to differences in subjects' demographic factors and walking speed. The aim of this study was to employ a multiple regression normalization method that accounts for subject age, height, body mass, gender, and self-selected walking speed in the evaluation of gait in unilateral total knee arthroplasty patients. METHODS: Three-dimensional gait analysis was performed on 45 total knee arthroplasty patients and 31 age-matched controls walking at their self-selected speed. Gait data peaks including joint angles, ground reaction forces, net joint moments, and net joint powers were normalized using subject body mass, standard dimensionless equations, and a multiple regression approach that modeled subject age, height, body mass, gender, and self-selected walking speed. FINDINGS: Normalizing gait data using subject body mass, dimensionless equations, and multiple regression approach resulted in a significantly lower knee adduction moment and knee extensor power in total knee arthroplasty patients compared to controls (p<0.05). In contrast to normalization using body mass and dimensionless equations, multiple regression normalization greatly reduced variance in gait data by minimizing correlations with subject demographic factors and walking speed, resulting in significantly higher peak hip extension angles and peak hip flexion powers in total knee arthroplasty patients (p<0.05). INTERPRETATION: Total knee arthroplasty patients generate greater hip extension angles and hip flexor power and have a lower knee adduction moment than healthy controls. This gait pattern may be a strategy to reduce muscle and joint loading at the knee.


Subject(s)
Arthroplasty, Replacement, Knee/methods , Gait/physiology , Adult , Aged , Aged, 80 and over , Arthroplasty, Replacement, Knee/rehabilitation , Cohort Studies , Female , Humans , Imaging, Three-Dimensional , Knee Joint/physiology , Male , Middle Aged , Range of Motion, Articular/physiology , Regression Analysis , Stress, Mechanical , Walking/physiology
16.
J Appl Biomech ; 32(2): 128-39, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26426798

ABSTRACT

Normalization of gait data is performed to reduce the effects of intersubject variations due to physical characteristics. This study reports a multiple regression normalization approach for spatiotemporal gait data that takes into account intersubject variations in self-selected walking speed and physical properties including age, height, body mass, and sex. Spatiotemporal gait data including stride length, cadence, stance time, double support time, and stride time were obtained from healthy subjects including 782 children, 71 adults, 29 elderly subjects, and 28 elderly Parkinson's disease (PD) patients. Data were normalized using standard dimensionless equations, a detrending method, and a multiple regression approach. After normalization using dimensionless equations and the detrending method, weak to moderate correlations between walking speed, physical properties, and spatiotemporal gait features were observed (0.01 < |r| < 0.88), whereas normalization using the multiple regression method reduced these correlations to weak values (|r| < 0.29). Data normalization using dimensionless equations and detrending resulted in significant differences in stride length and double support time of PD patients; however, the multiple regression approach revealed significant differences in these features as well as in cadence, stance time, and stride time. The proposed multiple regression normalization may be useful in machine learning, gait classification, and clinical evaluation of pathological gait patterns.


Subject(s)
Data Interpretation, Statistical , Gait Disorders, Neurologic/physiopathology , Gait , Parkinson Disease/physiopathology , Spatio-Temporal Analysis , Walking , Adolescent , Aged , Aged, 80 and over , Algorithms , Child , Child, Preschool , Female , Gait Disorders, Neurologic/diagnosis , Gait Disorders, Neurologic/etiology , Humans , Machine Learning , Male , Middle Aged , Parkinson Disease/complications , Parkinson Disease/diagnosis , Pattern Recognition, Automated , Physical Examination/methods , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity
17.
BMC Syst Biol ; 10(Suppl 5): 128, 2016 Dec 05.
Article in English | MEDLINE | ID: mdl-28105946

ABSTRACT

BACKGROUND: Drug repositioning can reduce the time, costs and risks of drug development by identifying new therapeutic effects for known drugs. Repositioning drugs is challenging because pharmacological data are large and complex. Subnetwork identification has already been used to simplify the visualization and interpretation of biological data, but it has not been applied to drug repositioning so far. In this paper, we fill this gap by proposing a new Physarum-inspired Prize-Collecting Steiner Tree algorithm to identify subnetworks for drug repositioning. RESULTS: Drug Similarity Networks (DSNs) are generated using the chemical, therapeutic, protein, and phenotype features of drugs. In DSNs, vertex prizes and edge costs represent the similarities and dissimilarities between drugs, respectively, and terminals represent drugs in the cardiovascular class, as defined in the Anatomical Therapeutic Chemical classification system. We apply both the proposed algorithm and the widely used GW algorithm to identify subnetworks in our 18 generated DSNs. In these DSNs, our proposed algorithm identifies subnetworks with an average Rand Index of 81.1%, whereas the GW algorithm identifies subnetworks with an average Rand Index of only 64.1%. We select 9 subnetworks with high Rand Index to find drug repositioning opportunities. Ten frequently occurring drugs in these subnetworks are identified as candidates to be repositioned for cardiovascular diseases. CONCLUSIONS: We find evidence to support previous discoveries that nitroglycerin, theophylline and acarbose could be repositioned for cardiovascular diseases. Moreover, we identify seven previously unknown drug candidates that may also interact with the biological cardiovascular system. These discoveries show that our proposed Prize-Collecting Steiner Tree approach is a promising strategy for drug repositioning.
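For readers unfamiliar with the Rand Index reported above, the following sketch computes it as the fraction of item pairs on which two groupings agree. The abstract does not state how the two groupings were defined. Comparing "inside or outside the identified subnetwork" against "inside or outside the cardiovascular class" is an assumption made here for illustration, and the label lists are invented.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by both groupings.

    A pair counts as an agreement when both groupings put the two items
    together, or both groupings keep them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("at least two items are needed to form a pair")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        if together_a == together_b:
            agreements += 1
        pairs += 1
    return agreements / pairs


# Hypothetical example: 1 = drug is in the subnetwork / in the target class.
in_subnetwork = [1, 1, 1, 0, 0, 0]
in_target_class = [1, 1, 0, 0, 0, 0]
score = rand_index(in_subnetwork, in_target_class)
```

A value of 1.0 means the two groupings are identical up to relabeling. Values closer to 0 mean they disagree on most pairs.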


Subject(s)
Algorithms , Biomimetics , Computational Biology/methods , Drug Repositioning , Physarum
18.
BMC Genomics ; 16 Suppl 12: S12, 2015.
Article in English | MEDLINE | ID: mdl-26680279

ABSTRACT

BACKGROUND: Mass Spectrometry (MS) is a ubiquitous analytical tool in biological research and is used to measure the mass-to-charge ratio of bio-molecules. Peak detection is the essential first step in MS data analysis. Precise estimation of peak parameters such as peak summit location and peak area is critical to identify underlying bio-molecules and to estimate their abundances accurately. We propose a new method to detect and quantify peaks in mass spectra. It uses dual-tree complex wavelet transformation along with Stein's unbiased risk estimator for spectra smoothing. Then, a new method, based on the modified Asymmetric Pseudo-Voigt (mAPV) model and hierarchical particle swarm optimization, is used for peak parameter estimation. RESULTS: Using simulated data, we demonstrated the benefit of using the mAPV model over Gaussian, Lorentz and Bi-Gaussian functions for MS peak modelling. The proposed mAPV model achieved the best fitting accuracy for asymmetric peaks, with percentage errors in peak summit location estimation that were 0.17% to 4.46% lower than those of the other models. It also outperformed the other models in peak area estimation, delivering percentage errors about 0.7% lower than those of its closest competitor, the Bi-Gaussian model. In addition, using data generated from a MALDI-TOF computer model, we showed that the proposed overall algorithm outperformed the existing methods, mainly in terms of sensitivity. It achieved a sensitivity of 85%, compared with 77% and 71% for the two benchmark algorithms, the continuous wavelet transformation-based method and Cromwell, respectively. CONCLUSIONS: The proposed algorithm is particularly useful for peak detection and parameter estimation in MS data with overlapping peak distributions and asymmetric peaks. The algorithm is implemented in MATLAB and the source code is freely available at http://mapv.sourceforge.net.
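To make the notion of an asymmetric peak model concrete, here is a generic sketch of a pseudo-Voigt profile (a weighted sum of a Lorentzian and a Gaussian) with different half-widths on either side of the summit. This is not the mAPV model of the paper, whose exact form the abstract does not give. The parameterization and all names below are assumptions chosen for illustration only.

```python
import math


def asymmetric_pseudo_voigt(x, center, amplitude, hwhm_left, hwhm_right, eta):
    """Generic asymmetric pseudo-Voigt peak (illustrative, not the paper's mAPV).

    center      peak summit location
    amplitude   peak height at the summit
    hwhm_left   half width at half maximum used for x < center
    hwhm_right  half width at half maximum used for x >= center
    eta         Lorentzian fraction, between 0 (pure Gaussian) and 1
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if hwhm_left <= 0.0 or hwhm_right <= 0.0:
        raise ValueError("half-widths must be positive")
    hwhm = hwhm_left if x < center else hwhm_right
    t = (x - center) / hwhm
    lorentz = 1.0 / (1.0 + t * t)             # equals 0.5 at |t| = 1
    gauss = math.exp(-math.log(2.0) * t * t)  # equals 0.5 at |t| = 1
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)


# Hypothetical peak with a steep left edge and a longer right tail.
params = dict(center=1000.0, amplitude=50.0, hwhm_left=0.5, hwhm_right=1.0, eta=0.3)
summit = asymmetric_pseudo_voigt(1000.0, **params)
right_half = asymmetric_pseudo_voigt(1001.0, **params)
left_half = asymmetric_pseudo_voigt(999.5, **params)
```

Because both component shapes fall to half height at one half-width from the center, the two width parameters can be read directly as the left and right half widths at half maximum, whatever the mixing fraction.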


Subject(s)
Computational Biology/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Algorithms , Computer Simulation
19.
BMC Bioinformatics ; 16 Suppl 18: S3, 2015.
Article in English | MEDLINE | ID: mdl-26678073

ABSTRACT

BACKGROUND: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations, where the strains are highly genetically related. RESULTS: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and its F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5%, respectively. CONCLUSIONS: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors. AVAILABILITY: http://sourceforge.net/projects/viquas/.
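The precision, recall and F-score figures quoted above follow the standard definitions, which the sketch below applies to a set of reconstructed strains compared against a set of true strains. Treating a reconstructed strain as correct only when it exactly matches a true strain is an assumption made here, since the abstract does not describe the matching criterion. The strain identifiers are invented.

```python
def precision_recall_fscore(reconstructed, truth):
    """Standard precision, recall and F-score (harmonic mean) for two sets."""
    reconstructed = set(reconstructed)
    truth = set(truth)
    true_positives = len(reconstructed & truth)
    precision = true_positives / len(reconstructed) if reconstructed else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        f_score = 0.0
    else:
        f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: four strains reconstructed, three truly present.
reconstructed_strains = {"s1", "s2", "s3", "s4"}
true_strains = {"s1", "s2", "s5"}
precision, recall, f_score = precision_recall_fscore(reconstructed_strains, true_strains)
```

Discarding a spurious reconstructed strain raises precision while leaving recall unchanged, which is the pattern the reported results describe.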


Subject(s)
Metagenomics , Algorithms , Benchmarking , High-Throughput Nucleotide Sequencing , Internet , User-Computer Interface
20.
IEEE J Biomed Health Inform ; 19(6): 1794-802, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26551989

ABSTRACT

Quantitative gait assessment is important in the diagnosis and management of Parkinson's disease (PD); however, gait characteristics of a cohort are dispersed by patient physical properties including age, height, body mass, and gender, as well as walking speed, which may limit the capacity to discern some pathological features. The aim of this study was twofold: first, to use a multiple regression normalization strategy that accounts for subject age, height, body mass, gender, and self-selected walking speed to identify differences in spatial-temporal gait features between PD patients and controls; and second, to evaluate the effectiveness of machine learning strategies in classifying PD gait after gait normalization. Spatial-temporal gait data during self-selected walking were obtained from 23 PD patients and 26 age-matched controls. Data were normalized using standard dimensionless equations and multiple regression normalization. Machine learning strategies were then employed to classify PD gait using the raw gait data, data normalized using dimensionless equations, and data normalized using the multiple regression approach. After normalizing data using the dimensionless equations, only stride length, step length, and double support time were significantly different between PD patients and controls (p < 0.05); however, normalizing data using the multiple regression method revealed significant differences in stride length, cadence, stance time, and double support time. Random Forest resulted in a PD classification accuracy of 92.6% after normalizing gait data using the multiple regression approach, compared to 80.4% (support vector machine) and 86.2% (kernel Fisher discriminant) using raw data and data normalized using dimensionless equations, respectively. Our multiple regression normalization approach will assist in diagnosis and treatment of PD using spatial-temporal gait data.


Subject(s)
Gait/physiology , Image Processing, Computer-Assisted/methods , Machine Learning , Parkinson Disease/diagnosis , Parkinson Disease/physiopathology , Walking/physiology , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Regression Analysis , Spatio-Temporal Analysis , Video Recording