Results 1 - 11 of 11
1.
Stat Med ; 41(11): 2005-2024, 2022 05 20.
Article in English | MEDLINE | ID: mdl-35118686

ABSTRACT

Functional magnetic resonance imaging (fMRI) is a non-invasive technique that facilitates the study of brain activity by measuring changes in blood flow. Brain activity signals can be recorded during the alternate performance of given tasks, that is, task fMRI (tfMRI), or during resting-state, that is, resting-state fMRI (rsfMRI), as a measure of baseline brain activity. This contributes to the understanding of how the human brain is organized in functionally distinct subdivisions. fMRI experiments from high-resolution scans provide hundreds of thousands of longitudinal signals for each individual, corresponding to brain activity measurements over each voxel of the brain over the duration of the experiment. In this context, we propose novel visualization techniques for high-dimensional functional data relying on depth-based notions that enable computationally efficient two-dimensional representations of fMRI data, which elucidate sample composition, outlier presence, and individual variability. We believe that this preliminary step is crucial to any inferential approach that aims to identify neuroscientific patterns across individuals, tasks, and brain regions. We present the proposed technique via an extensive simulation study, and demonstrate its application on a motor and language tfMRI experiment.


Subject(s)
Brain Mapping , Magnetic Resonance Imaging , Brain/diagnostic imaging , Brain Mapping/methods , Humans , Language
2.
Entropy (Basel) ; 22(6)2020 Jun 17.
Article in English | MEDLINE | ID: mdl-33286447

ABSTRACT

Among the most important problems faced by health centers today are those caused by patients who do not attend their appointments. Among other effects, these patients cause loss of revenue for the health centers and lengthen patient waiting lists. In order to tackle these problems, several scheduling systems have been developed. Many of them require predicting whether a patient will show up for an appointment. However, obtaining these estimates accurately is currently a challenging problem. In this work, a systematic review of the literature on predicting patient no-shows is conducted, aiming at establishing the current state of the art. Based on a systematic review following the PRISMA methodology, 50 articles were found and analyzed. Of these articles, 82% were published in the last 10 years and the most used technique was logistic regression. In addition, there is significant growth in the size of the databases used to build the classifiers. An important finding is that only two studies achieved an accuracy higher than the show rate. Moreover, a single study attained an area under the curve greater than 0.9. These facts indicate the difficulty of this problem and the need for further research.
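
The comparison between accuracy and the show rate in the abstract above can be made concrete with a small sketch. It assumes a hypothetical attendance record with an 80% show rate (the data and the function name `majority_baseline_accuracy` are illustrative, not taken from the review) and shows that a classifier which always predicts "show" already reaches an accuracy equal to the show rate, so any useful model has to beat that figure.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    shows = sum(labels)
    no_shows = len(labels) - shows
    return max(shows, no_shows) / len(labels)


# Hypothetical data: 1 = patient attended, 0 = no-show (80% show rate).
attendance = [1] * 80 + [0] * 20

show_rate = sum(attendance) / len(attendance)
baseline = majority_baseline_accuracy(attendance)

# The trivial "everyone shows up" rule matches the show rate exactly.
print(f"show rate = {show_rate:.2f}, baseline accuracy = {baseline:.2f}")
```

Under this reading, an accuracy below the show rate means the classifier does worse, by that metric, than predicting attendance for every patient.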

3.
Stat Appl Genet Mol Biol ; 16(2): 133-144, 2017 04 25.
Article in English | MEDLINE | ID: mdl-28593899

ABSTRACT

We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with two widely used MSA software packages.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Base Sequence/genetics , Computer Simulation
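
As a rough illustration of the dynamic time warping viewpoint mentioned in the abstract above, the following sketch computes a classical pairwise DTW alignment between two numeric sequences and recovers the alignment as a path of index pairs. It is a textbook DTW with an absolute-difference cost, assumed here for illustration only; it is not the median-path MSA construction of the article, and the function name `dtw_alignment` is hypothetical.

```python
import math


def dtw_alignment(a, b):
    """Return (distance, path) for a classical DTW alignment of a and b.

    The path is a list of (i, j) index pairs, from (0, 0) to
    (len(a) - 1, len(b) - 1), matching a[i] with b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j - 1],  # match both elements
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


distance, path = dtw_alignment([1, 2, 3], [1, 2, 2, 3])
print(distance, path)
```

Each pairwise alignment is thus a monotone path through the index grid, which is the kind of object the abstract describes combining into a median path.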
4.
Stat Med ; 36(13): 2120-2134, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28215052

ABSTRACT

We propose a semiparametric nonlinear mixed-effects model (SNMM) using penalized splines to classify longitudinal data and improve the prediction of a binary outcome. The work is motivated by a study in which different hormone levels were measured during the early stages of pregnancy, and the challenge is using this information to predict normal versus abnormal pregnancy outcomes. The aim of this paper is to compare models and estimation strategies on the basis of alternative formulations of SNMMs depending on the characteristics of the data set under consideration. For our motivating example, we address the classification problem using a particular case of the SNMM in which the parameter space has a finite dimensional component (fixed effects and variance components) and an infinite dimensional component (unknown function) that need to be estimated. The nonparametric component of the model is estimated using penalized splines. For the parametric component, we compare the advantages of using random effects versus direct modeling of the correlation structure of the errors. Numerical studies show that our approach improves over other existing methods for the analysis of this type of data. Furthermore, the results obtained using our method support the idea that explicit modeling of the serial correlation of the error term improves the prediction accuracy with respect to a model with random effects, but independent errors. Copyright © 2017 John Wiley & Sons, Ltd.


Subject(s)
Longitudinal Studies , Models, Statistical , Pregnancy Outcome/epidemiology , Data Interpretation, Statistical , Female , Hexachlorocyclohexane/blood , Humans , Pregnancy/blood , Pregnancy Trimesters/blood
5.
J Multivar Anal ; 143: 94-106, 2016 Jan.
Article in English | MEDLINE | ID: mdl-27274601

ABSTRACT

Joint models for a wide class of response variables and longitudinal measurements consist of a mixed-effects model to fit longitudinal trajectories whose random effects enter as covariates in a generalized linear model for the primary response. They provide a useful way to assess association between these two kinds of data, which in clinical studies are often collected jointly on a series of individuals and may help in understanding, for instance, the mechanisms of recovery from a certain disease or the efficacy of a given therapy. When a nonlinear mixed-effects model is used to fit the longitudinal trajectories, the existing estimation strategies based on likelihood approximations have been shown to exhibit some computational efficiency problems (De la Cruz et al., 2011). In this article we consider a Bayesian estimation procedure for the joint model with a nonlinear mixed-effects model for the longitudinal data and a generalized linear model for the primary response. The proposed prior structure allows for the implementation of an MCMC sampler. Moreover, we consider that the errors in the longitudinal model may be correlated. We apply our method to the analysis of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. We also conduct a simulation study to assess the importance of modelling correlated errors and quantify the consequences of model misspecification.

6.
Biometrics ; 71(2): 333-43, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25639332

ABSTRACT

We propose a classification method for longitudinal data. The Bayes classifier is classically used to determine a classification rule where the underlying density in each class needs to be well modeled and estimated. This work is motivated by a real dataset of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. The proposed model, which is a semiparametric linear mixed-effects model (SLMM), is a particular case of the semiparametric nonlinear mixed-effects class of models (SNMM) in which finite dimensional (fixed effects and variance components) and infinite dimensional (an unknown function) parameters have to be estimated. In SNMMs, maximum likelihood estimation is performed iteratively, alternating between parametric and nonparametric procedures. However, if one can make the assumption that the random effects and the unknown function interact in a linear way, more efficient estimation methods can be used. Our contribution is the proposal of a unified estimation procedure based on a penalized EM-type algorithm. The Expectation and Maximization steps are explicit. In this latter step, the unknown function is estimated in a nonparametric fashion using a lasso-type procedure. A simulation study and an application to real data are performed.


Subject(s)
Data Interpretation, Statistical , Models, Statistical , Algorithms , Bayes Theorem , Biometry , Chorionic Gonadotropin, beta Subunit, Human/metabolism , Computer Simulation , Female , Humans , Likelihood Functions , Linear Models , Longitudinal Studies , Nonlinear Dynamics , Pregnancy , Pregnancy Complications/diagnosis , Pregnancy Complications/metabolism , Pregnancy Outcome
7.
Biostatistics ; 15(4): 603-19, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24622037

ABSTRACT

We propose a new method to visualize and detect shape outliers in samples of curves. In functional data analysis, we observe curves defined over a given real interval and shape outliers may be defined as those curves that exhibit a different shape from the rest of the sample. Whereas magnitude outliers, that is, curves that lie outside the range of the majority of the data, are in general easy to identify, shape outliers are often masked among the rest of the curves and thus difficult to detect. In this article, we exploit the relationship between two measures of depth for functional data to help visualize curves in terms of shape and to develop an algorithm for shape outlier detection. We illustrate the use of the visualization tool, the outliergram, through several examples and analyze the performance of the algorithm in a simulation study. Finally, we apply our method to assess cluster quality in a real set of time course microarray data.


Subject(s)
Biomedical Research/methods , Computer Simulation , Microarray Analysis/methods , Pattern Recognition, Automated/methods , Algorithms
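
The abstract above does not name the two depth measures it relates. The sketch below assumes they are the modified band depth (MBD, with bands formed by pairs of curves) and the modified epigraph index (MEI), and computes both for curves sampled on a common grid. For a curve that never crosses the others, the two quantities lie exactly on the parabola MBD = a0 + a1 MEI + a2 n^2 MEI^2 with a0 = a2 = -2 / (n (n - 1)) and a1 = 2 (n + 1) / (n - 1), so the gap below that parabola can serve as a rough indicator of atypical shape. The function name `depth_measures` and the toy data are hypothetical, and this is not the article's full detection algorithm.

```python
import numpy as np


def depth_measures(curves):
    """Return (mbd, mei, gap) for curves given as an (n, T) array."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    n_pairs = n * (n - 1) / 2

    # Entry [i, t]: how many sample curves (including curve i itself)
    # lie at or above / at or below curve i at grid point t.
    at_or_above = (curves[None, :, :] >= curves[:, None, :]).sum(axis=1)
    at_or_below = (curves[None, :, :] <= curves[:, None, :]).sum(axis=1)
    strictly_above = n - at_or_below
    strictly_below = n - at_or_above

    # A pair of curves fails to contain curve i at t only when both
    # members lie strictly above it or both lie strictly below it.
    containing = (
        n_pairs
        - strictly_above * (strictly_above - 1) / 2
        - strictly_below * (strictly_below - 1) / 2
    )
    mbd = containing.mean(axis=1) / n_pairs
    mei = at_or_above.mean(axis=1) / n

    # Parabola reached by curves that do not cross any other curve.
    a0 = a2 = -2.0 / (n * (n - 1))
    a1 = 2.0 * (n + 1) / (n - 1)
    gap = a0 + a1 * mei + a2 * n**2 * mei**2 - mbd
    return mbd, mei, gap


# Toy sample: five flat curves plus one curve that crosses all of them.
grid_size = 40
flat = np.tile(np.arange(5.0)[:, None], (1, grid_size))
crossing = np.linspace(-0.5, 4.5, grid_size)
sample = np.vstack([flat, crossing])

mbd, mei, gap = depth_measures(sample)
print(np.round(gap, 3))
```

In this toy sample the crossing curve has by far the largest gap, while a sample made only of the flat curves would have a gap of zero for every curve.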
8.
Stat Appl Genet Mol Biol ; 11(1): Article 5, 2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22499681

ABSTRACT

This article proposes a novel approach to statistical alignment of nucleotide sequences by introducing a context dependent structure on the substitution process in the underlying evolutionary model. We propose to estimate alignments and context dependent mutation rates relying on the observation of two homologous sequences. The procedure is based on a generalized pair-hidden Markov structure, where conditional on the alignment path, the nucleotide sequences follow a Markov distribution. We use a stochastic approximation expectation maximization (SAEM) algorithm to give accurate estimators of parameters and alignments. We provide results both on simulated data and vertebrate genomes, which are known to have a high mutation rate from the CG dinucleotide. In particular, we establish that the method improves the accuracy of the alignment of a human pseudogene and its functional gene.


Subject(s)
Base Sequence , Markov Chains , Models, Statistical , Sequence Alignment/methods
9.
Biostatistics ; 13(3): 398-414, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22058080

ABSTRACT

In functional data analysis, the time warping model aims at representing a set of curves exhibiting phase and amplitude variation with respect to a common continuous process. Many biological processes, when observed over time among different individuals, fit into this concept. The observed curves are modeled as the composition of an "amplitude process," which governs the common behavior, and a "warping process" that induces time distortion among the individuals. We aim at characterizing the first one. Because of the phase variation present among the curves, classical sample statistics computed on the observed sample provide poor representations of the amplitude process. Existing methods to estimate the mean behavior of the amplitude process consist of aligning the curves, that is, eliminating time variation, before estimation. However, since they rely on the use of sample means, they are very sensitive to the presence of outliers. In this article, we propose the use of a functional depth-based median as a robust estimator of the central behavior of the amplitude process. We investigate its properties in the time warping model, and we evaluate its performance in different simulation studies where we compare it to existing estimators, and we show its robustness against atypical observations. Finally, we illustrate its use with real data on a yeast time course microarray data set.


Subject(s)
Models, Statistical , Computer Simulation , Gene Expression Profiling/methods , Longitudinal Studies , Oligonucleotide Array Sequence Analysis , Time Factors , Yeasts/genetics
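
The claim in the abstract above that phase variation makes classical sample statistics a poor summary of the amplitude process can be seen in a small numerical sketch. It assumes a hypothetical amplitude process (a single Gaussian bump) observed under five pure time shifts, which is a much simpler warping than the model in the article; all names and values are illustrative.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 201)
shifts = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])  # hypothetical time shifts


def amplitude(s):
    """Hypothetical common amplitude process: a bump of height 1 at s = 0.5."""
    return np.exp(-((s - 0.5) ** 2) / (2 * 0.05**2))


# Each observed curve is the same bump, displaced in time.
curves = np.array([amplitude(t - h) for h in shifts])

# Pointwise (cross-sectional) mean of the unaligned curves.
cross_sectional_mean = curves.mean(axis=0)

print("peak of each observed curve:", np.round(curves.max(axis=1), 3))
print("peak of the pointwise mean: ", round(float(cross_sectional_mean.max()), 3))
```

Every observed curve peaks near 1, yet the pointwise mean is a low, wide plateau that resembles none of them, which is why the curves are aligned (or a robust central curve is sought) before summarizing.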
10.
Stat Appl Genet Mol Biol ; 9: Article 10, 2010.
Article in English | MEDLINE | ID: mdl-20196745

ABSTRACT

In this work we deal with parameter estimation in a latent variable model, namely the multiple-hidden i.i.d. model, which is derived from multiple alignment algorithms. We first provide a rigorous formalism for the homology structure of k sequences related by a star-shaped phylogenetic tree in the context of multiple alignment based on indel evolution models. We discuss possible definitions of likelihoods and compare them to the criterion used in multiple alignment algorithms. Existence of two different information divergence rates is established and a divergence property is shown under additional assumptions. This would yield consistency for the parameter in parametrization schemes for which the divergence property holds. We finally extend the definition of the multiple-hidden i.i.d. model and the results obtained to the case in which the sequences are related by an arbitrary phylogenetic tree. Simulations illustrate different cases which are not covered by our results.


Subject(s)
Models, Statistical , Sequence Alignment/statistics & numerical data , Algorithms , Biostatistics , Evolution, Molecular , INDEL Mutation , Likelihood Functions , Markov Chains , Models, Genetic , Phylogeny , Stochastic Processes
11.
Article in English | MEDLINE | ID: mdl-19407352

ABSTRACT

We present a stochastic sequence evolution model to obtain alignments and estimate mutation rates between two homologous sequences. The model allows two possible evolutionary behaviors along a DNA sequence in order to determine conserved regions and take its heterogeneity into account. In our model, the sequence is divided into slow and fast evolution regions. The boundaries between these sections are not known. It is our aim to detect them. The evolution model is based on a fragment insertion and deletion process working on fast regions only and on a substitution process working on fast and slow regions with different rates. This model induces a pair hidden Markov structure at the level of alignments, thus making efficient statistical alignment algorithms possible. We propose two complementary estimation methods, namely, a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve the sampling of alignments. We propose a partial alignment sampler, which is computationally less expensive than the typical whole alignment sampler. We show the convergence of the two estimation algorithms when used with this partial sampler. Our algorithms provide consistent estimates for the mutation rates and plausible alignments and sequence segmentations on both simulated and real data.


Subject(s)
DNA/genetics , Evolution, Molecular , Models, Genetic , Models, Statistical , Mutation , Sequence Alignment , Algorithms , Animals , Base Sequence , Bayes Theorem , Computer Simulation , DNA Mutational Analysis , Drosophila/genetics , Humans , Markov Chains , Molecular Sequence Data , Vertebrates/genetics