Results 1 - 13 of 13
1.
Adv Data Anal Classif ; 16(3): 691-723, 2022.
Article in English | MEDLINE | ID: mdl-36043219

ABSTRACT

A probabilistic model for random hypergraphs is introduced to represent unary, binary and higher order interactions among objects in real-world problems. This model is an extension of the latent class analysis model that introduces two clustering structures for hyperedges and captures variation in the size of hyperedges. An expectation maximization algorithm with minorization maximization steps is developed to perform parameter estimation. Model selection using the Bayesian Information Criterion is proposed. The model is applied to simulated data and two real-world data sets where interesting results are obtained.
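
As a reminder of the model selection criterion named in the abstract above, the Bayesian Information Criterion is commonly written as follows. This is the textbook form only; the symbols (maximized likelihood \hat{L}, number of free parameters k, sample size n) are notation assumed here, and some authors use the opposite sign convention (2 log-likelihood minus the penalty, to be maximized), so the paper itself should be consulted for the exact definition it uses.

```latex
\mathrm{BIC} = -2 \log \hat{L} + k \log n
```

Under this convention the candidate model with the smallest BIC is preferred; the penalty term k log n discourages models with more free parameters.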

2.
Adv Data Anal Classif ; 16(1): 55-92, 2022.
Article in English | MEDLINE | ID: mdl-35308632

ABSTRACT

In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the same units in the test data may be measured on a set of additional variables recorded at a subsequent stage with respect to when the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for data of large dimensions. A simulation study and an artificial experiment related to classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.

3.
Stat Methods Appt ; 30(5): 1365-1398, 2021.
Article in English | MEDLINE | ID: mdl-34840548

ABSTRACT

We propose a weighted stochastic block model (WSBM) which extends the stochastic block model to the important case in which edges are weighted. We address the parameter estimation of the WSBM by use of maximum likelihood and variational approaches, and establish the consistency of these estimators. The problem of choosing the number of classes in a WSBM is addressed. The proposed model is applied to simulated data and an illustrative data set.
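
The generative idea behind a weighted stochastic block model can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `sample_wsbm`, its parameters, and the choice of Gaussian edge weights are assumptions made for this sketch, and the abstract does not specify the weight distribution.

```python
import random


def sample_wsbm(n, block_probs, edge_probs, weight_means, weight_sd=1.0, seed=0):
    """Sample an undirected weighted graph from a simple block model.

    Hypothetical sketch: each node gets a latent block label, an edge between
    nodes i and j appears with a probability that depends only on their
    blocks, and (as an assumption of this sketch) its weight is Gaussian with
    a block-pair-specific mean.
    """
    rng = random.Random(seed)
    k = len(block_probs)
    # Latent block membership for every node.
    z = rng.choices(range(k), weights=block_probs, k=n)
    # Weighted adjacency matrix; 0.0 means "no edge".
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = z[i], z[j]
            if rng.random() < edge_probs[a][b]:
                weight = rng.gauss(weight_means[a][b], weight_sd)
                w[i][j] = w[j][i] = weight
    return z, w


if __name__ == "__main__":
    labels, weights = sample_wsbm(
        n=8,
        block_probs=[0.5, 0.5],
        edge_probs=[[0.9, 0.1], [0.1, 0.9]],
        weight_means=[[5.0, 1.0], [1.0, 5.0]],
    )
    print(labels)
```

Estimation runs in the opposite direction: given only the weighted adjacency matrix, the block labels and parameters are inferred, which the abstract says is done by maximum likelihood and variational approaches.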

4.
Anal Chim Acta ; 1153: 338245, 2021 Apr 08.
Article in English | MEDLINE | ID: mdl-33714445

ABSTRACT

Classification of high-dimensional spectroscopic data is a common task in analytical chemistry. Well-established procedures like support vector machines (SVMs) and partial least squares discriminant analysis (PLS-DA) are the most common methods for tackling this supervised learning problem. Nonetheless, interpretation of these models sometimes remains difficult, and solutions based on feature selection are often adopted as they lead to the automatic identification of the most informative wavelengths. Unfortunately, for some delicate applications like food authenticity, mislabeled and adulterated spectra occur in the calibration and/or validation sets, with dramatic effects on the model development, its prediction accuracy and robustness. Motivated by these issues, the present paper proposes a robust model-based method that simultaneously performs variable selection and outlier and label noise detection. We demonstrate the effectiveness of our proposal in dealing with three agri-food spectroscopic studies, where several forms of perturbations are considered. Our approach succeeds in diminishing problem complexity, identifying anomalous spectra and attaining competitive predictive accuracy considering a very low number of selected wavelengths.

5.
Sci Rep ; 11(1): 2525, 2021 01 28.
Article in English | MEDLINE | ID: mdl-33510263

ABSTRACT

Improved prostate cancer detection methods would avoid over-diagnosis of clinically indolent disease and inform appropriate treatment decisions. The aim of this study was to investigate the role of a panel of inflammation biomarkers in informing the need for a biopsy to diagnose prostate cancer. Peripheral blood serum obtained from 436 men undergoing transrectal ultrasound guided biopsy was assessed for a panel of 18 inflammatory serum biomarkers in addition to Total and Free Prostate Specific Antigen (PSA). This panel was integrated into a previously developed Irish clinical risk calculator (IPRC) for the detection of prostate cancer and high-grade prostate cancer (Gleason Score ≥ 7). Using logistic regression and multinomial regression methods, two models (Logst-RC and Multi-RC) were developed considering linear and nonlinear effects of the panel in conjunction with clinical and demographic parameters for determination of the two endpoints. Both models significantly improved the predictive ability of the clinical model for detection of prostate cancer (from 0.656 to 0.731 for Logst-RC and 0.713 for Multi-RC) and high-grade prostate cancer (from 0.716 to 0.785 for Logst-RC and 0.767 for Multi-RC) and demonstrated higher clinical net benefit. This improved discriminatory power and clinical utility may allow for individualised risk stratification, improving clinical decision making.


Subject(s)
Biomarkers/blood , Inflammation Mediators/blood , Prostatic Neoplasms/blood , Prostatic Neoplasms/diagnosis , Adult , Aged , Aged, 80 and over , Biopsy , Early Detection of Cancer , Humans , Liquid Biopsy , Male , Middle Aged , Neoplasm Grading , Neoplasm Staging , Prostatic Neoplasms/epidemiology , ROC Curve , Risk Assessment , Risk Factors
6.
J Comput Graph Stat ; 28(1): 185-196, 2019.
Article in English | MEDLINE | ID: mdl-31447541

ABSTRACT

Many existing statistical and machine learning tools for social network analysis focus on a single level of analysis. Methods designed for clustering optimize a global partition of the graph, whereas projection-based approaches (e.g., the latent space model in the statistics literature) represent in rich detail the roles of individuals. Many pertinent questions in sociology and economics, however, span multiple scales of analysis. Further, many questions involve comparisons across disconnected graphs that will inevitably be of different sizes, either due to missing data or the inherent heterogeneity in real-world networks. We propose a class of network models that represent network structure on multiple scales and facilitate comparison across graphs with different numbers of individuals. These models differentially invest modeling effort within subgraphs of high density, often termed communities, while maintaining a parsimonious structure between said subgraphs. We show that our model class is projective, highlighting an ongoing discussion in the social network modeling literature on the dependence of inference paradigms on the size of the observed graph. We illustrate the utility of our method using data on household relations from Karnataka, India. Supplementary material for this article is available online.

7.
Prostate ; 78(10): 724-730, 2018 07.
Article in English | MEDLINE | ID: mdl-29608018

ABSTRACT

BACKGROUND: Up to a third of prostate cancer patients fail curative treatment strategies such as surgery and radiation therapy in the form of biochemical recurrence (BCR), which can be predictive of poor outcome. Recent clinical trials have shown that men experiencing BCR might benefit from earlier intervention post-radical prostatectomy (RP). Therefore, there is an urgent need to identify earlier prognostic biomarkers which will guide clinicians in making an accurate diagnosis and timely decisions on the next appropriate treatment. The objective of this study was to evaluate Serum Response Factor (SRF) protein expression following RP and to investigate its association with BCR. MATERIALS AND METHODS: SRF nuclear expression was evaluated by immunohistochemistry (IHC) in TMAs across three international radical prostatectomy cohorts for a total of 615 patients. Log-rank test and Kaplan-Meier analyses were used for BCR comparisons. Stepwise backwards elimination proportional hazard regression analysis was used to explore the significance of SRF in predicting BCR in the context of other clinical pathological variables. Area under the curve (AUC) values were generated by simulating repeated random sub-samples. RESULTS: Analysis of the immunohistochemical staining of benign versus cancer cores showed higher nuclear SRF protein expression in cancer cores compared with benign cores for all three TMAs analysed (P < 0.001, n = 615). Kaplan-Meier curves of the three TMAs combined showed that patients with higher SRF nuclear expression had a shorter time to BCR compared with patients with lower SRF expression (P < 0.001, n = 215). Together with pathological T stage T3, SRF was identified as a predictor of BCR using stepwise backwards elimination proportional hazard regression analysis (P = 0.0521). Moreover, ROC curves and AUC values showed that SRF was better than T stage in predicting BCR at years 3 and 5 following radical prostatectomy, and the combination of SRF and T stage had a higher AUC value than the two taken separately. CONCLUSIONS: SRF assessment by IHC following RP could be useful in guiding clinicians to better identify patients for appropriate follow-up and timely treatment.


Subject(s)
Neoplasm Recurrence, Local/metabolism , Prostate/metabolism , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/surgery , Serum Response Factor/biosynthesis , Aged , Humans , Immunochemistry , Male , Middle Aged , Neoplasm Recurrence, Local/blood , Neoplasm Recurrence, Local/pathology , Prostate/surgery , Prostatic Neoplasms/blood , Prostatic Neoplasms/pathology , Serum Response Factor/blood , Survival Analysis
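
The Kaplan-Meier analysis mentioned in entry 7 rests on a simple product-limit calculation, which can be sketched as follows. This is an illustrative sketch only: the function name `kaplan_meier` and the toy inputs are assumptions, no study data are reproduced, and a real analysis would normally use an established survival analysis package.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the recurrence-free survival curve.

    times[i]  : follow-up time for patient i
    events[i] : 1 if the event (e.g. recurrence) was observed at times[i],
                0 if the patient was censored at that time
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy follow-up times (arbitrary units) with one censored observation.
    print(kaplan_meier([1, 2, 3], [1, 0, 1]))
```

Censored patients contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is why the estimate only changes at observed event times.
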
8.
J Comput Graph Stat ; 24(2): 520-538, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-26101465

ABSTRACT

A novel and flexible framework for investigating the roles of actors within a network is introduced. Particular interest is in roles as defined by local network connectivity patterns, identified using the ego-networks extracted from the network. A mixture of Exponential-family Random Graph Models is developed for these ego-networks in order to cluster the nodes into roles. We refer to this model as the ego-ERGM. An Expectation-Maximization algorithm is developed to infer the unobserved cluster assignments and to estimate the mixture model parameters using a maximum pseudo-likelihood approximation. The flexibility and utility of the method are demonstrated on examples of simulated and real networks.

9.
BMC Med Inform Decis Mak ; 13: 126, 2013 Nov 15.
Article in English | MEDLINE | ID: mdl-24238348

ABSTRACT

BACKGROUND: There are dilemmas associated with the diagnosis and prognosis of prostate cancer which have led to overdiagnosis and overtreatment. Prediction tools have been developed to assist the treatment of the disease. METHODS: A retrospective review was performed of the Irish Prostate Cancer Research Consortium database and 603 patients were used in the study. Statistical models based on routinely used clinical variables were built using logistic regression, random forests and k nearest neighbours to predict prostate cancer stage. The predictive ability of the models was examined using discrimination metrics, calibration curves and clinical relevance, explored using decision curve analysis. The 2007 Partin table was then applied to the N = 603 patients to compare the predictions from the current gold standard in staging prediction to the models developed in this study. RESULTS: 30% of the study cohort had non organ-confined disease. The model built using logistic regression achieved the highest discrimination metrics (AUC = 0.622, Sens = 0.647, Spec = 0.601), best calibration and the most clinical relevance based on decision curve analysis. This model also achieved higher discrimination than the 2007 Partin table (ECE AUC = 0.572 & 0.509 for T1c and T2a respectively). However, even the best statistical model does not accurately predict prostate cancer stage. CONCLUSIONS: This study has illustrated the inability of the current clinical variables and the 2007 Partin table to accurately predict prostate cancer stage. New biomarker features are urgently required to address the problem clinicians face in identifying the most appropriate treatment for their patients. This paper also demonstrated a concise methodological approach to evaluate novel features or prediction models.


Subject(s)
Models, Statistical , Neoplasm Staging/standards , Prognosis , Prostatic Neoplasms , Adult , Aged , Calibration/standards , Databases, Factual/statistics & numerical data , Humans , Ireland , Male , Middle Aged , Predictive Value of Tests , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/pathology , Reproducibility of Results , Retrospective Studies , Sensitivity and Specificity
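
The discrimination metrics reported in entry 9 (AUC, sensitivity, specificity) can be computed from predicted risk scores as in the sketch below. The function names and the toy scores are assumptions made for illustration; they are not taken from the study, and the abstract does not state the classification threshold that was used.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when predicting positive if score >= threshold."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    toy_scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted risks
    toy_labels = [0, 0, 1, 1]            # hypothetical true outcomes
    print(auc(toy_scores, toy_labels))
    print(sensitivity_specificity(toy_scores, toy_labels, threshold=0.5))
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity describe a single operating point, so the two kinds of metric answer different questions about a model.
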
10.
Stroke ; 42(3): 681-6, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21233462

ABSTRACT

BACKGROUND AND PURPOSE: Additional exercise therapy has been shown to have a positive impact on function after acute stroke and research is now focusing on methods to increase the amount of therapy that is delivered. This randomized controlled trial examined the impact of additional family-mediated exercise (FAME) therapy on outcome after acute stroke. METHODS: Forty participants with acute stroke were randomly assigned to either a control group who received routine therapy with no formal input from their family members or a FAME group, who received routine therapy and additional lower limb FAME therapy for 8 weeks. The primary outcome measure used was the lower limb section of the Fugl-Meyer Assessment modified by Lindmark. Other measures of impairment, activity, and participation were completed at baseline, postintervention, and at a 3-month follow-up. RESULTS: Statistically significant differences in favor of the FAME group were noted on all measures of impairment and activity postintervention (P<0.05). These improvements persisted at the 3-month follow-up but only walking was statistically significant (P<0.05). Participants in the FAME group were also significantly more integrated into their community at follow-up (P<0.05). Family members in the FAME group reported a significant decrease in their levels of caregiver strain at the follow-up when compared with those in the control group (P<0.01). CONCLUSIONS: This evidence-based FAME intervention can serve to optimize patient recovery and family involvement after acute stroke at the same time as being mindful of available resources.


Subject(s)
Caregivers/standards , Exercise Therapy/methods , Exercise Therapy/standards , Recovery of Function/physiology , Stroke Rehabilitation , Stroke/physiopathology , Aged , Aged, 80 and over , Cohort Studies , Female , Follow-Up Studies , Humans , Male , Middle Aged , Time Factors , Treatment Outcome
11.
J Proteome Res ; 10(3): 1361-73, 2011 Mar 04.
Article in English | MEDLINE | ID: mdl-21166384

ABSTRACT

In recent years, Prostate Specific Antigen (PSA) testing has become widespread and has been associated with decreased mortality rates; however, this testing has raised concerns of overdiagnosis and overtreatment. It is clear that additional biomarkers are required. To identify these biomarkers, we have undertaken proteomic and metabolomic expression profiling of serum samples from BPH, Gleason score 5 and 7 using two-dimensional difference in gel electrophoresis (2D-DIGE) and nuclear magnetic resonance spectroscopy (NMR). Panels of serum protein biomarkers were identified by applying Random Forests to the 2D-DIGE data. The evaluation of selected biomarker panels has shown that they can provide higher prediction accuracy than the current diagnostic standard. With careful validation, these serum biomarker panels may potentially help to reduce unnecessary invasive diagnostic procedures and more accurately direct the urologist to curative surgery.


Subject(s)
Biomarkers, Tumor/analysis , Biomarkers, Tumor/blood , Prostatic Neoplasms/blood , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/pathology , Two-Dimensional Difference Gel Electrophoresis/methods , Area Under Curve , Cluster Analysis , Humans , Male , Mass Spectrometry/methods , Neoplasm Staging , Reproducibility of Results
12.
Ann Appl Stat ; 4(1): 396-421, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-20936055

ABSTRACT

Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.

13.
Bioinformatics ; 26(21): 2705-12, 2010 Nov 01.
Article in English | MEDLINE | ID: mdl-20802251

ABSTRACT

MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets. RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data. AVAILABILITY: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Computer Simulation , Normal Distribution , Pattern Recognition, Automated/methods
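
The adjusted Rand index used in entry 13 to quantify clustering performance can be computed from two label vectors as sketched below. This is a minimal illustration using the standard pair-counting formula; the function name `adjusted_rand_index` and the toy labels are assumptions, not material from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items.

    1.0 means identical partitions (up to relabelling); values near 0.0 are
    what would be expected from random agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pair counts from the contingency table and its margins.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        # Degenerate case, e.g. both partitions put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Because the index is corrected for chance, it allows a fair comparison between clusterings that use different numbers of groups, which matters when models across a family are compared.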