Results 1 - 18 of 18
1.
Plast Reconstr Surg Glob Open ; 12(4): e5771, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38689944

ABSTRACT

Background: Facial skin cancer and its surgical treatment can affect health-related quality of life. The FACE-Q Skin Cancer Module is a patient-reported outcome measure that assesses different aspects of health-related quality of life and has recently been translated into Dutch. This study aimed to evaluate the performance of the translated version in a Dutch cohort using modern psychometric measurement theory (Rasch). Methods: Dutch participants with facial skin cancer were prospectively recruited and asked to complete the translated FACE-Q Skin Cancer Module. The following assumptions of the Rasch model were tested: unidimensionality, local independence, and monotonicity. Response thresholds, fit statistics, internal consistency, floor and ceiling effects, and targeting were assessed for all scales and items within the scales. Responsiveness was tested for the "cancer worry" scale. Results: In total, 259 patients completed the preoperative questionnaire and were included in the analysis. All five scales assessed showed a good or sufficient fit to the Rasch model. Unidimensionality and monotonicity were present for all scales. Some items showed local dependency. Most of the scales demonstrated ordered item thresholds and appropriate fit statistics. Conclusions: The FACE-Q Skin Cancer Module is a well-designed patient-reported outcome measure that shows psychometric validity for the translated version in a Dutch cohort, using classical and modern test theory.
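The assumption checks described in this abstract can be sketched in R with the mokken and eRm packages; the abstract does not name the software used, so the following is illustrative only, and `responses` (a matrix of polytomous item scores for one scale) is a hypothetical placeholder.

```r
# Illustrative sketch: Rasch assumption checks for one polytomous scale.
# `responses` is a hypothetical n-by-k matrix of item scores (0, 1, 2, ...).
library(mokken)   # monotonicity and scalability checks
library(eRm)      # partial credit Rasch model fitting

# Monotonicity: item step response functions should not decrease with rest score
mono <- check.monotonicity(responses)
summary(mono)

# Scalability coefficients as a first check on unidimensionality
coefH(responses)

# Fit a partial credit Rasch model and inspect item fit statistics
pcm <- PCM(responses)
pp  <- person.parameter(pcm)
itemfit(pp)                 # infit/outfit mean squares per item

# Local independence: flag large correlations between standardized residuals
res <- residuals(pp)
round(cor(res), 2)
```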

2.
J Am Coll Surg ; 237(6): 856-861, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37703495

ABSTRACT

BACKGROUND: Disparity in surgical care impedes the delivery of uniformly high-quality care. Metrics that quantify disparity in care can help identify areas in need of intervention. A literature-based Disparity-Sensitive Score (DSS) system for surgical care was adapted by the Metrics for Equitable Access and Care in Surgery (MEASUR) group. The alignment between the MEASUR DSS and Delphi ratings of an expert advisory panel (EAP) regarding the disparity sensitivity of surgical quality metrics was assessed. STUDY DESIGN: Using the DSS criteria, MEASUR co-investigators scored 534 surgical metrics, which were subsequently rated by the EAP. All scores were converted to a 9-point scale. Agreement between the new measurement technique (ie, DSS) and an established subjective technique (ie, importance and validity ratings) was assessed using the Bland-Altman method, adjusting for the linear relationship between the paired difference and the paired average. The limit of agreement (LOA) was set at 1.96 SD (95%). RESULTS: The percentage of DSS scores inside the LOA was 96.8% (LOA, 0.02 points) for the importance rating and 94.6% (LOA, 1.5 points) for the validity rating. In comparison, 94.4% of the 2 subjective EAP ratings were inside the LOA (0.7 points). CONCLUSIONS: Applying the MEASUR DSS criteria using the available literature allowed for identification of disparity-sensitive surgical metrics. The results suggest that this literature-based method of selecting quality metrics may be comparable to more complex consensus-based Delphi methods. In fields with robust literature, literature-based composite scores may be used to select quality metrics rather than assembling consensus panels.
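A trend-adjusted Bland-Altman comparison of this kind takes only a few lines of base R. This is a minimal sketch, assuming hypothetical vectors `dss` and `panel` holding the DSS scores and EAP ratings, both already converted to the 9-point scale.

```r
# Illustrative sketch: Bland-Altman agreement between DSS scores and expert
# panel ratings, adjusting for a linear trend in the differences.
d   <- dss - panel          # paired differences
avg <- (dss + panel) / 2    # paired averages

# Model the difference as a linear function of the average (trend adjustment)
trend    <- lm(d ~ avg)
resid_sd <- sd(residuals(trend))

# 95% limits of agreement around the fitted trend line (1.96 SD)
upper <- fitted(trend) + 1.96 * resid_sd
lower <- fitted(trend) - 1.96 * resid_sd

# Percentage of paired scores falling inside the limits of agreement
mean(d >= lower & d <= upper) * 100
```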


Subject(s)
Benchmarking , Quality of Health Care , Humans , Delphi Technique , Consensus
4.
Sci Rep ; 12(1): 21269, 2022 12 08.
Article in English | MEDLINE | ID: mdl-36481644

ABSTRACT

Contrary to national guidelines, women with ovarian cancer often receive treatment at the end of life, potentially due to the difficulty of accurately estimating prognosis. We trained machine learning algorithms to guide prognosis by predicting 180-day mortality for women with ovarian cancer using patient-reported outcome (PRO) data. We collected data from a single academic cancer institution in the United States. Women completed biopsychosocial PRO measures every 90 days. We randomly partitioned our dataset into training and testing samples. We used synthetic minority oversampling to reduce class imbalance in the training dataset. We fitted the training data to six machine learning algorithms and combined their classifications on the testing dataset into an unweighted voting ensemble. We assessed each algorithm's accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) using the testing data. We recruited 245 patients who completed 1319 PRO assessments. The final voting ensemble produced state-of-the-art results on the task of predicting 180-day mortality for ovarian cancer patients (accuracy = 0.79, sensitivity = 0.71, specificity = 0.80, AUROC = 0.76). The algorithm correctly identified 25 of the 35 women in the testing dataset who died within 180 days of assessment. Machine learning algorithms trained using PRO data offer encouraging performance in predicting whether a woman with ovarian cancer will die within 180 days. This model could be used to guide data-driven end-of-life care and address current shortcomings in care delivery. Our model demonstrates the potential of biopsychosocial PROM information to make substantial contributions to oncology prediction modeling and could inform clinical decision-making. Future research is needed to validate these findings in a larger, more diverse sample.
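The oversampling-plus-voting pipeline can be sketched in R as below. The abstract does not name its six algorithms or its software, so this uses three common classifiers as stand-ins; the data frames `train` and `test` (numeric predictors plus a 0/1 outcome `died180`) are hypothetical placeholders.

```r
# Illustrative sketch: SMOTE oversampling plus an unweighted voting ensemble.
library(smotefamily)    # synthetic minority oversampling (SMOTE)
library(randomForest)
library(e1071)          # SVM
library(pROC)           # AUROC

# SMOTE requires numeric predictors; `class` is returned as a new column
bal <- SMOTE(train[, -which(names(train) == "died180")],
             target = train$died180)$data
bal$class <- factor(bal$class)

rf <- randomForest(class ~ ., data = bal)
sv <- svm(class ~ ., data = bal)
lr <- glm(class ~ ., data = bal, family = binomial)

# One binary vote per model (outcome assumed coded "0"/"1")
votes <- cbind(
  as.numeric(predict(rf, test) == "1"),
  as.numeric(predict(sv, test) == "1"),
  as.numeric(predict(lr, test, type = "response") > 0.5)
)
ensemble <- as.numeric(rowSums(votes) >= 2)   # unweighted majority vote

table(predicted = ensemble, actual = test$died180)  # confusion matrix
roc(test$died180, rowMeans(votes))$auc              # AUROC of the vote share
```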


Subject(s)
Ovarian Neoplasms , Schools , Humans , Female , Machine Learning , Patient Reported Outcome Measures
5.
Aesthetic Plast Surg ; 46(6): 2769-2780, 2022 12.
Article in English | MEDLINE | ID: mdl-35764813

ABSTRACT

INTRODUCTION: In the past decade, there has been increasing interest in patient-reported outcome measures (PROMs), which are now commonly used alongside traditional outcome measures such as morbidity and mortality. Since its development in 2010, the FACE-Q Aesthetic has been widely used in clinical practice and research to measure quality of life and patient satisfaction, quantifying impact and change across different aspects of cosmetic facial surgery and minimally invasive treatments. We review how researchers have utilized the FACE-Q Aesthetic module to date, aiming to better understand whether and how it has enhanced our understanding and practice of aesthetic facial procedures. METHODS: We performed a systematic search of the literature. Publications that used the FACE-Q Aesthetic module to evaluate patient outcomes were included. Publications about the development of PROMs or modifications of the FACE-Q Aesthetic, translation or validation studies of the FACE-Q Aesthetic scales, papers not published in English, reviews, comments/discussions, and letters to the editor were excluded. RESULTS: Our search produced 1189 articles; 70 remained after applying the inclusion and exclusion criteria. Significant findings and associations were further explored. The need for evidence-based patient-reported outcomes has driven growing uptake of the FACE-Q Aesthetic in cosmetic surgery and dermatology, generating an increasing body of evidence concerning facelift surgery, botulinum toxin, rhinoplasty, soft tissue fillers, scar treatments, and experimental areas. DISCUSSION: The FACE-Q Aesthetic has contributed substantial evidence about outcomes from the patient perspective in cosmetic facial surgery and minimally invasive treatments. It holds great potential to improve quality of care and may fundamentally change the way we measure success in plastic surgery and dermatology. LEVEL OF EVIDENCE III: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.


Subject(s)
Patient Reported Outcome Measures , Plastic Surgery Procedures , Quality of Life , Humans , Esthetics
6.
Plast Reconstr Surg Glob Open ; 10(4): e4279, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35450263

ABSTRACT

Background: Carpal tunnel syndrome (CTS) is extremely common and typically treated with carpal tunnel decompression (CTD). Although generally an effective treatment, up to 25% of patients do not experience meaningful benefit. Given the prevalence, this amounts to considerable morbidity and cost without return. Being able to reliably predict which patients would benefit from CTD preoperatively would support more patient-centered and value-based care. Methods: We used registry data from 1916 consecutive patients undergoing CTD for CTS at a regional hand center between 2010 and 2019. Improvement was defined as change exceeding the respective QuickDASH subscale's minimal important change estimate. Predictors included a range of clinical, demographic and patient-reported variables. Data were split into training (75%) and test (25%) sets. A range of machine learning algorithms was developed using the training data and evaluated with the test data. We also used a machine learning technique called chi-squared automatic interaction detection to develop flowcharts that could help clinicians and patients to understand the chances of a patient improving with surgery. Results: The top performing models predicted functional and symptomatic improvement with accuracies of 0.718 (95% confidence interval 0.660, 0.771) and 0.759 (95% confidence interval 0.708, 0.810), respectively. The chi-squared automatic interaction detection flowcharts could provide valuable clinical insights from as little as two preoperative questions. Conclusions: Patient-reported outcome measures and machine learning can support patient-centered and value-based healthcare. Our algorithms can be used for expectation management and to rationalize treatment risks and costs associated with CTD.
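The two-question flowchart idea can be approximated with recursive partitioning in R. This sketch uses rpart as a stand-in for chi-squared automatic interaction detection (the CHAID algorithm itself is implemented in the CHAID package on R-Forge); the data frame `ctd` and its column names are hypothetical.

```r
# Illustrative sketch: a shallow decision flowchart predicting whether a
# patient will improve after carpal tunnel decompression.
library(rpart)
library(rpart.plot)

flow <- rpart(improved ~ baseline_quickdash + symptom_duration,
              data = ctd, method = "class",
              control = rpart.control(maxdepth = 2))  # keep the flowchart shallow

rpart.plot(flow)    # visualise as a clinician-friendly flowchart

# Chance of improvement for one new patient, from two preoperative answers
predict(flow, newdata = ctd[1, ], type = "prob")
```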

7.
Qual Life Res ; 31(3): 917-925, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34590202

ABSTRACT

PURPOSE: This study aimed to evaluate and improve the accuracy and efficiency of the QuickDASH for the assessment of limb function in patients with upper extremity lymphedema using modern psychometric techniques. METHOD: We conducted confirmatory factor analysis (CFA) and Mokken analysis to examine the assumption of unidimensionality for the IRT model using data from 285 patients who completed the QuickDASH. We then fitted the data to Samejima's graded response model (GRM), assessed the assumption of local independence of items, and calibrated the item responses for CAT simulation. RESULTS: Initial CFA and Mokken analyses demonstrated good scalability of items and unidimensionality. However, the assumption of local independence was violated between items 9 (severity of pain) and 11 (sleeping difficulty due to pain) (Yen's Q3 = 0.46), and disordered thresholds were evident for item 5 (cutting food). After addressing these breaches of assumptions, the re-analyzed GRM with the remaining 10 items achieved an improved fit. Simulation of CAT administration demonstrated a high correlation between scores on the CAT and the QuickDASH (r = 0.98). Items 2 (doing heavy chores) and 8 (limiting work or daily activities) were the most frequently used. The correlation between factor scores derived from the 11-item QuickDASH and the Ultra-QuickDASH, comprising only items 2 and 8, was as high as 0.91. CONCLUSION: By administering just these two best-performing QuickDASH items, we can obtain estimates very similar to those obtained from the full-length QuickDASH without the need for CAT technology.
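The GRM calibration and the Yen's Q3 screen for local dependence can be reproduced with the mirt package; the abstract does not name the software, so this is a sketch under that assumption, with `quickdash` standing in for the n-by-11 data frame of item responses.

```r
# Illustrative sketch: fit Samejima's graded response model and screen for
# local dependence with Yen's Q3.
library(mirt)

grm <- mirt(quickdash, model = 1, itemtype = "graded")

# Yen's Q3: correlations between item residuals; values well above ~0.2
# (0.46 for items 9 and 11 in this study) suggest local dependence
q3 <- residuals(grm, type = "Q3")
round(q3, 2)

# Calibrated item parameters and person scores for subsequent CAT simulation
coef(grm, simplify = TRUE)$items
fscores(grm, method = "EAP")[1:5, ]
```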


Subject(s)
Computerized Adaptive Testing , Lymphedema , Humans , Lymphedema/diagnosis , Psychometrics , Quality of Life/psychology , Surveys and Questionnaires
8.
J Plast Reconstr Aesthet Surg ; 75(1): 33-44, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34753682

ABSTRACT

BACKGROUND: Facial vascularized composite allotransplantation (fVCA) is a life-enhancing procedure performed to improve quality of life (QOL). Patient-reported outcome measures (PROMs) are tools used to assess QOL from the patients' perspective and are increasingly recognized as an important clinical metric for assessing the outcomes of treatment. A systematic literature review was performed to identify and appraise the content of PROMs used in fVCA. METHODS: We searched PubMed/MEDLINE, CINAHL, Embase, PsycINFO, and Web of Science from their inception through June 2020. Included studies used a PROM in candidates and recipients of fVCA of any gender or age. We excluded abstracts, reviews, editorials, and dissertations. Items from each PROM were extracted and coded, using top-level codes and subcodes, to develop a preliminary conceptual framework of QOL concerns in fVCA and to guide future PROM selection. RESULTS: Title and abstract screening of 6089 publications resulted in 16 studies that met the inclusion criteria. Review of the 16 studies identified 38 PROMs, none of which were developed for fVCA. Review of the coded content of each PROM identified six top-level codes (appearance, facial function, physical, psychological, and social health, and experience of care) and 16 subcodes, making up the preliminary conceptual framework. CONCLUSION: There are currently no PROMs designed to measure the QOL concerns of fVCA candidates and recipients. Findings from this systematic review will be used to inform an interview guide for qualitative interviews to elicit and refine important concepts related to QOL in fVCA.


Subject(s)
Quality of Life , Vascularized Composite Allotransplantation , Face , Humans , Patient Reported Outcome Measures
9.
Plast Reconstr Surg Glob Open ; 9(9): e3806, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34549001

ABSTRACT

BACKGROUND: The CLEFT-Q is a patient-reported outcome measure with seven scales measuring elements of facial appearance in cleft lip and/or palate. We built on the validated CLEFT-Q structural model to describe conceptual relationships between these scales and tested our hypothesis through structural equation modeling (SEM). In our hypothesized model, the appearance of the nose, nostrils, teeth, jaw, lips, and cleft lip scar all contribute to overall facial appearance. METHODS: We included 640 participants from the international CLEFT-Q field test. Model fit was assessed using weighted least squares mean and variance adjusted (WLSMV) estimation. The model was then refined using modification indices. The fit of the hypothesized model was confirmed in an independent sample of 452 participants. RESULTS: The refined model demonstrated excellent fit to the data (comparative fit index 0.999, Tucker-Lewis index 0.999, root mean square error of approximation 0.036, and standardized root mean square residual 0.036). The confirmatory analysis also demonstrated excellent model fit. CONCLUSION: Our structural model, based on a clinical understanding of appearance in orofacial clefting, aligns with CLEFT-Q field test data. This supports the instrument's use and the exploration of a wider range of applications, such as multidimensional computerized adaptive testing.
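A model of this shape can be specified in lavaan with WLSMV estimation for ordinal items. The sketch below is illustrative only: the scale and item names (`nose`, `teeth`, `face`, `n1`, ...) are hypothetical placeholders, not the actual CLEFT-Q labels, and only three of the seven scales are shown.

```r
# Illustrative sketch: hypothesised structural model with WLSMV estimation.
library(lavaan)

model <- '
  # measurement part: each appearance scale measured by its ordinal items
  nose  =~ n1 + n2 + n3
  teeth =~ t1 + t2 + t3
  face  =~ f1 + f2 + f3

  # structural part: component scales contribute to overall facial appearance
  face ~ nose + teeth
'

fit <- sem(model, data = cleftq, estimator = "WLSMV",
           ordered = c("n1", "n2", "n3", "t1", "t2", "t3", "f1", "f2", "f3"))

fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))

# Modification indices suggest candidate refinements of the model
mi <- modindices(fit)
head(mi[order(-mi$mi), ], 10)
```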

10.
Plast Reconstr Surg ; 148(4): 863-869, 2021 Oct 01.
Article in English | MEDLINE | ID: mdl-34415858

ABSTRACT

BACKGROUND: Skin cancer is among the most frequently occurring malignancies worldwide, which creates a great need for an effective patient-reported outcome measure. Shorter questionnaires reduce patient burden and increase patients' willingness to complete forms. The authors set out to use computerized adaptive testing to reduce the number of items needed to predict results for scales of the FACE-Q Skin Cancer Module, a validated patient-reported outcome measure of health-related quality of life and patient satisfaction in facial surgery. METHODS: Computerized adaptive testing generates tailored questionnaires for patients in real time based on their responses to previous questions. The authors used open-source computerized adaptive testing simulation software to run item responses for the five scales of the FACE-Q Skin Cancer Module (i.e., scar appraisal, satisfaction with facial appearance, appearance-related psychosocial distress, cancer worry, and satisfaction with information about appearance). Each simulation continued to administer items until prespecified levels of precision, estimated by standard error, were met. Mean and maximum item reductions between the original fixed-length short forms and the simulated versions were evaluated. RESULTS: The number of questions that patients needed to answer to complete the FACE-Q Skin Cancer Module was reduced from 41 items in the original form to a mean of 23 ± 0.55 items (range, 15 to 29) using the computerized adaptive testing version. Simulated computerized adaptive testing scores maintained a high correlation (0.98 to 0.99) with scores from the fixed-length short forms. CONCLUSIONS: Applying computerized adaptive testing to the FACE-Q Skin Cancer Module can reduce the length of assessment by more than 50 percent with virtually no loss of precision, and is likely to play a critical role in its implementation in clinical practice.
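A precision-based CAT simulation of this kind can be run with the open-source catR package (one plausible choice; the abstract does not name its software). The item bank below is randomly generated for demonstration; a real simulation would load the calibrated FACE-Q item parameters instead.

```r
# Illustrative sketch: simulate a precision-based CAT for one scale.
library(catR)

set.seed(1)
bank <- genPolyMatrix(items = 10, nrCat = 4, model = "GRM")  # toy GRM bank

res <- randomCAT(trueTheta = 0.5, itemBank = bank, model = "GRM",
                 test  = list(method = "EAP", itemSelect = "MFI"),
                 stop  = list(rule = "precision", thr = 0.55),  # stop at SE < 0.55
                 final = list(method = "EAP"))

length(res$testItems)   # items administered before the precision rule was met
res$thFinal             # final score estimate
res$seFinal             # its standard error
```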


Subject(s)
Facial Neoplasms/surgery , Patient Reported Outcome Measures , Plastic Surgery Procedures/statistics & numerical data , Skin Neoplasms/surgery , Surgical Wound/surgery , Computerized Adaptive Testing , Esthetics , Face/surgery , Facial Neoplasms/pathology , Humans , Patient Satisfaction/statistics & numerical data , Psychometrics/methods , Psychometrics/statistics & numerical data , Quality of Life , Plastic Surgery Procedures/psychology , Reproducibility of Results , Skin Neoplasms/psychology , Surgical Wound/etiology , Surveys and Questionnaires/statistics & numerical data
11.
BMC Med Res Methodol ; 21(1): 158, 2021 07 31.
Article in English | MEDLINE | ID: mdl-34332525

ABSTRACT

BACKGROUND: Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text data, using freely available software. METHODS: We performed three NLP experiments using publicly available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity. RESULTS: Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems; a common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664, 0.776] for the SVM. CONCLUSIONS: In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text and provide reproducible code, using open-source software, that can be readily applied to other research studies.
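The first two experiments (lexicon-based sentiment and LDA) can be sketched with tidytext and topicmodels in R, one plausible open-source toolchain for this workflow. The data frame `reviews`, with columns `drug` and `text`, is a hypothetical placeholder.

```r
# Illustrative sketch: lexicon-based sentiment analysis and LDA topic
# modelling of open-text drug reviews.
library(dplyr)
library(tidytext)
library(topicmodels)

tokens <- reviews %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Sentiment counts per drug using the Bing lexicon shipped with tidytext
tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(drug, sentiment)

# Unsupervised grouping of drugs by review content (latent Dirichlet allocation)
dtm <- tokens %>% count(drug, word) %>% cast_dtm(drug, word, n)
lda <- LDA(dtm, k = 3, control = list(seed = 42))
terms(lda, 10)   # top words characterising each cluster
```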


Subject(s)
Machine Learning , Natural Language Processing , Algorithms , Humans , Neural Networks, Computer , Support Vector Machine
12.
J Med Internet Res ; 23(7): e26412, 2021 07 30.
Article in English | MEDLINE | ID: mdl-34328443

ABSTRACT

BACKGROUND: Computerized adaptive testing (CAT) has been shown to deliver short, accurate, and personalized versions of the CLEFT-Q patient-reported outcome measure for children and young adults born with a cleft lip and/or palate. Decision trees may integrate clinician-reported data (eg, age, gender, cleft type, and planned treatments) to make these assessments even shorter and more accurate. OBJECTIVE: We aimed to create decision tree models incorporating clinician-reported data into adaptive CLEFT-Q assessments and compare their accuracy to traditional CAT models. METHODS: We used relevant clinician-reported data and patient-reported item responses from the CLEFT-Q field test to train and test decision tree models using recursive partitioning. We compared the prediction accuracy of decision trees to CAT assessments of similar length. Participant scores from the full-length questionnaire were used as ground truth. Accuracy was assessed using Pearson's correlation coefficient between predicted and ground truth scores, mean absolute error, root mean squared error, and a two-tailed Wilcoxon signed-rank test comparing squared error. RESULTS: Decision trees demonstrated poorer accuracy than CAT comparators and generally made data splits based on item responses rather than clinician-reported data. CONCLUSIONS: When predicting CLEFT-Q scores, individual item responses are generally more informative than clinician-reported data. Decision trees that make binary splits risk underfitting polytomous patient-reported outcome measure data and demonstrated poorer performance than CATs in this study.
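The tree-versus-CAT comparison can be sketched with rpart and base R. This is a minimal sketch under stated assumptions: `train` and `test` (item responses plus clinician-reported variables and a ground-truth `score`) and `cat_pred` (scores from a CAT of similar length) are hypothetical placeholders.

```r
# Illustrative sketch: regression tree via recursive partitioning, evaluated
# against ground-truth scores and compared with CAT predictions.
library(rpart)

tree      <- rpart(score ~ ., data = train, method = "anova")
tree_pred <- predict(tree, newdata = test)

cor(tree_pred, test$score)               # Pearson's r against ground truth
mean(abs(tree_pred - test$score))        # mean absolute error
sqrt(mean((tree_pred - test$score)^2))   # root mean squared error

# Paired two-tailed comparison of squared errors: decision tree vs CAT
wilcox.test((tree_pred - test$score)^2,
            (cat_pred  - test$score)^2,
            paired = TRUE)
```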


Subject(s)
Cleft Lip , Cleft Palate , Cleft Lip/diagnosis , Cleft Palate/diagnosis , Humans , Patient Reported Outcome Measures , Quality of Life
14.
J Plast Reconstr Aesthet Surg ; 74(6): 1355-1401, 2021 06.
Article in English | MEDLINE | ID: mdl-33376081

ABSTRACT

BACKGROUND: Computerised adaptive testing (CAT) has the potential to transform plastic surgery outcome measurement by making patient-reported outcome measures (PROMs) shorter, individualised, and more accurate than pen-and-paper questionnaires. OBJECTIVES: This paper reports the results of two optimisation studies for the CLEFT-Q CAT, a CAT intended for use in the field of cleft lip and/or palate. Specifically, we aimed to identify the optimal score estimation and item selection methods for using this CAT in clinical practice; these represent two major components of any CAT algorithm. METHOD: Monte Carlo simulations were performed using simulated data in the R statistical computing environment and incorporated a range of score estimation and item selection techniques. The performance and accuracy of the CAT were assessed by the mean number of items administered, the correlation between CAT scores and paired linear assessment scores, and the root mean squared deviation (RMSD) of these score pairs. RESULTS: The accuracy of the CLEFT-Q CAT was not significantly affected by the choice of score estimation or item selection method. Sub-scales that originally contained more items were amenable to greater item reduction with CAT. CONCLUSION: This study shows that score estimation and item selection methods that need minimal processing power can be used in the CLEFT-Q CAT without compromising accuracy. This means that the CLEFT-Q CAT could be administered quickly and efficiently with basic hardware demands. We recommend the use of less computationally intensive techniques in future CLEFT-Q CAT studies.
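Since this study ran in R, the comparison of score estimation methods can be sketched with catR's theta estimators (a plausible choice; the paper's actual simulation code is not given here). The toy GRM bank and response pattern stand in for the CLEFT-Q item parameters.

```r
# Illustrative sketch: compare score estimation methods on one fixed
# response pattern, as a miniature of a Monte Carlo comparison.
library(catR)

set.seed(2)
bank <- genPolyMatrix(items = 12, nrCat = 4, model = "GRM")
resp <- genPattern(th = 0, it = bank, model = "GRM")   # simulated responses

for (m in c("BM", "ML", "EAP", "WL")) {
  est <- thetaEst(bank, resp, model = "GRM", method = m)
  se  <- semTheta(est, bank, resp, model = "GRM", method = m)
  cat(sprintf("%-3s  theta = %6.3f  SE = %5.3f\n", m, est, se))
}
```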


Subject(s)
Cleft Lip , Cleft Palate , Patient Reported Outcome Measures , Plastic Surgery Procedures , Quality of Life , Surgery, Plastic , Cleft Lip/psychology , Cleft Lip/surgery , Cleft Palate/psychology , Cleft Palate/surgery , Computer Simulation , Humans , Monte Carlo Method , Psychometrics , Plastic Surgery Procedures/methods , Plastic Surgery Procedures/statistics & numerical data , Reproducibility of Results , Software Design , Surgery, Plastic/adverse effects , Surgery, Plastic/methods , Surgery, Plastic/statistics & numerical data
15.
Qual Life Res ; 29(4): 1065-1072, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31758485

ABSTRACT

PURPOSE: With the BODY-Q, one can assess outcomes, such as satisfaction with appearance, in weight loss and body contouring patients using multiple scales. All scales can be used independently in any given combination or order. Currently, the BODY-Q cannot provide overall appearance scores across scales that measure a similar super-ordinate construct (i.e., overall appearance), which could improve the scales' usefulness as a benchmarking tool and improve the comprehensibility of patient feedback. We explored the possibility of establishing overall appearance scores by applying a bifactor model to the BODY-Q appearance scales. METHODS: In a bifactor model, questionnaire items load onto both a primary specific factor and a general factor, such as satisfaction with appearance. The international BODY-Q validation patient sample (n = 734) was used to fit a bifactor model to the appearance domain. Factor loadings, fit indices, and the correlation between the bifactor appearance domain and the satisfaction with body scale were assessed. RESULTS: All items loaded on the general factor of their corresponding domain. In the appearance domain, all items demonstrated adequate item fit to the model. All scales had satisfactory fit to the bifactor model (RMSEA 0.045, CFI 0.969, and TLI 0.964). The correlation between the appearance domain summary scores and satisfaction with body scale scores was 0.77. DISCUSSION: We successfully applied a bifactor model to BODY-Q data with good item and model fit indices. With this method, we were able to produce reliable overall appearance scores, which may improve the interpretability of the BODY-Q while increasing flexibility.
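A bifactor model of this form can be specified in lavaan: every item loads on the general appearance factor and on exactly one specific factor, with the factors kept orthogonal. The sketch below is illustrative; the scale and item names are hypothetical placeholders for the BODY-Q appearance items, and only two specific factors are shown.

```r
# Illustrative sketch: bifactor model with a general appearance factor.
library(lavaan)

model <- '
  # general factor: overall satisfaction with appearance (all items)
  general =~ b1 + b2 + b3 + f1 + f2 + f3

  # specific factors: one per appearance scale
  body =~ b1 + b2 + b3
  face =~ f1 + f2 + f3
'

fit <- cfa(model, data = bodyq, std.lv = TRUE,
           orthogonal = TRUE)   # specific factors uncorrelated with the general factor

fitMeasures(fit, c("rmsea", "cfi", "tli"))
standardizedSolution(fit)       # item loadings on general and specific factors
```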


Subject(s)
Body Image/psychology , Patient Satisfaction/statistics & numerical data , Physical Appearance, Body/physiology , Psychometrics/methods , Benchmarking , Health Status , Humans , Quality of Life/psychology , Surveys and Questionnaires , Weight Loss
16.
J Plast Reconstr Aesthet Surg ; 72(11): 1819-1824, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31358447

ABSTRACT

BACKGROUND: The International Consortium for Health Outcomes Measurement (ICHOM) has recently agreed upon a core outcome set for the comprehensive appraisal of cleft care, which puts a greater emphasis on patient-reported outcome measures (PROMs) and, in particular, the CLEFT-Q. The CLEFT-Q comprises 12 scales with a total of 110 items, designed to be answered by children as young as 8 years old. OBJECTIVE: In this study, we aimed to use computerised adaptive testing (CAT) to reduce the number of items needed to predict results for each CLEFT-Q scale. METHOD: We used an open-source CAT simulation package to run item responses over each of the full-length scales and its CAT counterpart at varying degrees of precision, estimated by standard error (SE). The mean number of items needed to achieve a given SE was recorded for each scale's CAT, and the correlations between results from the full-length scales and those predicted by the CAT versions were calculated. RESULTS: Using CATs for each of the 12 CLEFT-Q scales, we reduced the number of questions that participants needed to answer from 110 to a mean of 43.1 (range 34-60, SE < 0.55), while maintaining a 97% correlation between scores obtained with the CAT and the full-length scales. CONCLUSIONS: CAT is likely to play a fundamental role in the uptake of PROMs into clinical practice, given the high degree of accuracy achievable with substantially fewer items.
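The summary statistics reported here (mean items administered, correlation with full-length scores) can be reproduced by looping a CAT simulator over many simulated respondents; the sketch below uses catR as one plausible open-source package, with a toy item bank in place of the CLEFT-Q parameters.

```r
# Illustrative sketch: CAT over many simulated respondents, summarising mean
# test length and agreement between CAT scores and true scores.
library(catR)

set.seed(3)
bank   <- genPolyMatrix(items = 12, nrCat = 4, model = "GRM")
thetas <- rnorm(200)   # simulated true scores

sim <- sapply(thetas, function(th) {
  r <- randomCAT(trueTheta = th, itemBank = bank, model = "GRM",
                 stop = list(rule = "precision", thr = 0.55))
  c(items = length(r$testItems), score = r$thFinal)
})

mean(sim["items", ])          # mean number of items administered
cor(sim["score", ], thetas)   # agreement between CAT and true scores
```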


Subject(s)
Cleft Lip/surgery , Cleft Palate/surgery , Patient Reported Outcome Measures , Adolescent , Adult , Algorithms , Child , Computer Simulation , Diagnosis, Computer-Assisted , Female , Humans , Male , Predictive Value of Tests , Reproducibility of Results , Surveys and Questionnaires , Young Adult
17.
BMC Med Res Methodol ; 19(1): 64, 2019 03 19.
Article in English | MEDLINE | ID: mdl-30890124

ABSTRACT

BACKGROUND: Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely available open-source software and public domain data. METHODS: We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized generalized linear model regression (GLMs), support vector machines (SVMs) with a radial basis function kernel, and single-layer artificial neural networks. The publicly available dataset describing the breast mass samples (N = 683) was randomly split into evaluation (n = 456) and validation (n = 227) samples. We trained the algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models, and explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment. RESULTS: The trained algorithms were able to classify cell nuclei with high accuracy (.94-.96), sensitivity (.97-.99), and specificity (.85-.94). Maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = .97, sensitivity = .99, specificity = .95) when the algorithms were arranged into a voting ensemble. CONCLUSIONS: We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles we demonstrate here can be readily applied to other complex tasks, including natural language processing and image recognition.
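The three model families plus an averaging ensemble can be sketched in R as follows. This is illustrative, not the paper's actual code: `train` and `test` are hypothetical data frames with numeric predictors and a binary factor outcome `diagnosis` assumed to have levels "benign" and "malignant".

```r
# Illustrative sketch: regularised GLM, radial-kernel SVM, and single-layer
# neural network, combined by averaging predicted probabilities.
library(glmnet)   # regularised GLM
library(e1071)    # SVM with radial basis kernel
library(nnet)     # single-layer neural network

x_tr <- model.matrix(diagnosis ~ . - 1, train)
x_te <- model.matrix(diagnosis ~ . - 1, test)

glm_fit <- cv.glmnet(x_tr, train$diagnosis, family = "binomial")
svm_fit <- svm(diagnosis ~ ., data = train, kernel = "radial", probability = TRUE)
ann_fit <- nnet(diagnosis ~ ., data = train, size = 5, decay = 0.01, trace = FALSE)

# Probability of "malignant" (the second factor level) from each model
p_glm <- predict(glm_fit, x_te, type = "response")[, 1]
p_svm <- attr(predict(svm_fit, test, probability = TRUE),
              "probabilities")[, "malignant"]
p_ann <- predict(ann_fit, test, type = "raw")[, 1]

p_avg <- (p_glm + p_svm + p_ann) / 3        # averaging ensemble
pred  <- ifelse(p_avg > 0.5, "malignant", "benign")
mean(pred == test$diagnosis)                # classification accuracy
```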


Subject(s)
Algorithms , Breast Neoplasms/diagnosis , Diagnosis, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Support Vector Machine , Female , Humans , Sensitivity and Specificity , Software
18.
PLoS One ; 14(2): e0206507, 2019.
Article in English | MEDLINE | ID: mdl-30759097

ABSTRACT

BACKGROUND: People living with serious mental health conditions experience increased morbidity due to physical health issues driven by medication side-effects and lifestyle factors. Coordinated mental and physical healthcare delivered in accordance with a care plan could help to reduce morbidity and mortality in this population. Efforts to develop new models of care are hampered by a lack of validated instruments to accurately assess the extent to which mental health service users and carers are involved in care planning for physical health. OBJECTIVE: To develop a brief and accurate patient-reported experience measure (PREM) capable of assessing involvement in physical health care planning for mental health service users and their carers. METHODS: We employed psychometric and statistical techniques to refine a bank of candidate questionnaire items, derived from qualitative interviews, into a valid and reliable measure of involvement in physical health care planning. We assessed the psychometric performance of the item bank using modern psychometric analyses, testing unidimensionality, scalability, fit to the partial credit Rasch model, category threshold ordering, local dependency, differential item functioning, and test-retest reliability. Once purified of poorly performing and erroneous items, we simulated computerized adaptive testing (CAT) with 15, 10, and 5 items using the calibrated item bank. RESULTS: Issues with category threshold ordering, local dependency, and differential item functioning were evident for a number of items in the nascent item bank and were resolved by removing the problematic items. The final 19-item PREM had excellent fit to the Rasch model (χ2 = 192.94, df = 1515, P = .02; RMSEA = .03, 95% CI .01-.04) and excellent reliability (marginal r = 0.87). The correlation between questionnaire scores at baseline and 2-week follow-up was high (r = .70, P < .01), and 94.9% of assessment pairs were within the Bland-Altman limits of agreement. Simulated CAT demonstrated that assessments could be made using as few as 10 items (mean SE = .43). DISCUSSION: We developed a flexible patient-reported experience measure to quantify service user and carer involvement in physical health care planning, and demonstrated the potential to substantially reduce assessment length whilst maintaining reliability by utilizing CAT.
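The differential item functioning screen described here can be sketched with the lordif R package, one common tool for IRT-based DIF detection (the abstract does not name the software used). The objects `items` (polytomous item responses) and `role` (a grouping vector, eg service user vs carer) are hypothetical placeholders.

```r
# Illustrative sketch: screen the candidate item bank for differential item
# functioning between groups before recalibrating the purified bank.
library(lordif)

dif <- lordif(items, role, criterion = "Chisqr", alpha = 0.01)

dif$flag                         # logical vector marking items showing DIF
purified <- items[, !dif$flag]   # drop flagged items, then refit the model
```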


Subject(s)
Caregivers , Health Planning , Mental Disorders/therapy , Mental Health Services , Patient Participation , Patient Reported Outcome Measures , Adult , Caregivers/psychology , Computer Simulation , Female , Health Planning/methods , Humans , Male , Mental Disorders/psychology , Psychometrics , Qualitative Research , United Kingdom