Results 1 - 18 of 18
1.
J Pers Med ; 14(5)2024 May 11.
Article in English | MEDLINE | ID: mdl-38793096

ABSTRACT

Despite the extensive literature on missing data theory and cautionary articles emphasizing the importance of realistic analysis for healthcare data, a critical gap persists in incorporating domain knowledge into missing data methods. In this paper, we argue that the remedy is to identify the key scenarios that lead to data missingness and investigate their theoretical implications. Based on this proposal, we first introduce an analysis framework where we investigate how different observation agents, such as physicians, influence the data availability and then scrutinize each scenario with respect to the steps in the missing data analysis. We apply this framework to the case study of observational data in healthcare facilities. We identify ten fundamental missingness scenarios and show how they influence the identification step for missing data graphical models, inverse probability weighting estimation, and exponential tilting sensitivity analysis. To emphasize how domain-informed analysis can improve method reliability, we conduct simulation studies under the influence of various missingness scenarios. We compare the results of three common methods in medical data analysis: complete-case analysis, MissForest imputation, and inverse probability weighting estimation. The experiments are conducted for two objectives: variable mean estimation and classification accuracy. We advocate for our analysis approach as a reference for observational health data analysis. Beyond that, we also posit that the proposed analysis framework is applicable to other medical domains.
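
The contrast between complete-case analysis and inverse probability weighting named in the abstract above can be illustrated with a minimal sketch. It assumes that each record's probability of being observed is known in advance; the function names, the toy values, and the propensities are hypothetical and are not taken from the paper.

```python
def complete_case_mean(values):
    """Mean of the observed entries only; missing entries are None."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def ipw_mean(values, propensities):
    """Normalized (Hajek-style) inverse probability weighted mean.

    Each observed value is weighted by 1 / P(observed), so records that
    were unlikely to be observed count for more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, p in zip(values, propensities):
        if value is None:
            continue  # missing records contribute nothing directly
        weighted_sum += value / p
        weight_total += 1.0 / p
    return weighted_sum / weight_total


# Hypothetical toy data: two low-value records (observed with probability 0.5,
# one of them missing) and two high-value records (always observed).
values = [1.0, None, 3.0, 3.0]
propensities = [0.5, 0.5, 1.0, 1.0]

cc_estimate = complete_case_mean(values)        # uses the three observed values
ipw_estimate = ipw_mean(values, propensities)   # upweights the under-observed group
```

In this toy setup the full-data mean would be 2.0 if the missing value equals its group value of 1.0. The complete-case estimate is 7/3 because the always-observed group is over-represented, while the weighted estimate recovers 2.0.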

2.
BMC Pediatr ; 24(1): 249, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605404

ABSTRACT

BACKGROUND: Long-term survival after premature birth is significantly determined by development of morbidities, primarily affecting the cardio-respiratory or central nervous system. Existing studies are limited to pairwise morbidity associations, thereby lacking a holistic understanding of morbidity co-occurrence and respective risk profiles. METHODS: Our study, for the first time, aimed at delineating and characterizing morbidity profiles at near-term age and investigated the most prevalent morbidities in preterm infants: bronchopulmonary dysplasia (BPD), pulmonary hypertension (PH), mild cardiac defects, perinatal brain pathology and retinopathy of prematurity (ROP). For analysis, we employed two independent, prospective cohorts, comprising a total of 530 very preterm infants: AIRR ("Attention to Infants at Respiratory Risks") and NEuroSIS ("Neonatal European Study of Inhaled Steroids"). Using a data-driven strategy, we successfully characterized morbidity profiles of preterm infants in a stepwise approach and (1) quantified pairwise morbidity correlations, (2) assessed the discriminatory power of BPD (complemented by imaging-based structural and functional lung phenotyping) in relation to these morbidities, (3) investigated collective co-occurrence patterns, and (4) identified infant subgroups who share similar morbidity profiles using machine learning techniques. RESULTS: First, we showed that, in line with pathophysiologic understanding, BPD and ROP have the highest pairwise correlation, followed by BPD and PH as well as BPD and mild cardiac defects. Second, we revealed that BPD exhibits only limited capacity in discriminating morbidity occurrence, despite its prevalence and clinical indication as a driver of comorbidities. Further, we demonstrated that structural and functional lung phenotyping did not exhibit higher association with morbidity severity than BPD. 
Lastly, we identified patient clusters that share similar morbidity patterns using machine learning in AIRR (n=6 clusters) and NEuroSIS (n=8 clusters). CONCLUSIONS: By capturing correlations as well as more complex morbidity relations, we provided a comprehensive characterization of morbidity profiles at discharge, linked to shared disease pathophysiology. Future studies could benefit from identifying risk profiles to thereby develop personalized monitoring strategies. TRIAL REGISTRATION: AIRR: DRKS.de, DRKS00004600, 28/01/2013. NEuroSIS: ClinicalTrials.gov, NCT01035190, 18/12/2009.


Subject(s)
Bronchopulmonary Dysplasia , Infant, Premature, Diseases , Retinopathy of Prematurity , Infant , Female , Pregnancy , Infant, Newborn , Humans , Infant, Premature , Prospective Studies , Infant, Very Low Birth Weight , Infant, Premature, Diseases/epidemiology , Bronchopulmonary Dysplasia/complications , Morbidity , Retinopathy of Prematurity/epidemiology , Gestational Age
3.
JCO Clin Cancer Inform ; 7: e2300062, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37922432

ABSTRACT

PURPOSE: Overall survival (OS) is the primary end point in phase III oncology trials. Given low success rates, surrogate end points, such as progression-free survival or objective response rate, are used in early go/no-go decision making. Here, we investigate whether early trends of OS prognostic biomarkers, such as the ROPRO and DeepROPRO, can also be used for this purpose. METHODS: Using real-world data, we emulated a series of 12 advanced non-small-cell lung cancer (aNSCLC) clinical trials, originally conducted by six different sponsors and evaluating four different mechanisms, in a total of 19,920 individuals. We evaluated early trends (until 6 months) of the OS biomarker alongside early OS within the joint model (JM) framework. Study-level estimates of early OS and ROPRO trends were correlated against the actual final OS hazard ratios (HRs). RESULTS: We observed a strong correlation between the JM estimates and final OS HR at 3 months (adjusted R² = 0.88) and at 6 months (adjusted R² = 0.85). In the leave-one-out analysis, there was a low overall prediction error of the OS HR at both 3 months (root-mean-square error [RMSE] = 0.11) and 6 months (RMSE = 0.12). In addition, at 3 months, the absolute prediction error of the OS HR was lower than 0.05 for three trials. CONCLUSION: We describe a pipeline to predict trial OS HRs using emulated aNSCLC studies and their early OS and OS biomarker trends. The method has the potential to accelerate and improve decision making in drug development.
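
The leave-one-out RMSE reported in the abstract above can be sketched as follows. This is only an illustration of the evaluation loop: a plain least-squares line stands in for the joint-model-based predictor, and the function names and inputs are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_rmse(xs, ys):
    """Refit without each study in turn and predict the held-out study."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))
```

Here `xs` would hold one early study-level estimate per emulated trial and `ys` the corresponding final hazard ratio. Each trial is predicted from a model that never saw it, which is what makes the error an out-of-sample figure.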


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/therapy , Carcinoma, Non-Small-Cell Lung/drug therapy , Prognosis , Lung Neoplasms/therapy , Lung Neoplasms/drug therapy , Disease-Free Survival , Biomarkers
4.
Stat Med ; 42(29): 5419-5450, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-37759370

ABSTRACT

The pattern graph framework solves a wide range of missing data problems with nonignorable mechanisms. However, it faces two challenges of assessability and interpretability, particularly important in safety-critical problems such as clinical diagnosis: (i) How can one assess the validity of the framework's a priori assumption and make necessary adjustments to accommodate known information about the problem? (ii) How can one interpret the process of exponential tilting used for sensitivity analysis in the pattern graph framework and choose the tilt perturbations based on meaningful real-world quantities? In this paper, we introduce Informed Sensitivity Analysis, an extension of the pattern graph framework that enables us to incorporate substantive knowledge about the missingness mechanism into the pattern graph framework. Our extension allows us to examine the validity of assumptions underlying pattern graphs and interpret sensitivity analysis results in terms of realistic problem characteristics. We apply our method to a prevalent nonignorable missing data scenario in clinical research. We validate our method's results and compare them with those of a number of widely used missing data methods, including Unweighted CCA, KNN Imputer, MICE, and MissForest. The validation is done using both bootstrapped simulated experiments and real-world clinical observations in the MIMIC-III public dataset.


Subject(s)
Models, Statistical , Palliative Care , Humans , Triazoles
5.
Biochim Biophys Acta Mol Basis Dis ; 1869(2): 166592, 2023 02.
Article in English | MEDLINE | ID: mdl-36328146

ABSTRACT

SARS-CoV-2 remains an acute threat to human health, endangering hospital capacities worldwide. Previous studies have aimed at informing pathophysiologic understanding and identification of disease indicators for risk assessment, monitoring, and therapeutic guidance. While findings start to emerge in the general population, observations in high-risk patients with complex pre-existing conditions are limited. We addressed the gap of existing knowledge with regard to a differentiated understanding of disease dynamics in SARS-CoV-2 infection while specifically considering disease stage and severity. We biomedically characterized quantitative proteomics in a hospitalized cohort of COVID-19 patients with mild to severe symptoms suffering from different (co)-morbidities in comparison to both healthy individuals and patients with non-COVID related inflammation. Deep clinical phenotyping enabled the identification of individual disease trajectories in COVID-19 patients. By the use of the individualized disease phase assignment, proteome analysis revealed a severity dependent general type-2-centered host response side-by-side with a disease specific antiviral immune reaction in early disease. The identification of phenomena such as neutrophil extracellular trap (NET) formation and a pro-coagulatory response characterizing severe disease was successfully validated in a second cohort. Together with the regulation of proteins related to SARS-CoV-2-specific symptoms identified by proteome screening, we not only confirmed results from previous studies but provide novel information for biomarker and therapy development.


Subject(s)
COVID-19 , Humans , SARS-CoV-2/metabolism , Antiviral Agents , Proteome/metabolism , Proteomics
6.
Nutrients ; 14(19)2022 Sep 21.
Article in English | MEDLINE | ID: mdl-36235563

ABSTRACT

Very preterm infants are at high risk for suboptimal nutrition in the first weeks of life leading to insufficient weight gain and complications arising from metabolic imbalances such as insufficient bone mineral accretion. We investigated the use of a novel set of standardized parenteral nutrition (PN; MUC PREPARE) solutions regarding improving nutritional intake, accelerating termination of parenteral feeding, and positively affecting growth in comparison to individually prescribed and compounded PN solutions. We studied the effect of MUC PREPARE on macro- and micronutrient intake, metabolism, and growth in 58 very preterm infants and compared results to a historic reference group of 58 very preterm infants matched for clinical characteristics. Infants receiving MUC PREPARE demonstrated improved macro- and micronutrient intake resulting in balanced electrolyte levels and stable metabolomic profiles. Subsequently, improved energy supply was associated with up to 1.5 weeks earlier termination of parenteral feeding, while simultaneously reaching up to 1.9 times higher weight gain at day 28 in extremely immature infants (<27 GA weeks) as well as overall improved growth at 2 years of age for all infants. The use of the new standardized PN solution MUC PREPARE improved nutritional supply and short- and long-term growth and reduced PN duration in very preterm infants and is considered a superior therapeutic strategy.


Subject(s)
Infant, Premature, Diseases , Parenteral Nutrition Solutions , Electrolytes , Female , Fetal Growth Retardation , Humans , Infant , Infant, Newborn , Infant, Premature , Micronutrients , Weight Gain
7.
Ann Thorac Surg ; 114(6): 2173-2179, 2022 12.
Article in English | MEDLINE | ID: mdl-34890575

ABSTRACT

BACKGROUND: Hospital readmission within 30 days of discharge is a well-studied outcome. Predicting readmission after cardiac surgery, however, is notoriously challenging; the best-performing models in the literature have areas under the curve around .65. A reliable predictive model would enable clinicians to identify patients at risk for readmission and to develop prevention strategies. METHODS: We analyzed The Society of Thoracic Surgeons (STS) Adult Cardiac Surgery Database at our institution, augmented with electronic medical record data. Predictors included demographics, preoperative comorbidities, proxies for intraoperative risk, indicators of postoperative complications, and time series-derived variables. We trained several machine learning models, evaluating each on a held-out test set. RESULTS: Our analysis cohort consisted of 4924 cases from 2011 to 2016. Of those, 723 (14.7%) were readmitted within 30 days of discharge. Our models included 141 STS-derived and 24 electronic medical records-derived variables. A random forest model performed best, with test area under the curve 0.76 (95% confidence interval, 0.73 to 0.79). Using exclusively preoperative variables, as in STS calculated risk scores, degraded the area under the curve, to 0.64 (95% confidence interval, 0.60 to 0.68). Key predictors included length of stay (12.5 times more important than the average variable) and whether the patient was discharged to a rehabilitation facility (11.2 times). CONCLUSIONS: Our approach, augmenting STS variables with electronic medical records data and using flexible machine learning modeling, yielded state-of-the-art performance for predicting 30-day readmission. Separately, the importance of variables not directly related to inpatient care, such as discharge location, amplifies questions about the efficacy of assessing care quality by readmissions.


Subject(s)
Cardiac Surgical Procedures , Patient Readmission , Adult , Humans , Patient Discharge , Machine Learning , Cardiac Surgical Procedures/adverse effects , Cohort Studies , Risk Factors , Retrospective Studies
8.
Front Artif Intell ; 4: 625573, 2021.
Article in English | MEDLINE | ID: mdl-33937744

ABSTRACT

Introduction: Prognostic scores are important tools in oncology to facilitate clinical decision-making based on patient characteristics. To date, classic survival analysis using Cox proportional hazards regression has been employed in the development of these prognostic scores. With the advance of analytical models, this study aimed to determine if more complex machine-learning algorithms could outperform classical survival analysis methods. Methods: In this benchmarking study, two datasets were used to develop and compare different prognostic models for overall survival in pan-cancer populations: a nationwide EHR-derived de-identified database for training and in-sample testing and the OAK (phase III clinical trial) dataset for out-of-sample testing. A real-world database comprised 136K first-line treated cancer patients across multiple cancer types and was split into a 90% training and 10% testing dataset, respectively. The OAK dataset comprised 1,187 patients diagnosed with non-small cell lung cancer. To assess the effect of the covariate number on prognostic performance, we formed three feature sets with 27, 44 and 88 covariates. In terms of methods, we benchmarked ROPRO, a prognostic score based on the Cox model, against eight complex machine-learning models: regularized Cox, Random Survival Forests (RSF), Gradient Boosting (GB), DeepSurv (DS), Autoencoder (AE) and Super Learner (SL). The C-index was used as the performance metric to compare different models. Results: For in-sample testing on the real-world database the resulting C-index [95% CI] values for RSF 0.720 [0.716, 0.725], GB 0.722 [0.718, 0.727], DS 0.721 [0.717, 0.726] and lastly, SL 0.723 [0.718, 0.728] showed significantly better performance as compared to ROPRO 0.701 [0.696, 0.706]. Similar results were derived across all feature sets. However, for the out-of-sample validation on OAK, the stronger performance of the more complex models was not apparent anymore. 
Consistently, the increase in the number of prognostic covariates did not lead to an increase in model performance. Discussion: The stronger performance of the more complex models did not generalize when applied to an out-of-sample dataset. We hypothesize that future research may benefit by adding multimodal data to exploit advantages of more complex models.
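
The C-index used as the performance metric in the abstract above can be sketched in a few lines. This is a minimal pairwise version in the style of Harrell's concordance index; it ignores tied event times, and the function and argument names are hypothetical rather than the study's implementation.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) strictly before time j. The pair is concordant
    when the subject with the earlier event has the higher risk score.
    Tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why differences such as 0.701 versus 0.723 are small in absolute terms.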

9.
Int J Comput Assist Radiol Surg ; 14(11): 2005-2020, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31037493

ABSTRACT

PURPOSE: Automatically segmenting and classifying surgical activities is an important prerequisite to providing automated, targeted assessment and feedback during surgical training. Prior work has focused almost exclusively on recognizing gestures, or short, atomic units of activity such as pushing needle through tissue, whereas we also focus on recognizing higher-level maneuvers, such as suture throw. Maneuvers exhibit more complexity and variability than the gestures from which they are composed, however working at this granularity has the benefit of being consistent with existing training curricula. METHODS: Prior work has focused on hidden Markov model and conditional-random-field-based methods, which typically leverage unary terms that are local in time and linear in model parameters. Because maneuvers are governed by long-term, nonlinear dynamics, we argue that the more expressive unary terms offered by recurrent neural networks (RNNs) are better suited for this task. Four RNN architectures are compared for recognizing activities from kinematics: simple RNNs, long short-term memory, gated recurrent units, and mixed history RNNs. We report performance in terms of error rate and edit distance, and we use a functional analysis-of-variance framework to assess hyperparameter sensitivity for each architecture. RESULTS: We obtain state-of-the-art performance for both maneuver recognition from kinematics (4 maneuvers; error rate of [Formula: see text]; normalized edit distance of [Formula: see text]) and gesture recognition from kinematics (10 gestures; error rate of [Formula: see text]; normalized edit distance of [Formula: see text]). CONCLUSIONS: Automated maneuver recognition is feasible with RNNs, an exciting result which offers the opportunity to provide targeted assessment and feedback at a higher level of granularity. 
In addition, we show that multiple hyperparameters are important for achieving good performance, and our hyperparameter analysis serves to aid future work in RNN-based activity recognition.
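
The edit distance reported alongside error rate in the abstract above compares the predicted sequence of activity segments with the reference sequence. The sketch below is a standard Levenshtein distance over label sequences; normalizing by the longer sequence length is an assumption here, since the abstract does not state the normalization, and all names are hypothetical.

```python
def collapse_repeats(frame_labels):
    """Turn frame-wise labels into a segment-level sequence."""
    segments = []
    for label in frame_labels:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted, reference):
    """Edit distance divided by the longer sequence length (assumed)."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0
    return edit_distance(predicted, reference) / longest
```

Unlike the frame-wise error rate, this measure penalizes over-segmentation and wrong ordering but is insensitive to small shifts in segment boundaries.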


Subject(s)
Education, Medical, Graduate/methods , General Surgery/education , Neural Networks, Computer , Pattern Recognition, Automated/methods , Robotics/education , Suture Techniques/education , Gestures , Humans , Robotics/methods
10.
JAMA Facial Plast Surg ; 21(2): 104-109, 2019 Mar 01.
Article in English | MEDLINE | ID: mdl-30325993

ABSTRACT

IMPORTANCE: Daytime sleepiness in surgical trainees can impair intraoperative technical skill and thus affect their learning and pose a risk to patient safety. OBJECTIVE: To determine the association between daytime sleepiness of surgeons in residency and fellowship training and their intraoperative technical skill during septoplasty. DESIGN, SETTING, AND PARTICIPANTS: This prospective cohort study included 19 surgical trainees in otolaryngology-head and neck surgery programs at 2 academic institutions (Johns Hopkins University School of Medicine and MedStar Georgetown University Hospital). The physicians were recruited from June 13, 2016, to April 20, 2018. The analysis includes data that were captured between June 27, 2016, and April 20, 2018. MAIN OUTCOMES AND MEASURES: Attending physician and surgical trainee self-rated intraoperative technical skill using the Septoplasty Global Assessment Tool (SGAT) and visual analog scales. Daytime sleepiness reported by surgical trainees was measured using the Epworth Sleepiness Scale (ESS). RESULTS: Of 19 surgical trainees, 17 resident physicians (9 female [53%]) and 2 facial plastic surgery fellowship physicians (1 female and 1 male) performed a median of 3.00 septoplasty procedures (range, 1-9 procedures) under supervision by an attending physician. Of the 19 surgical trainees, 10 (53%) were aged 25 to 30 years and 9 (47%) were 31 years or older. The mean ESS score overall was 6.74 (95% CI, 5.96-7.52), and this score did not differ between female and male trainees. The mean ESS score was 7.57 (95% CI, 6.58-8.56) in trainees aged 25 to 30 years and 5.44 (95% CI, 4.32-6.57) in trainees aged 31 years or older. 
In regression models adjusted for sex, age, postgraduate year, and technical complexity of the procedure, there was a statistically significant inverse association between ESS scores and attending physician-rated technical skill for both SGAT (-0.41; 95% CI, -0.55 to -0.27; P < .001) and the visual analog scale (-0.75; 95% CI, -1.40 to -0.07; P = .03). The association between ESS scores and technical skill was not statistically significant for trainee self-rated SGAT (0.04; 95% CI, -0.17 to 0.24; P = .73) and the self-rated visual analog scale (0.19; 95% CI, -0.79 to 1.2; P = .70). CONCLUSIONS AND RELEVANCE: The findings suggest that daytime sleepiness of surgical trainees is inversely associated with attending physician-rated intraoperative technical skill when performing septoplasty. Thus, surgical trainees' ability to learn technical skill in the operating room may be influenced by their daytime sleepiness. LEVEL OF EVIDENCE: NA.


Subject(s)
Clinical Competence , Disorders of Excessive Somnolence/complications , Internship and Residency , Rhinoplasty , Adult , Female , Humans , Male , Nasal Septum/surgery , Prospective Studies
11.
IEEE Trans Biomed Eng ; 64(9): 2025-2041, 2017 09.
Article in English | MEDLINE | ID: mdl-28060703

ABSTRACT

OBJECTIVE: State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. METHODS: In this paper, we address two major problems for surgical data analysis: First, lack of uniform-shared datasets and benchmarks, and second, lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model, sparse hidden Markov model (HMM), Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. RESULTS: Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. CONCLUSION: Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. SIGNIFICANCE: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
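
The leave-one-user-out setting mentioned in the abstract above holds out every trial from one operator at a time, so that a model is always tested on a surgeon it has not seen. A minimal sketch of such a split follows; the data layout and the function name are hypothetical and do not reflect the JIGSAWS file format.

```python
def leave_one_user_out(trials):
    """Yield (held_out_user, train_trials, test_trials) for each user.

    `trials` is a list of (user_id, trial) pairs.
    """
    users = sorted({user for user, _ in trials})
    for held_out in users:
        train = [trial for user, trial in trials if user != held_out]
        test = [trial for user, trial in trials if user == held_out]
        yield held_out, train, test


# Hypothetical example with two operators.
example_trials = [("B", "b1"), ("C", "c1"), ("B", "b2")]
folds = list(leave_one_user_out(example_trials))
```

Splitting by operator rather than by trial is what exposes the cross-surgeon generalization gap that the conclusion refers to.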


Subject(s)
Clinical Competence/statistics & numerical data , Clinical Competence/standards , Gestures , Imaging, Three-Dimensional/statistics & numerical data , Imaging, Three-Dimensional/standards , Robotic Surgical Procedures/statistics & numerical data , Robotic Surgical Procedures/standards , Benchmarking/methods , Benchmarking/standards , Databases, Factual , Humans , Pattern Recognition, Automated/methods , United States
12.
PLoS One ; 11(3): e0149174, 2016.
Article in English | MEDLINE | ID: mdl-26950551

ABSTRACT

BACKGROUND: Surgical tasks are performed in a sequence of steps, and technical skill evaluation includes assessing task flow efficiency. Our objective was to describe differences in task flow for expert and novice surgeons for a basic surgical task. METHODS: We used a hierarchical semantic vocabulary to decompose and annotate maneuvers and gestures for 135 instances of a surgeon's knot performed by 18 surgeons. We compared counts of maneuvers and gestures, and analyzed task flow by skill level. RESULTS: Experts used fewer gestures to perform the task (26.29; 95% CI = 25.21 to 27.38 for experts vs. 31.30; 95% CI = 29.05 to 33.55 for novices) and made fewer errors in gestures than novices (1.00; 95% CI = 0.61 to 1.39 vs. 2.84; 95% CI = 2.3 to 3.37). Transitions among maneuvers, and among gestures within each maneuver for expert trials were more predictable than novice trials. CONCLUSIONS: Activity segments and state flow transitions within a basic surgical task differ by surgical skill level, and can be used to provide targeted feedback to surgical trainees.


Subject(s)
Clinical Competence , Suture Techniques , Medical Errors , Surgeons
13.
J Surg Educ ; 73(3): 482-9, 2016.
Article in English | MEDLINE | ID: mdl-26896147

ABSTRACT

OBJECTIVE: Task-level metrics of time and motion efficiency are valid measures of surgical technical skill. Metrics may be computed for segments (maneuvers and gestures) within a task after hierarchical task decomposition. Our objective was to compare task-level and segment (maneuver and gesture)-level metrics for surgical technical skill assessment. DESIGN: Our analyses include predictive modeling using data from a prospective cohort study. We used a hierarchical semantic vocabulary to segment a simple surgical task of passing a needle across an incision and tying a surgeon's knot into maneuvers and gestures. We computed time, path length, and movements for the task, maneuvers, and gestures using tool motion data. We fit logistic regression models to predict experience-based skill using the quantitative metrics. We compared the area under a receiver operating characteristic curve (AUC) for task-level, maneuver-level, and gesture-level models. SETTING: Robotic surgical skills training laboratory. PARTICIPANTS: In total, 4 faculty surgeons with experience in robotic surgery and 14 trainee surgeons with no or minimal experience in robotic surgery. RESULTS: Experts performed the task in shorter time (49.74s; 95% CI = 43.27-56.21 vs. 81.97; 95% CI = 69.71-94.22), with shorter path length (1.63m; 95% CI = 1.49-1.76 vs. 2.23; 95% CI = 1.91-2.56), and with fewer movements (429.25; 95% CI = 383.80-474.70 vs. 728.69; 95% CI = 631.84-825.54) than novices. Experts differed from novices on metrics for individual maneuvers and gestures. The AUCs were 0.79; 95% CI = 0.62-0.97 for task-level models, 0.78; 95% CI = 0.6-0.96 for maneuver-level models, and 0.7; 95% CI = 0.44-0.97 for gesture-level models. There was no statistically significant difference in AUC between task-level and maneuver-level (p = 0.7) or gesture-level models (p = 0.17). 
CONCLUSIONS: Maneuver-level and gesture-level metrics are discriminative of surgical skill and can be used to provide targeted feedback to surgical trainees.
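
Two quantities from the abstract above lend themselves to a short sketch: the path length computed from tool motion data and the AUC used to compare models. The code assumes tool positions are sampled as 3D points and computes the AUC as the probability that a randomly chosen expert receives a higher score than a randomly chosen novice. All names are hypothetical, and the study's own metrics and logistic regression models are not reproduced here.

```python
from math import dist


def path_length(positions):
    """Total distance travelled along consecutive sampled tool positions."""
    return sum(dist(p, q) for p, q in zip(positions, positions[1:]))


def auc(expert_scores, novice_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (expert, novice) pairs in which the expert
    has the higher score; ties count as half.
    """
    wins = 0.0
    for expert in expert_scores:
        for novice in novice_scores:
            if expert > novice:
                wins += 1.0
            elif expert == novice:
                wins += 0.5
    return wins / (len(expert_scores) * len(novice_scores))
```

The same `path_length` function can be applied to a whole task or to the samples inside a single maneuver or gesture, which mirrors the task-level versus segment-level comparison in the study.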


Subject(s)
Clinical Competence , Robotic Surgical Procedures/education , Robotic Surgical Procedures/standards , Suture Techniques/education , Time and Motion Studies , Adult , Female , Humans , Male , Prospective Studies
14.
Int J Comput Assist Radiol Surg ; 10(6): 981-91, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25895080

ABSTRACT

PURPOSE: Previous work on surgical skill assessment using intraoperative tool motion has focused on highly structured surgical tasks such as cholecystectomy and used generic motion metrics such as time and number of movements. Other statistical methods such as hidden Markov models (HMM) and descriptive curve coding (DCC) have been successfully used to assess skill in structured activities on bench-top tasks. Methods to assess skill and provide effective feedback to trainees for unstructured surgical tasks in the operating room, such as tissue dissection in septoplasty, have yet to be developed. METHODS: We proposed a method that provides a descriptive structure for septoplasty by automatically segmenting it into higher-level meaningful activities called strokes. These activities characterize the surgeon's tool motion pattern. We constructed a spatial graph from the sequence of strokes in each procedure and used its properties to train a classifier to distinguish between expert and novice surgeons. We compared the results from our method with those from HMM, DCC, and generic metric-based approaches. RESULTS: We showed that our method, with an average accuracy of 91%, performs as well as or better than these state-of-the-art methods, while simultaneously providing surgeons with an intuitive understanding of the procedure. CONCLUSIONS: In this study, we developed and evaluated an automated approach to objectively assess surgical skill during the unstructured task of tissue dissection in nasal septoplasty.


Subject(s)
Clinical Competence , Feedback , Nasal Obstruction/surgery , Nasal Septum/surgery , Nasal Surgical Procedures/methods , Biomechanical Phenomena , Humans , Operating Rooms
15.
Article in English | MEDLINE | ID: mdl-24505645

ABSTRACT

The growing availability of data from robotic and laparoscopic surgery has created new opportunities to investigate the modeling and assessment of surgical technical performance and skill. However, previously published methods for modeling and assessment have not proven to scale well to large and diverse data sets. In this paper, we describe a new approach for simultaneous detection of gestures and skill that can be generalized to different surgical tasks. It consists of two parts: (1) descriptive curve coding (DCC), which transforms the surgical tool motion trajectory into a coded string using accumulated Frenet frames, and (2) common string model (CSM), a classification model using a similarity metric computed from longest common string motifs. We apply the DCC-CSM method to detect surgical gestures and skill levels in two kinematic datasets (collected from the da Vinci surgical robot). The DCC-CSM method classifies gestures and skill with 87.81% and 91.12% accuracy, respectively.
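
The idea of comparing coded motion strings by their shared motifs can be sketched with a longest-common-substring computation. This is an illustrative stand-in only: the abstract does not define the CSM similarity metric in detail, so the normalization by the shorter string and all names below are assumptions.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by strings a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            # Extend the run ending at the previous characters, or reset.
            length = previous[j - 1] + 1 if char_a == char_b else 0
            current.append(length)
            best = max(best, length)
        previous = current
    return best


def motif_similarity(a, b):
    """Shared-run length relative to the shorter string (assumed scaling)."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return longest_common_substring(a, b) / shortest
```

With such a similarity in hand, a new coded trajectory could be assigned the gesture or skill label of its most similar labeled examples, for instance by nearest-neighbor voting.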


Subject(s)
Arm/physiology , Gestures , Man-Machine Systems , Movement/physiology , Pattern Recognition, Automated/methods , Robotics/methods , Surgery, Computer-Assisted/methods , Humans , Motion , Reproducibility of Results , Sensitivity and Specificity
16.
Int Forum Allergy Rhinol ; 2(6): 507-15, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22696449

ABSTRACT

BACKGROUND: Assessment of surgical skill plays a crucial role in determining competency, monitoring educational programs, and providing trainee feedback. With the changing health care environment, it will likely play an important role in credentialing and maintenance of certification. The ideal skill assessment tool should be unbiased, objective, and accurate. We hypothesize that tool-motion data (how a surgeon moves his/her instruments) and eye-gaze data (what a surgeon looks at when he/she operates) contain sufficient information to quantitatively and objectively evaluate surgical skill. We investigate this hypothesis by developing a statistical model of surgery and testing the model experimentally in the context of endoscopic sinus surgery (ESS). METHODS: A total of 378 trials were recorded from 7 expert and 13 novice surgeons while they were performing a series of 9 different ESS tasks. Data were collected using an electromagnetic tracker to record the surgeon's tool and endoscope motions. In addition, the location of the surgeon's eye gaze was recorded using an infrared eye tracker camera. These data were fit to the statistical model and used to test the accuracy of skill assessment. RESULTS: The skill of expert surgeons was identified correctly for 94.6% of tasks. For surgeries performed by novice surgeons, the proposed model properly recognized the skill level with 88.6% accuracy. CONCLUSION: We present an objective and unbiased method for assessing the skill of endoscopic sinus surgeons. Experimental results show that the proposed method successfully identifies the skill levels of both expert and novice surgeons.


Subject(s)
Clinical Competence/standards , Endoscopy/standards , Otolaryngology/standards , Paranasal Sinuses/surgery , Algorithms , Cadaver , Endoscopy/education , Eye Movement Measurements , Eye Movements , Humans , Otolaryngology/education , Sensitivity and Specificity
17.
Article in English | MEDLINE | ID: mdl-23285585

ABSTRACT

We observe that expert surgeons performing minimally invasive surgery (MIS) learn to minimize their tool path length and avoid collisions with vital structures. We thus conjecture that an expert surgeon's tool paths can be predicted by minimizing an appropriate energy function. We hypothesize that this reference path will be closer to the tool path of a surgeon with greater skill, as measured by an objective measurement instrument such as the objective structured assessment of technical skill (OSATS). To test this hypothesis, we have developed a surgical path planner (SPP) for functional endoscopic sinus surgery (FESS). We measure the similarity between an automatically generated reference path and the surgical motions of subjects. We also develop a complementary similarity metric by translating tool motion to a coordinate-independent coding of motion, which we call the descriptive curve coding (DCC) method. We evaluate our methods on surgical motions recorded from FESS training tasks. The results show that the SPP reference path predicts the OSATS scores with 88% accuracy. We also show that motions coded with DCC predict OSATS scores with 90% accuracy. Finally, the combination of SPP and DCC identifies surgical skill with 93% accuracy.


Subject(s)
General Surgery/education , Minimally Invasive Surgical Procedures/methods , Nasal Cavity/surgery , Robotics , Surgery, Computer-Assisted/education , Algorithms , Automation , Calibration , Clinical Competence , Electronic Data Processing , Endoscopy/methods , Humans , Probability , ROC Curve , Reproducibility of Results , Support Vector Machine , Tomography, X-Ray Computed/methods
18.
Med Image Comput Comput Assist Interv ; 13(Pt 3): 295-302, 2010.
Article in English | MEDLINE | ID: mdl-20879412

ABSTRACT

In the context of minimally invasive surgery, clinical risks are highly associated with surgeons' skill in manipulating surgical tools and their knowledge of the closed anatomy. A quantitative surgical skill assessment can reduce faulty procedures and prevent some surgical risks. In this paper focusing on sinus surgery, we present two methods to identify both skill level and task type by recording motion data of surgical tools as well as the surgeon's eye gaze location on the screen. We generate a total of 14 discrete Hidden Markov Models for seven surgical tasks at both expert and novice levels using a repeated k-fold evaluation method. The dataset contains 95 expert and 139 novice trials of surgery over a cadaver. The results reveal two insights: eye-gaze data contains skill-related structures; and adding this information to the surgical tool motion data improves skill assessment by 13.2% and 5.3% for expert and novice levels, respectively. The proposed system identifies the surgeon's skill level with an accuracy of 82.5% and the surgical task type with an accuracy of 77.8%.
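The abstract does not describe how the Hidden Markov Models are used for classification. As a minimal sketch of one common approach, the Python example below scores a discrete observation sequence under each candidate model with the scaled forward algorithm and picks the label with the highest log-likelihood. The function names, the two-state toy models, and the two-symbol alphabet are all hypothetical illustrations, not the authors' models or data, and the sketch assumes a non-empty observation sequence.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm).

    obs   : non-empty list of symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # Forward variable for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to the log-likelihood.
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total


def classify(obs, models):
    """Return the label whose model assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical toy models (start, trans, emit); the numbers are made up.
MODELS = {
    "expert": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "novice": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.3, 0.7]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 0, 1, 0], MODELS))
    print(classify([1, 1, 0, 1, 1], MODELS))
```

With one model per (task, skill level) pair, the same `classify` call could in principle return both labels at once, though whether the paper's system works this way is an assumption.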


Subject(s)
Clinical Competence , Eye Movements , Image Interpretation, Computer-Assisted/methods , Minimally Invasive Surgical Procedures/instrumentation , Minimally Invasive Surgical Procedures/methods , Surgical Instruments , Task Performance and Analysis , Humans