Search | VHL Regional Portal

1.

The Tool for Automatic Measurement of Morphological Information (TAMMI).

Crossley, Scott A; Tywoniw, Rurik; Choi, Joon Suh.

Behav Res Methods ; 2023 Dec 29.

Article in English | MEDLINE | ID: mdl-38158554

ABSTRACT

This study documents and assesses the Tool for Automatic Measurement of Morphological Information (TAMMI), which calculates measures related to basic morpheme counts, morphological variety, morphological complexity, morpheme type-token counts, and variables found in the MorphoLex database (Sánchez-Gutiérrez et al., 2018) including morpheme frequency/length, morpheme family size counts and frequency, and morpheme hapax counts. These measures are assessed in two studies that include a word frequency measure as a control variable. The first study examined links between morphological variables and judgements of reading ease in a corpus of ~ 5000 reading excerpts, finding that variables related to derivational variety, word frequency, affix frequency, and morpheme counts explained 40% of the variance in the reading scores. The second examined links between morphological variables and human assessments of vocabulary proficiency in a corpus of ~ 7000 essays written by English-language learners (ELLs), finding that the number of morphemes, morpheme variety, and the number of roots explained 21% of the variance in the human assessments.

2.

Lexical and phraseological differences between second language written and spoken opinion responses.

Kim, Minkyung; Crossley, Scott A.

Front Psychol ; 14: 1068685, 2023.

Article in English | MEDLINE | ID: mdl-36939413

ABSTRACT

This study examines differences in lexical and phraseological complexity features between second language (L2) written and spoken opinion responses via classification analysis. The study further examines the characteristics of L2 written and spoken responses that were misclassified in terms of lexical and phraseological differences, L2 learners' vocabulary knowledge, and raters' judgments of L2 use. The goal is to more thoroughly explore potential differences in lexical and phraseological production based on modality. The results indicated that L2 written responses tended to elicit greater lexical and phraseological complexity. The results also indicated that crossing the boundaries from L2 spoken to written (i.e., the use of less lexical and phraseological complexity) was related to lower levels of L2 vocabulary knowledge and tended to be penalized by raters in terms of L2 use. In contrast, crossing the boundaries from L2 written output to spoken (i.e., the use of greater lexical and phraseological complexity) was acceptable in terms of L2 use. Overall, this study highlights lexical and phraseological differences and the importance of the use of greater lexical and phraseological complexity in a modality-insensitive manner in L2 opinion-giving responses.

3.

A large-scaled corpus for assessing text readability.

Crossley, Scott; Heintz, Aron; Choi, Joon Suh; Batchelor, Jordan; Karimi, Mehrnoush; Malatinszky, Agnes.

Behav Res Methods ; 55(2): 491-507, 2023 02.

Article in English | MEDLINE | ID: mdl-35297016

ABSTRACT

This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~ 5000 text excerpts along with information about the excerpt's year of publishing, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus includes a number of improvements in comparison to previous readability corpora including size, breadth of the excerpts available, which cover over 250 years of writing in two different genres, and unique readability criterion provided for each text based on teachers' ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability.

Subject(s)

Comprehension , Reading , Humans , Reproducibility of Results , Writing , Publishing

4.

Do Struggling Adult Readers Monitor Their Reading? Understanding the Role of Online and Offline Comprehension Monitoring Processes During Reading.

Tighe, Elizabeth L; Kaldes, Gal; Talwar, Amani; Crossley, Scott A; Greenberg, Daphne; Skalicky, Stephen.

J Learn Disabil ; 56(1): 25-42, 2023.

Article in English | MEDLINE | ID: mdl-35321590

ABSTRACT

Comprehension monitoring is a meta-cognitive skill that is defined as the ability to self-evaluate one's comprehension of text. Although it is known that struggling adult readers are poor at monitoring their comprehension, additional research is needed to understand the mechanisms underlying comprehension monitoring and their role in reading comprehension in this population. This study used a comprehension monitoring task with struggling adult readers, which included online eye movements (reread and regression path durations) and an offline verbal protocol (oral explanations of key information). We examined whether eye movements predicted accuracy on the passages' reading comprehension questions, a norm-referenced reading assessment, and an offline verbal protocol after controlling for age and traditional component skills (i.e., decoding, oral language, working memory). Regression path duration uniquely predicted accuracy on the questions; however, decoding and oral vocabulary were the most salient predictors of the norm-referenced reading comprehension measure. Regression path duration also predicted the offline verbal protocol, such that those who exhibited longer regression path duration were also better at explaining key information. These results contribute to the literature regarding struggling adults' reading component skills, eye movement behaviors involved in processing connected text, and future considerations in assessing comprehension monitoring.

Subject(s)

Reading , Adult , Humans

5.

The persuasive essays for rating, selecting, and understanding argumentative and discourse elements (PERSUADE) corpus 1.0.

Crossley, Scott A; Baffour, Perpetual; Tian, Yu; Picou, Aigner; Benner, Meg; Boser, Ulrich.

Assess Writ ; 54, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36570517

ABSTRACT

This paper introduces the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus. The PERSUADE corpus is a large-scale corpus of writing with annotated discourse elements. The goal of the corpus is to spur the development of new, open-source scoring algorithms that identify discourse elements in argumentative writing to open new avenues for the development of automatic writing evaluation systems that focus more specifically on the semantic and organizational elements of student writing.

6.

Age of Exposure 2.0: Estimating word complexity using iterative models of word embeddings.

Botarleanu, Robert-Mihai; Dascalu, Mihai; Watanabe, Micah; Crossley, Scott Andrew; McNamara, Danielle S.

Behav Res Methods ; 54(6): 3015-3042, 2022 12.

Article in English | MEDLINE | ID: mdl-35167112

ABSTRACT

Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase the cost and effort to produce. In this paper, we introduce Age of Exposure (AoE) version 2, a proxy for human exposure to new vocabulary terms that expands AoA word lists through training regressors to predict AoA scores. Word2vec word embeddings are trained on cumulatively increasing corpora of texts, word exposure trajectories are generated by aligning the word2vec vector spaces, and features of words are derived for modeling AoA scores. Our prediction models achieve low errors (from 13% with a corresponding R² of .35 up to 7% with an R² of .74), can be uniformly applied to different AoA word lists, and generalize to the entire vocabulary of a language. Our method benefits from using existing readability indices to define the order of texts in the corpora, while the performed analyses confirm that the generated AoA scores accurately predicted the difficulty of texts (R² of .84, surpassing related previous work). Further, we provide evidence of the internal reliability of our word trajectory features, demonstrate the effectiveness of the word trajectory features when contrasted with simple lexical features, and show that the exclusion of features that rely on external resources does not significantly impact performance.

Subject(s)

Language , Vocabulary , Child , Humans , Reproducibility of Results

7.

Precision communication: Physicians' linguistic adaptation to patients' health literacy.

Schillinger, Dean; Duran, Nicholas D; McNamara, Danielle S; Crossley, Scott A; Balyan, Renu; Karter, Andrew J.

Sci Adv ; 7(51): eabj2836, 2021 Dec 17.

Article in English | MEDLINE | ID: mdl-34919437

ABSTRACT

Little quantitative research has explored which clinician skills and behaviors facilitate communication. Mutual understanding is especially challenging when patients have limited health literacy (HL). Two strategies hypothesized to improve communication include matching the complexity of language to patients' HL ("universal tailoring"); or always using simple language ("universal precautions"). Through computational linguistic analysis of 237,126 email exchanges between dyads of 1094 physicians and 4331 English-speaking patients, we assessed matching (concordance/discordance) between physicians' linguistic complexity and patients' HL, and classified physicians' communication strategies. Among low HL patients, discordance was associated with poor understanding (P = 0.046). Physicians' "universal tailoring" strategy was associated with better understanding for all patients (P = 0.01), while "universal precautions" was not. There was an interaction between concordance and communication strategy (P = 0.021): The combination of dyadic concordance and "universal tailoring" eliminated HL-related disparities. Physicians' ability to adapt communication to match their patients' HL promotes shared understanding and equity. The 'Precision Medicine' construct should be expanded to include the domain of 'Precision Communication.'

8.

Validity of a Computational Linguistics-Derived Automated Health Literacy Measure Across Race/Ethnicity: Findings from The ECLIPPSE Project.

Schillinger, Dean; Balyan, Renu; Crossley, Scott; McNamara, Danielle; Karter, Andrew.

J Health Care Poor Underserved ; 32(2 Suppl): 347-365, 2021 05.

Article in English | MEDLINE | ID: mdl-36101652

ABSTRACT

Limited health literacy (HL) partially mediates health disparities. Measurement constraints, including lack of validity assessment across racial/ethnic groups and administration challenges, have undermined the field and impeded scaling of HL interventions. We employed computational linguistics to develop an automated and novel HL measure, analyzing >300,000 messages sent by >9,000 diabetes patients via a patient portal to create a Literacy Profile. We carried out stratified analyses among White/non-Hispanics, Black/non-Hispanics, Hispanics, and Asian/Pacific Islanders to determine if the Literacy Profile has comparable criterion and predictive validities. We discovered that criterion validity was consistently high across all groups (c-statistics 0.82-0.89). We observed consistent relationships across racial/ethnic groups between HL and outcomes, including communication, adherence, hypoglycemia, diabetes control, and ED utilization. While concerns have arisen regarding bias in AI, the automated Literacy Profile appears sufficiently valid across race/ethnicity, enabling HL measurement at a scale that could improve clinical care and population health among diverse populations.

Subject(s)

Diabetes Mellitus , Health Literacy , Diabetes Mellitus/therapy , Ethnicity , Humans , Linguistics , Racial Groups

9.

Challenges and solutions to employing natural language processing and machine learning to measure patients' health literacy and physician writing complexity: The ECLIPPSE study.

Brown, William; Balyan, Renu; Karter, Andrew J; Crossley, Scott; Semere, Wagahta; Duran, Nicholas D; Lyles, Courtney; Liu, Jennifer; Moffet, Howard H; Daniels, Ryane; McNamara, Danielle S; Schillinger, Dean.

J Biomed Inform ; 113: 103658, 2021 01.

Article in English | MEDLINE | ID: mdl-33316421

ABSTRACT

OBJECTIVE: In the National Library of Medicine funded ECLIPPSE Project (Employing Computational Linguistics to Improve Patient-Provider Secure Emails exchange), we attempted to create novel, valid, and scalable measures of both patients' health literacy (HL) and physicians' linguistic complexity by employing natural language processing (NLP) techniques and machine learning (ML). We applied these techniques to > 400,000 patients' and physicians' secure messages (SMs) exchanged via an electronic patient portal, developing and validating an automated patient literacy profile (LP) and physician complexity profile (CP). Herein, we describe the challenges faced and the solutions implemented during this innovative endeavor. MATERIALS AND METHODS: To describe challenges and solutions, we used two data sources: study documents and interviews with study investigators. Over the five years of the project, the team tracked their research process using a combination of Google Docs tools and an online team organization, tracking, and management tool (Asana). In year 5, the team convened a number of times to discuss, categorize, and code primary challenges and solutions. RESULTS: We identified 23 challenges and associated approaches that emerged from three overarching process domains: (1) Data Mining related to the SM corpus; (2) Analyses using NLP indices on the SM corpus; and (3) Interdisciplinary Collaboration. With respect to Data Mining, problems included cleaning SMs to enable analyses, removing hidden caregiver proxies (e.g., other family members) and Spanish language SMs, and culling SMs to ensure that only patients' primary care physicians were included. 
With respect to Analyses, critical decisions needed to be made as to which computational linguistic indices and ML approaches should be selected; how to enable the NLP-based linguistic indices tools to run smoothly and to extract meaningful data from a large corpus of medical text; and how to best assess content and predictive validities of both the LP and the CP. With respect to the Interdisciplinary Collaboration, because the research required engagement between clinicians, health services researchers, biomedical informaticians, linguists, and cognitive scientists, continual effort was needed to identify and reconcile differences in scientific terminologies and resolve confusion; arrive at common understanding of tasks that needed to be completed and priorities therein; reach compromises regarding what represents "meaningful findings" in health services vs. cognitive science research; and address constraints regarding potential transportability of the final LP and CP to different health care settings. DISCUSSION: Our study represents a process evaluation of an innovative research initiative to harness "big linguistic data" to estimate patient HL and physician linguistic complexity. Any of the challenges we identified, if left unaddressed, would have either rendered impossible the effort to generate LPs and CPs, or invalidated analytic results related to the LPs and CPs. Investigators undertaking similar research in HL or using computational linguistic methods to assess patient-clinician exchange will face similar challenges and may find our solutions helpful when designing and executing their health communications research.

Subject(s)

Health Literacy , Physicians , Humans , Machine Learning , Natural Language Processing , Writing

10.

Developing and Testing Automatic Models of Patient Communicative Health Literacy Using Linguistic Features: Findings from the ECLIPPSE study.

Crossley, Scott A; Balyan, Renu; Liu, Jennifer; Karter, Andrew J; McNamara, Danielle; Schillinger, Dean.

Health Commun ; 36(8): 1018-1028, 2021 07.

Article in English | MEDLINE | ID: mdl-32114833

ABSTRACT

Patients with diabetes and limited health literacy (HL) may have suboptimal communication exchange with their health care providers and be at elevated risk of adverse health outcomes. These difficulties are generally attributed to patients' reduced ability to both communicate and understand health-related ideas as well as physicians' lack of skill in identifying those with limited HL. Understanding and identifying patients with barriers posed by lower HL to improve healthcare delivery and outcomes is an important research avenue. However, doing so using traditional methods has proven difficult and infeasible to scale. This study uses corpus analyses, expert human ratings of HL, and natural language processing (NLP) approaches to estimate HL at the individual patient level. The goal of the study is to better understand HL from a linguistic perspective and to open new research areas to enhance population management and individualized care. Specifically, this study examines HL as a function of patients' demonstrated ability to communicate health-related information to their providers via secure messages. The study develops an NLP-based HL model and validates the model by predicting patient-related events such as medical outcomes and hospitalizations. Results indicate that the developed model predicts human ratings of HL with ~80% accuracy. Validation indicates that lower HL patients are more likely to be nonwhite and have lower educational attainment. In addition, patients with lower HL suffered more negative health outcomes and had higher healthcare service utilization.

Subject(s)

Health Literacy , Communication , Delivery of Health Care , Health Personnel , Humans , Linguistics

11.

Descriptive examination of secure messaging in a longitudinal cohort of diabetes patients in the ECLIPPSE study.

Cemballi, Anupama Gunshekar; Karter, Andrew J; Schillinger, Dean; Liu, Jennifer Y; McNamara, Danielle S; Brown, William; Crossley, Scott; Semere, Wagahta; Reed, Mary; Allen, Jill; Lyles, Courtney Rees.

J Am Med Inform Assoc ; 28(6): 1252-1258, 2021 06 12.

Article in English | MEDLINE | ID: mdl-33236117

ABSTRACT

The substantial expansion of secure messaging (SM) via the patient portal in the last decade suggests that it is becoming a standard of care, but few have examined SM use longitudinally. We examined SM patterns among a diverse cohort of patients with diabetes (N = 19 921) and the providers they exchanged messages with within a large, integrated health system over 10 years (2006-2015), linking patient demographics to SM use. We found a 10-fold increase in messaging volume. There were dramatic increases overall and for patient subgroups, with a majority of patients (including patients with lower income or with self-reported limited health literacy) messaging by 2015. Although more physicians than nurses and other providers messaged throughout the study, the distribution of health professions using SM changed over time. Given this rapid increase in SM, deeper understanding of optimizing the value of patient and provider engagement, while managing workflow and training challenges, is crucial.

Subject(s)

Diabetes Mellitus , Health Literacy , Patient Portals , Cohort Studies , Electronic Mail , Humans

12.

Employing computational linguistics techniques to identify limited patient health literacy: Findings from the ECLIPPSE study.

Schillinger, Dean; Balyan, Renu; Crossley, Scott A; McNamara, Danielle S; Liu, Jennifer Y; Karter, Andrew J.

Health Serv Res ; 56(1): 132-144, 2021 02.

Article in English | MEDLINE | ID: mdl-32966630

ABSTRACT

OBJECTIVE: To develop novel, scalable, and valid literacy profiles for identifying limited health literacy patients by harnessing natural language processing. DATA SOURCE: With respect to the linguistic content, we analyzed 283 216 secure messages sent by 6941 diabetes patients to physicians within an integrated system's electronic portal. Sociodemographic, clinical, and utilization data were obtained via questionnaire and electronic health records. STUDY DESIGN: Retrospective study used natural language processing and machine learning to generate five unique "Literacy Profiles" by employing various sets of linguistic indices: Flesch-Kincaid (LP_FK); basic indices of writing complexity, including lexical diversity (LP_LD) and writing quality (LP_WQ); and advanced indices related to syntactic complexity, lexical sophistication, and diversity, modeled from self-reported (LP_SR), and expert-rated (LP_Exp) health literacy. We first determined the performance of each literacy profile relative to self-reported and expert-rated health literacy to discriminate between high and low health literacy and then assessed Literacy Profiles' relationships with known correlates of health literacy, such as patient sociodemographics and a range of health-related outcomes, including ratings of physician communication, medication adherence, diabetes control, comorbidities, and utilization. PRINCIPAL FINDINGS: LP_SR and LP_Exp performed best in discriminating between high and low self-reported (C-statistics: 0.86 and 0.58, respectively) and expert-rated health literacy (C-statistics: 0.71 and 0.87, respectively) and were significantly associated with educational attainment, race/ethnicity, Consumer Assessment of Healthcare Providers and Systems (CAHPS) scores, adherence, glycemia, comorbidities, and emergency department visits.
CONCLUSIONS: Since health literacy is a potentially remediable explanatory factor in health care disparities, the development of automated health literacy indicators represents a significant accomplishment with broad clinical and population health applications. Health systems could apply literacy profiles to efficiently determine whether quality of care and outcomes vary by patient health literacy; identify at-risk populations for targeting tailored health communications and self-management support interventions; and inform clinicians to promote improvements in individual-level care.

Subject(s)

Health Literacy/methods , Patient Education as Topic/methods , Process Assessment, Health Care/methods , Diabetes Mellitus/therapy , Electronic Health Records/statistics & numerical data , Humans , Natural Language Processing , Physician-Patient Relations , Retrospective Studies
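
Note on the C-statistic reported in the record above: it is the probability that a randomly chosen positive case (here, a patient with limited health literacy) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch of that definition only, not the authors' analysis code; the function name and the toy scores and labels are hypothetical.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Concordance (C) statistic for binary labels (1 = positive, 0 = negative).

    Compares every positive/negative pair: a pair is concordant when the
    positive case has the higher score; tied scores count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical model scores and labels, for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2]
toy_labels = [1, 1, 0, 1, 0]
toy_c = c_statistic(toy_scores, toy_labels)  # 5 of 6 pairs are concordant
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so a rank-based computation would be preferable for large samples.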

13.

Predicting the readability of physicians' secure messages to improve health communication using novel linguistic features: Findings from the ECLIPPSE study.

Crossley, Scott A; Balyan, Renu; Liu, Jennifer; Karter, Andrew J; McNamara, Danielle; Schillinger, Dean.

J Commun Healthc ; 13(4): 1-13, 2020.

Article in English | MEDLINE | ID: mdl-34306181

ABSTRACT

BACKGROUND: Low literacy skills impact important aspects of communication, including health-related information exchanges. Unsuccessful communication on the part of physician or patient contributes to lower quality of care, is associated with poorer chronic disease control, jeopardizes patient safety and can lead to unfavorable healthcare utilization patterns. To date, very little research has focused on digital communication between physicians and patients, such as secure messages sent via electronic patient portals. METHOD: The purpose of the current study is to develop an automated readability formula to better understand what elements of physicians' digital messages make them more or less difficult to understand. The formula is developed using advanced natural language processing (NLP) to predict human ratings of physician text difficulty. RESULTS: The results indicate that NLP indices that capture a diverse set of linguistic features predict the difficulty of physician messages better than classic readability tools such as Flesch Kincaid Grade Level. Our results also provide information about the textual features that best explain text readability. CONCLUSION: Implications for how the readability formula could provide feedback to physicians to improve digital health communication by promoting linguistic concordance between physician and patient are discussed.
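
Note on the classic baseline named in the record above: the Flesch-Kincaid Grade Level combines average sentence length and average syllables per word as 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies that published formula with a deliberately crude syllable counter (vowel groups), which is an assumption made for illustration; it is not the readability formula developed in the study, and the function names and sample messages are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups, at least 1.

    This heuristic is an assumption for illustration; production tools
    typically use pronunciation dictionaries instead.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of a text using the heuristic above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Hypothetical messages, for illustration only.
plain_message = "Take one pill each day. Call me if you feel sick."
dense_message = (
    "Hyperglycemia necessitates immediate pharmacological "
    "intervention and continuous monitoring."
)
plain_grade = flesch_kincaid_grade(plain_message)
dense_grade = flesch_kincaid_grade(dense_message)
```

Because the formula sees only sentence length and syllable counts, it cannot register features such as word frequency or cohesion, which is the kind of limitation the NLP indices in the study are meant to address.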

14.

Secure Messaging with Physicians by Proxies for Patients with Diabetes: Findings from the ECLIPPSE Study.

Semere, Wagahta; Crossley, Scott; Karter, Andrew J; Lyles, Courtney R; Brown, William; Reed, Mary; McNamara, Danielle S; Liu, Jennifer Y; Schillinger, Dean.

J Gen Intern Med ; 34(11): 2490-2496, 2019 11.

Article in English | MEDLINE | ID: mdl-31428986

ABSTRACT

BACKGROUND: Little is known about patients who have caregiver proxies communicate with healthcare providers via portal secure messaging (SM). Since proxy portal use is often informal (e.g., sharing patient accounts), novel methods are needed to estimate the prevalence of proxy-authored SMs. OBJECTIVE: (1) Develop an algorithm to identify proxy-authored SMs, (2) apply this algorithm to estimate predicted proxy SM (PPSM) prevalence among patients with diabetes, and (3) explore patient characteristics associated with having PPSMs. DESIGN: Retrospective cohort study. PARTICIPANTS: We examined 9856 patients from Diabetes Study of Northern California (DISTANCE) who sent ≥ 1 English-language SM to their primary care physician between July 1, 2006, and Dec. 31, 2015. MAIN MEASURES: Using computational linguistics, we developed ProxyID, an algorithm that identifies phrases frequently found in registered proxy SMs. ProxyID was validated against blinded expert categorization of proxy status among an SM sample, then applied to identify PPSM prevalence across patients. We examined patients' sociodemographic and clinical characteristics according to PPSM penetrance, "none" (0%), "low" (≥ 0-50%), and "high" (≥ 50-100%). KEY RESULTS: Only 2.3% of patients had ≥ 1 registered proxy-authored SM. ProxyID demonstrated moderate agreement with expert classification (κ = 0.58); 45.7% of patients had PPSMs (40.2% low and 5.5% high). Patients with high percent PPSMs were older than those with low percent and no PPSMs (66.5 vs 57.4 vs 56.2 years, p < 0.001), had higher rates of limited English proficiency (16.1% vs 3.2% vs 3.5%, p < 0.05), lower self-reported health literacy (3.83 vs 4.43 vs 4.44, p < 0.001), and more comorbidities (Charlson index 3.78 vs 2.35 vs 2.18, p < 0.001). CONCLUSIONS: Among patients with diabetes, informal proxy SM use is more common than registered use and prevalent among socially and medically vulnerable patients.
Future research should explore whether proxy portal use improves patient and/or caregiver outcomes and consider policies that integrate caregivers in portal communication.

Subject(s)

Caregivers/statistics & numerical data , Diabetes Mellitus, Type 2/therapy , Electronic Mail/statistics & numerical data , Physician-Patient Relations , Adult , Aged , Confidentiality , Female , Humans , Male , Middle Aged , Proxy , Retrospective Studies
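
Note on the agreement statistic reported in the record above: Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from their marginal label frequencies, as (observed − expected) / (1 − expected). The following is a minimal sketch of that definition, assuming two equal-length label lists; the function name and toy labels are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (1 = proxy-authored, 0 = patient-authored), illustration only.
algorithm_labels = [1, 1, 0, 0, 1, 0, 0, 0]
expert_labels = [1, 0, 0, 0, 1, 0, 1, 0]
toy_kappa = cohens_kappa(algorithm_labels, expert_labels)
```

In this toy case the raters agree on 6 of 8 items (0.75), chance agreement is 34/64, and kappa is 7/15, which shows how raw agreement overstates reliability when one label dominates.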

15.

Using natural language processing and machine learning to classify health literacy from secure messages: The ECLIPPSE study.

Balyan, Renu; Crossley, Scott A; Brown, William; Karter, Andrew J; McNamara, Danielle S; Liu, Jennifer Y; Lyles, Courtney R; Schillinger, Dean.

PLoS One ; 14(2): e0212488, 2019.

Article in English | MEDLINE | ID: mdl-30794616

ABSTRACT

Limited health literacy is a barrier to optimal healthcare delivery and outcomes. Current measures requiring patients to self-report limitations are time-consuming and may be considered intrusive by some. This makes widespread classification of patient health literacy challenging. The objective of this study was to develop and validate "literacy profiles" as automated indicators of patients' health literacy to facilitate a non-intrusive, economic and more comprehensive characterization of health literacy among a health care delivery system's membership. To this end, three literacy profiles were generated based on natural language processing (combining computational linguistics and machine learning) using a sample of 283,216 secure messages sent from 6,941 patients to their primary care physicians. All patients were participants in Kaiser Permanente Northern California's DISTANCE Study. Performance of the three literacy profiles was compared against a gold standard of patient self-reported health literacy. Associations were analyzed between each literacy profile and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, poor adherence and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61-0.74. Relations between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients identified via literacy profiles indicative of limited health literacy (a) were older and more likely of minority status; (b) had poorer medication adherence and glycemic control; and (c) exhibited higher rates of hypoglycemia, comorbidities and healthcare utilization. This represents the first successful attempt to employ natural language processing to estimate health literacy.
Literacy profiles can offer an automated and economical way to identify patients with limited health literacy and greater vulnerability to poor health outcomes.

Subject(s)

Health Literacy/classification , Machine Learning , Natural Language Processing , California , Computer Security , Data Mining , Demography , Diabetes Mellitus/therapy , Electronic Mail , Female , Health Literacy/statistics & numerical data , Humans , Male , Physician-Patient Relations , Physicians, Primary Care

16.

The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap.

Crossley, Scott A; Kyle, Kristopher; Dascalu, Mihai.

Behav Res Methods ; 51(1): 14-27, 2019 02.

Article in English | MEDLINE | ID: mdl-30298264

ABSTRACT

This article introduces the second version of the Tool for the Automatic Analysis of Cohesion (TAACO 2.0). Like its predecessor, TAACO 2.0 is a freely available text analysis tool that works on the Windows, Mac, and Linux operating systems; is housed on a user's hard drive; is easy to use; and allows for batch processing of text files. TAACO 2.0 includes all the original indices reported for TAACO 1.0, but it adds a number of new indices related to local and global cohesion at the semantic level, reported by latent semantic analysis, latent Dirichlet allocation, and word2vec. The tool also includes a source overlap feature, which calculates lexical and semantic overlap between a source and a response text (i.e., cohesion between the two texts based on measures of text relatedness). In the first study in this article, we examined the effects that cohesion features, prompt, essay elaboration, and enhanced cohesion had on expert ratings of text coherence, finding that global semantic similarity as reported by word2vec was an important predictor of coherence ratings. A second study was conducted to examine the source and response indices. In this study we examined whether source overlap between the speaking samples found in the TOEFL-iBT integrated speaking tasks and the responses produced by test-takers was predictive of human ratings of speaking proficiency. The results indicated that the percentage of keywords found in both the source and response and the similarity between the source document and the response, as reported by word2vec, were significant predictors of speaking quality. Combined, these findings help validate the new indices reported for TAACO 2.0.

Subject(s)

Linguistics , Semantics , Software , Humans , Natural Language Processing

17.

The tool for the automatic analysis of lexical sophistication (TAALES): version 2.0.

Kyle, Kristopher; Crossley, Scott; Berger, Cynthia.

Behav Res Methods ; 50(3): 1030-1046, 2018 06.

Article in English | MEDLINE | ID: mdl-28699123

ABSTRACT

This study introduces the second release of the Tool for the Automatic Analysis of Lexical Sophistication (TAALES 2.0), a freely available and easy-to-use text analysis tool. TAALES 2.0 is housed on a user's hard drive (allowing for secure data processing) and is available on most operating systems (Windows, Mac, and Linux). TAALES 2.0 adds 316 indices to the original tool. These indices are related to word frequency, word range, n-gram frequency, n-gram range, n-gram strength of association, contextual distinctiveness, word recognition norms, semantic network, and word neighbors. In this study, we validated TAALES 2.0 by investigating whether its indices could be used to model both holistic scores of lexical proficiency in free writes and word choice scores in narrative essays. The results indicated that the TAALES 2.0 indices could be used to explain 58% of the variance in lexical proficiency scores and 32% of the variance in word-choice scores. Newly added TAALES 2.0 indices, including those related to n-gram association strength, word neighborhood, and word recognition norms, featured heavily in these predictor models, suggesting that TAALES 2.0 represents a substantial upgrade.

Subject(s)

Natural Language Processing , Writing/standards , Electronic Data Processing/methods , Humans , Language Arts , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Software Validation

18.

The Next Frontier in Communication and the ECLIPPSE Study: Bridging the Linguistic Divide in Secure Messaging.

Schillinger, Dean; McNamara, Danielle; Crossley, Scott; Lyles, Courtney; Moffet, Howard H; Sarkar, Urmimala; Duran, Nicholas; Allen, Jill; Liu, Jennifer; Oryn, Danielle; Ratanawongsa, Neda; Karter, Andrew J.

J Diabetes Res ; 2017: 1348242, 2017.

Article in English | MEDLINE | ID: mdl-28265579

ABSTRACT

Health systems are heavily promoting patient portals. However, limited health literacy (HL) can restrict online communication via secure messaging (SM) because patients' literacy skills must be sufficient to convey and comprehend content while clinicians must encourage and elicit communication from patients and match patients' literacy level. This paper describes the Employing Computational Linguistics to Improve Patient-Provider Secure Email (ECLIPPSE) study, an interdisciplinary effort bringing together scientists in communication, computational linguistics, and health services to employ computational linguistic methods to (1) create a novel Linguistic Complexity Profile (LCP) to characterize communications of patients and clinicians and demonstrate its validity and (2) examine whether providers accommodate communication needs of patients with limited HL by tailoring their SM responses. We will study >5 million SMs generated by >150,000 ethnically diverse type 2 diabetes patients and >9000 clinicians from two settings: an integrated delivery system and a public (safety net) system. Finally, we will create an LCP-based automated aid that delivers real-time feedback to clinicians to reduce the linguistic complexity of their SMs. This research will support health systems' journeys to become health literate healthcare organizations and reduce HL-related disparities in diabetes care.

Subject(s)

Communication , Diabetes Mellitus, Type 2 , Electronic Health Records , Health Literacy , Physician-Patient Relations , Electronic Mail , Humans , Internet

19.

Sentiment Analysis and Social Cognition Engine (SEANCE): An automatic tool for sentiment, social cognition, and social-order analysis.

Crossley, Scott A; Kyle, Kristopher; McNamara, Danielle S.

Behav Res Methods ; 49(3): 803-821, 2017 06.

Article in English | MEDLINE | ID: mdl-27193159

ABSTRACT

This study introduces the Sentiment Analysis and Social Cognition Engine (SEANCE), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, Linux), is housed on a user's hard drive (as compared to being accessed via an Internet interface), allows for batch processing of text files, includes negation and part-of-speech (POS) features, and reports on thousands of lexical categories and 20 component scores related to sentiment, social cognition, and social order. In the study, we validated SEANCE by investigating whether its indices and related component scores can be used to classify positive and negative reviews in two well-known sentiment analysis test corpora. We contrasted the results of SEANCE with those from Linguistic Inquiry and Word Count (LIWC), a similar tool that is popular in sentiment analysis, but is pay-to-use and does not include negation or POS features. The results demonstrated that both the SEANCE indices and component scores outperformed LIWC on the categorization tasks.

Subject(s)

Cognition , Data Mining , Emotions , Software , Humans

20.

The tool for the automatic analysis of text cohesion (TAACO): Automatic assessment of local, global, and text cohesion.

Crossley, Scott A; Kyle, Kristopher; McNamara, Danielle S.

Behav Res Methods ; 48(4): 1227-1237, 2016 12.

Article in English | MEDLINE | ID: mdl-26416138

ABSTRACT

This study introduces the Tool for the Automatic Analysis of Cohesion (TAACO), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, and Linux), is housed on a user's hard drive (rather than having an Internet interface), allows for the batch processing of text files, and incorporates over 150 classic and recently developed indices related to text cohesion. The study validates TAACO by investigating how its indices related to local, global, and overall text cohesion can predict expert judgments of text coherence and essay quality. The findings of this study provide predictive validation of TAACO and support the notion that expert judgments of text coherence and quality are either negatively correlated with or not predicted by local and overall text cohesion indices, but are positively predicted by global indices of cohesion. Combined, these findings provide supporting evidence that coherence for expert raters is a property of global cohesion and not of local cohesion, and that expert ratings of text quality are positively related to global cohesion.

Subject(s)

Bibliometrics , Software , Humans , Judgment
