Search | VHL Regional Portal

Hate speech detection: Challenges and solutions.

MacAvaney, Sean; Yao, Hao-Ren; Yang, Eugene; Russell, Katina; Goharian, Nazli; Frieder, Ophir.

PLoS One ; 14(8): e0221152, 2019.

Article in English | MEDLINE | ID: mdl-31430308

ABSTRACT

As online content continues to grow, so does the spread of hate speech. We identify and examine challenges faced by online automatic approaches for hate speech detection in text. Among these difficulties are subtleties in language, differing definitions of what constitutes hate speech, and limitations of data availability for training and testing of these systems. Furthermore, many recent approaches suffer from an interpretability problem; that is, it can be difficult to understand why the systems make the decisions that they do. We propose a multi-view SVM approach that achieves near state-of-the-art performance, while being simpler and producing more easily interpretable decisions than neural methods. We also discuss both technical and practical challenges that remain for this task.

Subject(s)

Data Mining/methods , Hate , Language , Social Media , Violence/prevention & control , Datasets as Topic , Humans , Machine Learning , Violence/psychology

Coughing, sneezing, and aching online: Twitter and the volume of influenza-like illness in a pediatric hospital.

Hartley, David M; Giannini, Courtney M; Wilson, Stephanie; Frieder, Ophir; Margolis, Peter A; Kotagal, Uma R; White, Denise L; Connelly, Beverly L; Wheeler, Derek S; Tadesse, Dawit G; Macaluso, Maurizio.

PLoS One ; 12(7): e0182008, 2017.

Article in English | MEDLINE | ID: mdl-28753678

ABSTRACT

This study investigates the relation of the incidence of georeferenced tweets related to respiratory illness to the incidence of influenza-like illness (ILI) in the emergency department (ED) and urgent care clinics (UCCs) of a large pediatric hospital. We collected (1) tweets in English originating in our hospital's primary service area between 11/1/2014 and 5/1/2015 and containing one or more specific terms related to respiratory illness and (2) the daily number of patients presenting to our hospital's EDs and UCCs with ILI, as captured by ICD-9 codes. A Support Vector Machine classifier was applied to the set of tweets to remove those unlikely to be related to ILI. Time series of the pooled set of remaining tweets involving any term, of tweets involving individual terms, and of the ICD-9 data were constructed, and temporal cross-correlation between the social media and clinical data was computed. A statistically significant correlation (Spearman ρ = 0.23) between tweets involving the term flu and ED and UCC volume related to ILI 11 days in the future was observed. Tweets involving the terms coughing (Spearman ρ = 0.24) and headache (Spearman ρ = 0.19) individually were also significantly correlated to ILI-related clinical volume four and two days in the future, respectively. In the 2014-2015 cold and flu season, the incidences of local tweets containing the terms flu, coughing, and headache were early indicators of the incidence of ILI-related cases presenting to EDs and UCCs at our children's hospital.
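The lagged cross-correlation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`average_ranks`, `spearman`, `lagged_spearman`) and the toy daily counts are assumptions made for illustration, and no significance test is included. For each lag k, it computes Spearman's ρ between the tweet count on day t and the clinical volume on day t + k.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(tweets, visits, max_lag):
    """Map each lag k (in days) to rho between tweets[t] and visits[t + k]."""
    result = {}
    for lag in range(max_lag + 1):
        n = len(tweets) - lag
        result[lag] = spearman(tweets[:n], visits[lag:lag + n])
    return result


# Toy daily counts (invented): visits echo the tweet counts three days later.
tweets = [2, 5, 9, 4, 1, 7, 3, 8, 6, 10, 12, 11]
visits = [0, 0, 0] + tweets[:-3]
rho_by_lag = lagged_spearman(tweets, visits, max_lag=5)
best_lag = max(rho_by_lag, key=rho_by_lag.get)
print(best_lag, round(rho_by_lag[best_lag], 3))
```

In the toy data the strongest correlation appears at a lag of three days, because the visit series was constructed as a three-day-delayed copy of the tweet series. Real series would also need a significance test at each lag, which this sketch omits.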

Subject(s)

Cough , Pain , Sneezing , Social Media/statistics & numerical data , Disease Outbreaks/statistics & numerical data , Geographic Mapping , Hospitals, Pediatric/statistics & numerical data , Humans , Incidence

Metaphor identification in large texts corpora.

Neuman, Yair; Assaf, Dan; Cohen, Yohai; Last, Mark; Argamon, Shlomo; Howard, Newton; Frieder, Ophir.

PLoS One ; 8(4): e62343, 2013.

Article in English | MEDLINE | ID: mdl-23658625

ABSTRACT

Identifying metaphorical language use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. The algorithms' performance in identifying linguistic phrases as metaphorical or literal was compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus.

Subject(s)

Algorithms , Language , Metaphor , Natural Language Processing , Comprehension , Humans , Semantics

Data mining derived treatment algorithms from the electronic medical record improve theoretical empirical therapy for outpatient urinary tract infections.

Jackson, Hannah Alphs; Cashy, John; Frieder, Ophir; Schaeffer, Anthony J.

J Urol ; 186(6): 2257-62, 2011 Dec.

Article in English | MEDLINE | ID: mdl-22014809

ABSTRACT

PURPOSE: We determined whether data mining derived algorithms from electronic databases can improve empirical antimicrobial therapy in outpatients with a urinary tract infection. MATERIALS AND METHODS: The electronic medical records from 3,308 visits associated with a positive urine culture at Northwestern's outpatient Urology and Internal Medicine clinics and Emergency Department from 2005 to 2009 were interrogated. Bacterial species and susceptibility rates for trimethoprim-sulfamethoxazole, ciprofloxacin and nitrofurantoin were compared. Using data mining techniques we created algorithms for empirical therapy of urinary tract infections and compared the theoretical outcomes from data mining derived therapy to those from conventional therapy. RESULTS: Patients were significantly older in the Department of Urology vs Internal Medicine vs Emergency Department, and more patients in the Department of Urology were male. During the 5-year period the susceptibility rates for ciprofloxacin in the Department of Urology and trimethoprim-sulfamethoxazole in Internal Medicine decreased significantly. In the Department of Urology the susceptibility rate for nitrofurantoin was greater than for ciprofloxacin, which was greater than for trimethoprim-sulfamethoxazole. In all departments, bacteria were more resistant to trimethoprim-sulfamethoxazole than to ciprofloxacin or nitrofurantoin. All drugs were more effective in the Emergency Department and Internal Medicine than the Department of Urology. Prior resistance patterns were the strongest predictor of current susceptibility profiles. In the Department of Urology the algorithms for patients with or without prior cultures theoretically outperformed conventional therapy in men (13.2%) and women (10.1%). CONCLUSIONS: Antimicrobial resistance patterns in outpatient urinary tract infections are time dependent, and drug and site specific. Data mining directed therapy significantly improved theoretical outcomes compared to conventional therapy for Department of Urology outpatients and for female patients in the Emergency Department.

Subject(s)

Algorithms , Data Mining , Electronic Health Records , Urinary Tract Infections/drug therapy , Adult , Female , Humans , Male , Middle Aged , Outpatients

Passage relevance models for genomics search.

Urbain, Jay; Frieder, Ophir; Goharian, Nazli.

BMC Bioinformatics ; 10 Suppl 3: S3, 2009 Mar 19.

Article in English | MEDLINE | ID: mdl-19344479

ABSTRACT

We present a passage relevance model for integrating syntactic and semantic evidence of biomedical concepts and topics using a probabilistic graphical model. Component models of topics, concepts, terms, and document are represented as potential functions within a Markov Random Field. The probability of a passage being relevant to a biologist's information need is represented as the joint distribution across all potential functions. Relevance model feedback of top ranked passages is used to improve distributional estimates of query concepts and topics in context, and a dimensional indexing strategy is used for efficient aggregation of concept and term statistics. By integrating multiple sources of evidence including dependencies between topics, concepts, and terms, we seek to improve genomics literature passage retrieval precision. Using this model, we are able to demonstrate statistically significant improvements in retrieval precision using a large genomics literature corpus.
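As a reading aid, the generic factorization of a Markov Random Field into potential functions can be written as follows. The abstract does not give the specific potentials, so the symbols here are assumptions: p stands for a passage, q for the query (the information need), and ψ_c for the potential of each component model named in the abstract.

```latex
P(p, q) \;=\; \frac{1}{Z} \prod_{c \,\in\, \{\text{topic},\ \text{concept},\ \text{term},\ \text{document}\}} \psi_c(p, q),
\qquad
Z \;=\; \sum_{p',\, q'} \ \prod_{c} \psi_c(p', q')
```

Here Z is the normalizing constant. For ranking passages against a fixed query it can be ignored, since it is the same for every candidate passage.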

Subject(s)

Genomics/methods , Information Storage and Retrieval/methods , Models, Statistical , Algorithms , Markov Chains , Natural Language Processing

A dimensional retrieval model for integrating semantics and statistical evidence in context for genomics literature search.

Urbain, Jay; Goharian, Nazli; Frieder, Ophir.

Comput Biol Med ; 39(1): 61-8, 2009 Jan.

Article in English | MEDLINE | ID: mdl-19147128

ABSTRACT

We present a dimensional information retrieval model for combining concept-based semantics and term statistics within multiple levels of document context to identify concise, variable length passages of text that answer a user query. Our results demonstrate improved search results in the presence of varying levels of semantic evidence, and higher performance using retrieval functions that combine document, as well as sentence and passage level information. Experimental results are promising. When ranking documents based on the most relevant extracted passages, the results exceed the state-of-the-art by 15.28% as assessed by the TREC 2005 Genomics track collection of 4.5 million MEDLINE citations.

Subject(s)

Databases, Genetic , Genomics , Information Storage and Retrieval , Models, Statistical , Abstracting and Indexing
