Search | VHL Regional Portal

1.

Hospital Readmission and Social Risk Factors Identified from Physician Notes.

Navathe, Amol S; Zhong, Feiran; Lei, Victor J; Chang, Frank Y; Sordo, Margarita; Topaz, Maxim; Navathe, Shamkant B; Rocha, Roberto A; Zhou, Li.

Health Serv Res ; 53(2): 1110-1136, 2018 04.

Article in English | MEDLINE | ID: mdl-28295260

ABSTRACT

OBJECTIVE: To evaluate the prevalence of seven social factors using physician notes as compared to claims and structured electronic health records (EHRs) data and the resulting association with 30-day readmissions. STUDY SETTING: A multihospital academic health system in southeastern Massachusetts. STUDY DESIGN: An observational study of 49,319 patients with cardiovascular disease admitted from January 1, 2011, to December 31, 2013, using multivariable logistic regression to adjust for patient characteristics. DATA COLLECTION/EXTRACTION METHODS: All-payer claims, EHR data, and physician notes extracted from a centralized clinical registry. PRINCIPAL FINDINGS: All seven social characteristics were identified at the highest rates in physician notes. For example, we identified 14,872 patient admissions with poor social support in physician notes, increasing the prevalence from 0.4 percent using ICD-9 codes and structured EHR data to 16.0 percent. Compared to an 18.6 percent baseline readmission rate, risk-adjusted analysis showed higher readmission risk for patients with housing instability (readmission rate 24.5 percent; p < .001), depression (20.6 percent; p < .001), drug abuse (20.2 percent; p = .01), and poor social support (20.0 percent; p = .01). CONCLUSIONS: The seven social risk factors studied are substantially more prevalent than represented in administrative data. Automated methods for analyzing physician notes may enable better identification of patients with social needs.

Subject(s)

Documentation/statistics & numerical data , Electronic Health Records/statistics & numerical data , Patient Readmission/statistics & numerical data , Physicians , Accidental Falls/statistics & numerical data , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Depression/epidemiology , Female , Ill-Housed Persons/statistics & numerical data , Humans , Insurance Claim Review/statistics & numerical data , Logistic Models , Male , Massachusetts , Middle Aged , Natural Language Processing , Risk Factors , Sex Factors , Social Support , Socioeconomic Factors , Substance-Related Disorders/epidemiology , Time Factors , Young Adult

2.

VIGOR: Interactive Visual Exploration of Graph Query Results.

Pienta, Robert; Hohman, Fred; Endert, Alex; Tamersoy, Acar; Roundy, Kevin; Gates, Chris; Navathe, Shamkant; Chau, Duen Horng.

IEEE Trans Vis Comput Graph ; 24(1): 215-225, 2018 01.

Article in English | MEDLINE | ID: mdl-28866563

ABSTRACT

Finding patterns in graphs has become a vital challenge in many domains, from biological systems and network security to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts' sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

3.

Constraint based temporal event sequence mining for Glioblastoma survival prediction.

Malhotra, Kunal; Navathe, Shamkant B; Chau, Duen Horng; Hadjipanayis, Costas; Sun, Jimeng.

J Biomed Inform ; 61: 267-75, 2016 06.

Article in English | MEDLINE | ID: mdl-27064059

ABSTRACT

OBJECTIVE: A significant challenge in treating rare forms of cancer such as Glioblastoma (GBM) is to find optimal personalized treatment plans for patients. The goals of our study are to predict which patients survive longer than the median survival time for GBM based on clinical and genomic factors, and to assess the predictive power of treatment patterns. METHOD: We developed a predictive model based on the clinical and genomic data from approximately 300 newly diagnosed GBM patients for a period of 2 years. We proposed sequential mining algorithms with novel clinical constraints, namely, 'exact-order' and 'temporal overlap' constraints, to extract treatment patterns as features used in predictive modeling. With diverse features from clinical, genomic information and treatment patterns, we applied both a logistic regression model and Cox regression to model patient survival outcome. RESULTS: The most predictive features influencing the survival period of GBM patients included mRNA expression levels of certain genes, some clinical characteristics such as age, Karnofsky performance score, and therapeutic agents prescribed in treatment patterns. Our models achieved a c-statistic of 0.85 for logistic regression and 0.84 for Cox regression. CONCLUSIONS: We demonstrated the importance of diverse sources of features in predicting GBM patient survival outcome. The predictive model presented in this study is a preliminary step in a long-term plan of developing personalized treatment plans for GBM patients that can later be extended to other types of cancers.

Subject(s)

Brain Neoplasms , Data Mining , Genetic Markers , Glioblastoma , Algorithms , Humans , Models, Theoretical , Prognosis , RNA, Messenger/metabolism , Survival Rate

4.

VISAGE: Interactive Visual Graph Querying.

Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng.

AVI ; 2016: 272-279, 2016 Jun.

Article in English | MEDLINE | ID: mdl-28553670

ABSTRACT

Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

5.

Identifying Patients with Depression Using Free-text Clinical Documents.

Zhou, Li; Baughman, Amy W; Lei, Victor J; Lai, Kenneth H; Navathe, Amol S; Chang, Frank; Sordo, Margarita; Topaz, Maxim; Zhong, Feiran; Murrali, Madhavan; Navathe, Shamkant; Rocha, Roberto A.

Stud Health Technol Inform ; 216: 629-33, 2015.

Article in English | MEDLINE | ID: mdl-26262127

ABSTRACT

About 1 in 10 adults are reported to exhibit clinical depression, and the associated personal, societal, and economic costs are significant. In this study, we applied the MTERMS NLP system and machine learning classification algorithms to identify patients with depression using discharge summaries. Domain experts reviewed both the training and test cases, and classified these cases as depression with high, intermediate, or low confidence. For depression cases with high confidence, all of the algorithms we tested performed similarly, with MTERMS' knowledge-based decision tree slightly better than the machine learning classifiers, achieving an F-measure of 89.6%. MTERMS also achieved the highest F-measure (70.6%) on intermediate confidence cases. The RIPPER rule learner was the best performing machine learning method, with an F-measure of 70.0%, and a higher precision but lower recall than MTERMS. The proposed NLP-based approach was able to identify a significant portion of the depression cases (about 20%) that were not on the coded diagnosis list.

Subject(s)

Data Mining/methods , Decision Support Systems, Clinical/organization & administration , Depression/diagnosis , Diagnosis, Computer-Assisted/methods , Electronic Health Records/classification , Natural Language Processing , Boston , Depression/classification , Humans , Machine Learning , Reproducibility of Results , Sensitivity and Specificity

6.

Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.

Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J.

Int J Data Min Bioinform ; 1(1): 88-110, 2006.

Article in English | MEDLINE | ID: mdl-18402044

ABSTRACT

One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.

Subject(s)

Electronic Data Processing , Gene Expression Regulation, Fungal/physiology , Genes, Fungal/physiology , MEDLINE , Saccharomyces cerevisiae/physiology , Vocabulary, Controlled
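
The TFIDF weighting named in the abstract above can be illustrated with a minimal sketch. It assumes each gene is represented by a list of keyword tokens pulled from its abstracts and uses one common TF-IDF variant (relative term frequency times the log of inverse document frequency). The function name `tfidf_weights` and the toy gene data are hypothetical, and the paper's exact formula may differ.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return {gene: {keyword: weight}} for docs = {gene: [keyword tokens]}.

    Assumed variant: tf = count / tokens in the gene's list,
    idf = ln(number of genes / genes containing the keyword).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each keyword once per gene
    weights = {}
    for gene, tokens in docs.items():
        term_counts = Counter(tokens)
        total = len(tokens)
        weights[gene] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
    return weights

# Hypothetical toy data, not taken from the study.
toy = {
    "geneA": ["kinase", "receptor", "kinase"],
    "geneB": ["receptor", "channel"],
    "geneC": ["channel", "receptor", "glutamate"],
}
weights = tfidf_weights(toy)
```

In this toy example a keyword shared by every gene ("receptor") receives weight zero, while a keyword unique to one gene is weighted highest, which is the property that makes such weights usable as feature vectors for clustering.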

7.

Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms.

Liu, Ying; Navathe, Shamkant B; Civera, Jorge; Dasigi, Venu; Ram, Ashwin; Ciliax, Brian J; Dingledine, Ray.

IEEE/ACM Trans Comput Biol Bioinform ; 2(1): 62-76, 2005.

Article in English | MEDLINE | ID: mdl-17044165

ABSTRACT

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION in this paper. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and hierarchical clustering algorithm outperformed k-means clustering and self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and self-organizing map. Whereas BEA-PARTITION and the hierarchical clustering produced similar quality of clusters, BEA-PARTITION provides clear cluster boundaries compared to the hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.

Subject(s)

Algorithms , MEDLINE , Multigene Family/physiology , Natural Language Processing , Periodicals as Topic , Protein Interaction Mapping/methods , Proteins/metabolism , Abstracting and Indexing/methods , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Proteins/classification , Vocabulary, Controlled
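
Two of the cluster-quality measures named in the abstract above, purity and entropy, can be sketched as follows. This is a minimal illustration using the standard textbook definitions, which are an assumption here since the abstract does not give formulas. The function name `purity_and_entropy` and the example labels are hypothetical.

```python
import math
from collections import Counter

def purity_and_entropy(clusters, labels):
    """Compute overall purity and size-weighted entropy of a clustering.

    clusters: cluster id assigned to each item.
    labels:   known class of each item, in the same order.
    """
    n = len(labels)
    members_by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        members_by_cluster.setdefault(cluster_id, []).append(label)
    purity = 0.0
    entropy = 0.0
    for members in members_by_cluster.values():
        counts = Counter(members)
        size = len(members)
        # Purity: share of all items that belong to their cluster's majority class.
        purity += max(counts.values()) / n
        # Entropy of the class distribution inside this cluster, weighted by size.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy

# Hypothetical example: six items, two clusters, one misplaced item.
purity, entropy = purity_and_entropy(
    [0, 0, 0, 1, 1, 1],
    ["a", "a", "b", "b", "b", "b"],
)
```

Higher purity and lower entropy both indicate clusters that line up more closely with the known groups, which is the direction of improvement the abstract reports.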

8.

Investigation into biomedical literature classification using support vector machines.

Polavarapu, Nalini; Navathe, Shamkant B; Ramnarayanan, Ramprasad; ul Haque, Abrar; Sahay, Saurav; Liu, Ying.

Proc IEEE Comput Syst Bioinform Conf ; : 366-74, 2005.

Article in English | MEDLINE | ID: mdl-16447994

ABSTRACT

Specific topic search in the PubMed Database, one of the most important information resources for the scientific community, presents a big challenge to users. The researcher typically formulates Boolean queries and then scans the retrieved records for relevance, which is very time-consuming and error-prone. We applied Support Vector Machines (SVM) for automatic retrieval of PubMed articles related to human genome epidemiological research at CDC (Centers for Disease Control and Prevention). In this paper, we discuss various investigations into biomedical literature classification and analyze the effect of various issues related to the choice of keywords, training sets, kernel functions and parameters for the SVM technique. We report on the various factors above to show that SVM is a viable technique for automatic classification of biomedical literature into topics of interest such as epidemiology, cancer, birth defects, etc. In all our experiments, we achieved high values of PPV, sensitivity and specificity.

Subject(s)

Abstracting and Indexing/methods , Database Management Systems , Information Storage and Retrieval/methods , Natural Language Processing , Pattern Recognition, Automated/methods , Periodicals as Topic , PubMed , Algorithms , Artificial Intelligence , Vocabulary, Controlled

9.

MITOMAP: a human mitochondrial genome database--2004 update.

Brandon, Marty C; Lott, Marie T; Nguyen, Kevin Cuong; Spolim, Syawal; Navathe, Shamkant B; Baldi, Pierre; Wallace, Douglas C.

Nucleic Acids Res ; 33(Database issue): D611-3, 2005 Jan 01.

Article in English | MEDLINE | ID: mdl-15608272

ABSTRACT

MITOMAP (http://www.MITOMAP.org), a database for the human mitochondrial genome, has grown rapidly in data content over the past several years as interest in the role of mitochondrial DNA (mtDNA) variation in human origins, forensics, degenerative diseases, cancer and aging has increased dramatically. To accommodate this information explosion, MITOMAP has implemented a new relational database and an improved search engine, and all programs have been rewritten. System administrative changes have been made to improve security and efficiency, and to make MITOMAP compatible with a new automatic mtDNA sequence analyzer known as Mitomaster.

Subject(s)

DNA, Mitochondrial/chemistry , Databases, Nucleic Acid , Genome, Human , Mitochondria/genetics , Database Management Systems , Genetic Predisposition to Disease , Genetic Variation , Genomics , Humans , Mutation , Systems Integration , User-Computer Interface

10.

Text mining functional keywords associated with genes.

Liu, Ying; Brandon, Martin; Navathe, Shamkant; Dingledine, Ray; Ciliax, Brian J.

Stud Health Technol Inform ; 107(Pt 1): 292-6, 2004.

Article in English | MEDLINE | ID: mdl-15360821

ABSTRACT

Modern experimental techniques provide the ability to gather vast amounts of biological data in a single experiment (e.g. DNA microarray experiment), making it extremely difficult for the researcher to interpret the data and form conclusions about the functions of the genes. Current approaches provide useful information that organizes or relates genes, but a major shortcoming is they either do not address specific functions of the genes or are constrained by functions predefined in other databases, which can be biased, incomplete, or out-of-date. We extended Andrade and Valencia's method [1] to statistically mine functional keywords associated with genes from MEDLINE abstracts. The MEDLINE abstracts are analyzed statistically to score and rank keywords for each gene using a background set of words for baseline frequencies. We generally got very good functional keyword information about the genes we tested, which was confirmed by searching for the individual keywords in context. The keywords extracted by our algorithm reveal a wealth of potential functional concepts, which were not represented in existing public databases. We feel that this approach is general enough to apply to medical and biological literature to find other relationships: drugs vs. genes, risk-factors vs. genes, etc.

Subject(s)

Genes , Information Storage and Retrieval , Subject Headings , Algorithms , Databases, Genetic , Gene Expression Profiling , MEDLINE , Oligonucleotide Array Sequence Analysis , Statistics as Topic
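
The idea of scoring a gene's keywords against a background set for baseline frequencies, as described in the abstract above, can be sketched with a simple z-score. This is only an illustration under stated assumptions: the abstract does not give the formula, so the z-score form, the function name `zscore_keywords`, and the example frequencies are all hypothetical.

```python
from statistics import mean, pstdev

def zscore_keywords(gene_freq, background_freqs):
    """Rank keywords by how far their frequency exceeds the background.

    gene_freq:        {word: relative frequency in the gene's abstracts}
    background_freqs: {word: [relative frequencies in background sets]}
    Words with fewer than two background values or zero spread are skipped.
    """
    scores = {}
    for word, freq in gene_freq.items():
        background = background_freqs.get(word, [])
        if len(background) < 2:
            continue
        spread = pstdev(background)
        if spread == 0:
            continue
        scores[word] = (freq - mean(background)) / spread
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical frequencies, not taken from the study.
ranked = zscore_keywords(
    {"synapse": 0.05, "the": 0.06},
    {"synapse": [0.01, 0.03], "the": [0.05, 0.07]},
)
```

A word that is common everywhere scores near zero, while a word that is unusually frequent in one gene's abstracts rises to the top of the ranking.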

11.

Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray.

Proc IEEE Comput Syst Bioinform Conf ; : 394-404, 2004.

Article in English | MEDLINE | ID: mdl-16448032

ABSTRACT

One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.

Subject(s)

Artificial Intelligence , Cluster Analysis , MEDLINE , Multigene Family/genetics , Natural Language Processing , Oligonucleotide Array Sequence Analysis/methods , Vocabulary, Controlled , Information Storage and Retrieval/methods , Structure-Activity Relationship
