Results 1 - 10 of 10
1.
J Biomed Semantics ; 12(1): 5, 2021 03 24.
Article in English | MEDLINE | ID: mdl-33761996

ABSTRACT

BACKGROUND: The amount of data available to answer scientific research questions is growing. However, the formats in which data are published are proliferating as well, creating a serious challenge when multiple datasets must be integrated to answer a question. RESULTS: This paper presents a semi-automated framework for the semantic enhancement of biomedical data, specifically gene datasets. The framework combines a machine-learning concept recognition task with the BioPortal annotator. Compared with methods that rely on the BioPortal annotator alone, the proposed framework achieves better results. CONCLUSIONS: By combining concept recognition, machine learning techniques, and annotation with a biomedical ontology, the proposed framework helps datasets reach their full potential of providing meaningful information that can answer scientific research questions.
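The concept recognition step the abstract mentions can be illustrated with a minimal dictionary-based sketch, the kind of lookup an annotator such as BioPortal performs before any machine-learning refinement. The ontology terms, concept IDs, and sample text below are illustrative assumptions, not the framework's actual resources.

```python
# Minimal sketch of dictionary-based concept recognition.
# ONTOLOGY is a hypothetical term -> concept-ID mapping for illustration only.
ONTOLOGY = {
    "gene expression": "NCIT:C16608",
    "transcription factor": "NCIT:C17207",
    "apoptosis": "GO:0006915",
}

def recognize_concepts(text: str) -> list[tuple[str, str]]:
    """Return (term, concept_id) pairs for ontology terms found in the text."""
    lowered = text.lower()
    hits = []
    for term, concept_id in ONTOLOGY.items():
        if term in lowered:
            hits.append((term, concept_id))
    return hits

sample = "Apoptosis markers and gene expression levels were measured."
print(recognize_concepts(sample))
# -> [('gene expression', 'NCIT:C16608'), ('apoptosis', 'GO:0006915')]
```

A real annotator adds normalization, span offsets, and disambiguation on top of this lookup; the machine-learning component described in the paper would then filter or re-rank such candidate hits.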


Subject(s)
Biological Ontologies , Semantics , Machine Learning
2.
Sci Data ; 5: 180279, 2018 12 04.
Article in English | MEDLINE | ID: mdl-30512011

ABSTRACT

Patents are widely used to protect intellectual property and as a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world; in fact, more than 280,000 patents were granted in the US in 2015. However, accessing, searching, and analyzing those patents is often still cumbersome and inefficient. To overcome those problems, Google indexes patents and converts them to Extensible Markup Language (XML) files using Optical Character Recognition (OCR) techniques. In this article, we take this idea one step further and provide semantically rich, machine-readable patents using the Linked Data principles. We have converted the data spanning 2005-2017 from XML to Resource Description Framework (RDF) format, conforming to the Linked Data principles, and made them publicly available for re-use. These data can be integrated with other data sources to further simplify use cases such as trend analysis, structured patent search and exploration, and societal progress measurements. We describe the conversion, publishing, and interlinking process, along with several use cases for the USPTO Linked Patent data.
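The XML-to-RDF conversion the abstract describes can be sketched, at its simplest, as emitting one N-Triples statement per extracted patent field. The base URI, predicates, and fields below are illustrative assumptions, not the dataset's actual schema.

```python
# Sketch: serialize a parsed patent record as RDF N-Triples lines.
# The example.org namespace and the chosen Dublin Core predicates are
# assumptions for illustration; the published dataset defines its own vocabulary.
def patent_to_ntriples(patent_id: str, title: str, year: int) -> list[str]:
    subj = f"<http://example.org/patent/{patent_id}>"
    return [
        f'{subj} <http://purl.org/dc/terms/title> "{title}" .',
        f'{subj} <http://purl.org/dc/terms/issued> "{year}" .',
    ]

for line in patent_to_ntriples("US9000000", "Example widget", 2015):
    print(line)
```

Because every line is a standalone subject-predicate-object triple, records converted this way can be loaded into any triple store and joined with other Linked Data sources on shared URIs, which is what enables the integration use cases the article mentions.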

3.
J Am Med Inform Assoc ; 24(6): 1169-1172, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29016968

ABSTRACT

Therapeutic intent, the reason behind the choice of a therapy and the context in which a given approach should be used, is an important aspect of medical practice. There are unmet needs with respect to current electronic mapping of drug indications. For example, the active ingredient sildenafil has 2 distinct indications, which differ solely in dosage strength. In progressing toward a practice of precision medicine, there is a need to capture and structure therapeutic intent for computational reuse, thus enabling more sophisticated decision-support tools and a possible mechanism for computer-aided drug repurposing. The indications for drugs, such as those expressed in the Structured Product Labels approved by the US Food and Drug Administration, appear to be a tractable area for developing an application ontology of therapeutic intent.


Subject(s)
Drug Labeling , Drug Therapy , Vocabulary, Controlled , Drug Repositioning , Humans , Precision Medicine , United States , United States Food and Drug Administration
4.
BMC Bioinformatics ; 18(1): 415, 2017 Sep 18.
Article in English | MEDLINE | ID: mdl-28923003

ABSTRACT

BACKGROUND: The ability to efficiently search and filter datasets depends on access to high-quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, because there is no structured vocabulary to guide submitters on which metadata terms to use, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues, including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured, and complete description of the data. METHODS: In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. RESULTS: Our agglomerative clustering algorithm identified metadata keys that were similar to each other, based on (i) name, (ii) core concept, and (iii) value similarities, and grouped them together. We evaluated our method against a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue, and (vi) treatment. The algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). Most keys in the 18 clusters were correctly assigned to their cluster, although 13 keys were not related to theirs. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-score (0.63). CONCLUSION: Our algorithm identified keys that were similar to each other and grouped them together. The intuition underpinning cleaning by clustering is that dividing keys into clusters resolves the scalability issues of data observation and cleaning, and that duplicates and errors among keys in the same cluster can easily be found. Our algorithm can also be applied to other biomedical data types.
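The name-similarity clustering step can be sketched as a toy single-linkage agglomeration over string similarity. The similarity threshold and the sample keys below are illustrative assumptions; the paper's actual method also uses core-concept and value similarities.

```python
# Toy sketch: group metadata keys whose names are close, single-linkage style.
# The 0.7 threshold and the example keys are assumptions for illustration.
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.7) -> bool:
    """True when the two key names are lexically close enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster_keys(keys: list[str]) -> list[list[str]]:
    """Assign each key to the first cluster containing a similar member."""
    clusters: list[list[str]] = []
    for key in keys:
        for cluster in clusters:
            if any(similar(key, member) for member in cluster):
                cluster.append(key)
                break
        else:
            clusters.append([key])
    return clusters

keys = ["cell line", "cell_line", "cellline", "disease state", "disease"]
print(cluster_keys(keys))
# -> [['cell line', 'cell_line', 'cellline'], ['disease state', 'disease']]
```

Dividing the 44 million pairs' keys into such clusters is what makes manual inspection tractable: redundant variants of the same key land in one group, where duplicates and errors are easy to spot.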


Subject(s)
Algorithms , Metadata/standards , Cluster Analysis , Data Accuracy
5.
JAMA Dermatol ; 150(9): 945-51, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24807687

ABSTRACT

IMPORTANCE: Research prioritization should be guided by impact of disease. OBJECTIVE: To determine whether systematic review and protocol topics in the Cochrane Database of Systematic Reviews (CDSR) reflect disease burden, measured by disability-adjusted life years (DALYs) from the Global Burden of Disease (GBD) 2010 project. DESIGN, SETTING, AND PARTICIPANTS: Two investigators independently assessed 15 skin conditions in the CDSR for systematic review and protocol representation from November 1, 2013, to December 6, 2013. The 15 skin diseases were matched to their respective DALYs from GBD 2010. An official publication report of all reviews and protocols published by the Cochrane Skin Group (CSG) was also obtained to ensure that no titles were missed. There were no study participants other than the researchers, who worked with databases evaluating CDSR and GBD 2010 skin condition disability data. MAIN OUTCOMES AND MEASURES: Relationship of CDSR topic coverage (systematic reviews and protocols) with percentage of total 2010 DALYs, 2010 DALY rank, and DALY percentage change from 1990 to 2010 for 15 skin conditions. RESULTS: All 15 skin conditions were represented by at least 1 systematic review in the CDSR; 69% of systematic reviews and 67% of protocols by the CSG covered the 15 skin conditions. When the number of reviews/protocols was compared with disability, dermatitis, melanoma, nonmelanoma skin cancer, viral skin diseases, and fungal skin diseases were well matched. Decubitus ulcer, psoriasis, and leprosy demonstrated review/protocol overrepresentation when matched with corresponding DALYs. In comparison, acne vulgaris, bacterial skin diseases, urticaria, pruritus, scabies, cellulitis, and alopecia areata were underrepresented in the CDSR when matched with corresponding DALYs. CONCLUSIONS AND RELEVANCE: Degree of representation in the CDSR is partly correlated with DALY metrics.
The number of published reviews/protocols was well matched with disability metrics for 5 of the 15 studied skin diseases, while 3 skin diseases were overrepresented, and 7 were underrepresented. Our results provide high-quality and transparent data to inform future prioritization decisions.


Subject(s)
Cost of Illness , Databases, Factual , Quality-Adjusted Life Years , Review Literature as Topic , Skin Diseases/epidemiology , Humans
6.
PLoS One ; 7(5): e36759, 2012.
Article in English | MEDLINE | ID: mdl-22629329

ABSTRACT

MOTIVATION: Evidence-based medicine (EBM) in the field of neurosurgery relies on diagnostic studies, since Randomized Controlled Trials (RCTs) are uncommon. However, diagnostic study reporting is less standardized, which increases the difficulty of reliably aggregating results. Although there have been several initiatives to standardize reporting, they have proven sub-optimal. Additionally, there is no central repository for storing and retrieving related articles. RESULTS: We formulate a computational diagnostic ontology containing 91 elements, including classes and sub-classes, which are required to conduct Systematic Review-Meta Analyses (SR-MA) for diagnostic studies and which will assist in the standardized reporting of diagnostic articles. SR-MA are studies that aggregate several studies to reach one conclusion for a particular research question. We also report a high percentage of agreement among five observers in an interobserver agreement test in which they annotated 13 articles using the diagnostic ontology. Moreover, we extend our existing repository CERR-N to include diagnostic studies. AVAILABILITY: The ontology is available for download as an .owl file at: http://bioportal.bioontology.org/ontologies/3013.


Subject(s)
Biomedical Research/standards , Evidence-Based Medicine/standards , Neurosurgery/standards , Humans
7.
Neuroinformatics ; 8(4): 261-71, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20953737

ABSTRACT

Systematic reviews and meta-analyses constitute one of the central pillars of evidence-based medicine. However, clinical trials are poorly reported, which delays meta-analyses and consequently the translation of clinical research findings to clinical practice. We propose a Center of Excellence in Research Reporting in Neurosurgery (CERR-N) and the creation of a clinically significant computational ontology to encode Randomized Controlled Trial (RCT) studies in neurosurgery. A 128-element computational ontology was derived from the Trial Bank ontology by omitting classes that were not required to perform meta-analysis. Three researchers from our team each tagged five randomly selected RCTs published in the last 5 years (2004-2008) in the Journal of Neurosurgery (JoN), Neurosurgery Journal (NJ), and Journal of Neurotrauma (JoNT). We evaluated inter- and intra-observer reliability for the ontology using percent agreement and the kappa coefficient. Inter-observer agreement was 76.4%, 75.97%, and 74.9%, and intra-observer agreement was 89.8%, 80.8%, and 86.56% for JoN, NJ, and JoNT respectively. The inter-observer kappa coefficient was 0.60, 0.54, and 0.53, and the intra-observer kappa coefficient was 0.79, 0.82, and 0.79 for JoN, NJ, and JoNT respectively. The high degree of inter- and intra-observer agreement confirms tagging consistency across sections of a given scientific manuscript. Standardized reporting for neurosurgery articles can be reliably achieved through the integration of a computational ontology within the context of a CERR-N. This approach holds potential for an overall improvement in the quality of reporting of RCTs in neurosurgery, ultimately streamlining the translation of clinical research findings into improvements in patient care.
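The two reliability statistics the study reports, percent agreement and the kappa coefficient, can be sketched for the two-rater case as follows. The rater labels below are illustrative assumptions, not the study's data; kappa here is Cohen's kappa, computed as (observed - expected) / (1 - expected) agreement.

```python
# Sketch: percent agreement and Cohen's kappa for two annotators
# labeling the same items. The example labels are illustrative only.
from collections import Counter

def percent_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items on which the two raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two raters."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # expected chance agreement, from each rater's marginal label frequencies
    p_e = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(percent_agreement(rater1, rater2))  # -> 0.75
print(cohens_kappa(rater1, rater2))       # -> 0.5
```

Kappa is lower than raw agreement because it discounts the matches two raters would produce by chance alone, which is why the study reports both figures.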


Subject(s)
Biomedical Research , Clinical Trials as Topic , Computational Biology , Evidence-Based Medicine , Humans , Meta-Analysis as Topic , Neurosurgery , Periodicals as Topic
8.
Clin Orthop Relat Res ; 468(10): 2664-71, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20635174

ABSTRACT

BACKGROUND: Collection and analysis of clinical data can help orthopaedic surgeons practice evidence-based medicine. Spreadsheets and offline relational databases are prevalent, but they are not flexible, secure, or workflow-friendly, and they do not support the generation of standardized and interoperable data. Additionally, these data collection applications usually do not follow a structured and planned approach, which may result in failure to achieve the intended goal. QUESTIONS/PURPOSES: Our purposes are (1) to provide a brief overview of electronic data capture (EDC) systems, their types, and their pros and cons, and to describe commonly used EDC platforms and their features; and (2) to describe the simple steps involved in designing a registry/clinical study in DADOS P, an open source EDC system. WHERE ARE WE NOW?: EDC systems aimed at addressing these issues are being widely adopted at the institutional/national/international level but are lacking at the individual level. A wide array of features, relative pros and cons, and different business models cause confusion and indecision among orthopaedic surgeons interested in implementing EDC systems. WHERE DO WE NEED TO GO?: To answer clinical questions and actively participate in clinical studies, orthopaedic surgeons should collect data in parallel with their clinical activities. Adopting a simple, user-friendly, and robust EDC system can facilitate the data collection process. HOW DO WE GET THERE?: Conducting a balanced evaluation of the available options and comparing them with intended goals and requirements can help orthopaedic surgeons make an informed choice.


Subject(s)
Clinical Trials as Topic , Commerce , Medical Informatics , Orthopedics , Outcome and Process Assessment, Health Care , Registries , Computer Systems , Evidence-Based Medicine , Health Services Research , Humans , Medical Records Systems, Computerized , Research Design , Systems Integration , Treatment Outcome
9.
Clin Orthop Relat Res ; 468(10): 2612-20, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20496021

ABSTRACT

BACKGROUND: Information Technology (IT) plays an important role in storing and collating vast amounts of healthcare data. However, analyzing and integrating these data to extract useful information is difficult because of their heterogeneous, siloed, disparate, and unstructured nature. WHERE ARE WE NOW?: Attempts to standardize data reporting by establishing reporting standards, checklists, and guidelines have not been optimal. Moreover, efforts to integrate data through the use of registries, data-sharing networks, vocabularies, and data standards have also yielded limited results. In orthopaedics, where theoretical knowledge is scattered across subspecialties, these efforts become cognitively challenging and tedious. WHERE DO WE NEED TO GO?: Implementing data standardization is an important step towards homogenizing data so that they can be integrated. Once integrated, the next step is data analysis for information extraction. This information would be useful in answering important questions, especially in orthopaedic clinical practice and research, and could even help optimize methodologies in the education field. HOW DO WE GET THERE?: With the ability to describe concepts in a standardized manner and to define existing interrelationships, ontologies are a potential solution. They assist in standardizing and integrating data and also impart strong inferential capabilities at a granular level. Applied to orthopaedics, they can standardize data collection, link data sources, and generate knowledge from the assumptions present in the interlinked data, thus answering important questions about orthopaedic clinical practice, research, and education.


Subject(s)
Access to Information , Databases as Topic , Information Dissemination , Medical Informatics , Orthopedic Procedures , Terminology as Topic , Algorithms , Computer-Assisted Instruction , Data Mining , Databases as Topic/standards , Decision Support Systems, Clinical , Decision Support Techniques , Education, Medical , Humans , Orthopedic Procedures/education , Orthopedic Procedures/standards , Semantics , Systems Integration , Vocabulary, Controlled
10.
Health Res Policy Syst ; 8: 38, 2010 Dec 31.
Article in English | MEDLINE | ID: mdl-21194455

ABSTRACT

BACKGROUND: Industry standards provide rigorous descriptions of required data presentation, with the aim of ensuring compatibility across different clinical studies. However, despite their crucial importance, these standards are often not used as expected in clinical research. This lack of compliance may be related to the high cost and time-intensive nature of implementing data standards. The objective of this study was to evaluate the value of the extra time and cost required for different levels of data standardisation, and researchers' likelihood of complying with those levels. Since we believe that the cost and time necessary for implementing data standards can change over time, System Dynamics (SD) analysis was used to investigate how these variables interact and influence the adoption of data standards by clinical researchers. METHODS: Three levels of data standards implementation were defined through a focus group discussion involving four clinical research investigators. Ten Brazilian and eighteen American investigators responded to an online questionnaire that presented possible standards implementation scenarios, with respondents asked to choose one of two options available in each scenario. A random effects ordered probit model was used to estimate the effect of cost and time on investigators' willingness to adhere to data standards. The SD model was used to demonstrate the relationship between degrees of data standardisation and the resulting variation in the cost and time required to start the associated study. RESULTS: A preference for low cost and rapid implementation times was observed, with investigators more likely to incur costs than to accept a delay in project start-up. SD analysis indicated that although extra time and cost are initially necessary for clinical study standardisation, both decrease over time.
CONCLUSIONS: Future studies should explore ways of creating mechanisms which decrease the time and cost associated with standardisation processes. In addition, the fact that the costs and time necessary for data standards implementation decrease with time should be made known to the wider research community. Policy makers should attempt to match their data standardisation policies better with the expectations of researchers.
