Results 1 - 20 of 24
1.
Artif Intell Med ; 154: 102924, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38964194

ABSTRACT

BACKGROUND: Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that fits the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. METHODS: Our work aims to leverage the potential of Natural Language Processing and Transformer-based models for automatic SR registry filling. With 174 Italian radiology reports available, we investigate a rule-free generative Question Answering approach based on the Italian-specific version of T5: IT5. To address discrepancies in information content, we focus on the six most frequently filled items in the annotations made on the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we also encode information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to comply with the context-length limitations of IT5. Performance is evaluated in terms of strict accuracy, F1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. Unlike multichoice and factual answers, free-text answers do not have a one-to-one correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between medical annotations and generated free-text answers, using a 5-point Likert scale questionnaire (evaluating the criteria of correctness and completeness). RESULTS: The combination of fine-tuning and batch splitting allows IT5 ex-post combination to achieve notable results in extracting different types of structured data, performing on par with GPT-3.5. Human-based assessment scores of free-text answers show a high correlation with the F1 performance metric (Spearman's correlation coefficients > 0.5, p-values < 0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible, human-like statements, although it systematically provides answers even when none should be given. CONCLUSIONS: In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220M) performs well as a clinical information extraction system for the automatic SR registry filling task. It can extract information from more than one place in the report, elaborating it in a manner that complies with the response specifications provided by the SR registry (for multichoice and factual items) or that closely approximates the work of a human expert (free-text items), and it can discern when an answer should or should not be given for a user query.
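
As a rough illustration of the batch-splitting and ex-post combination strategy described above, the sketch below chunks a long report to fit a T5-style context window, runs generative QA per chunk, and keeps the first non-empty answer. The checkpoint name, prompt format, and "no answer" sentinel are assumptions, and the paper's fine-tuning step is not shown.

```python
# Hedged sketch: chunked generative QA with an IT5-style seq2seq model.
# "gsarti/it5-base" is an assumed public IT5 checkpoint; the prompt wording
# ("domanda:"/"contesto:") and the no-answer sentinel are illustrative only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "gsarti/it5-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def split_into_batches(report: str, question: str, max_len: int = 512):
    """Split a long report into chunks that fit the encoder context."""
    prefix = f"domanda: {question} contesto: "
    budget = max_len - len(tokenizer(prefix)["input_ids"])
    ids = tokenizer(report)["input_ids"]
    return [prefix + tokenizer.decode(ids[i:i + budget], skip_special_tokens=True)
            for i in range(0, len(ids), budget)]

def answer(report: str, question: str, no_answer: str = "non presente") -> str:
    """Generate one answer per chunk, then combine ex post."""
    candidates = []
    for chunk in split_into_batches(report, question):
        inputs = tokenizer(chunk, return_tensors="pt", truncation=True)
        out = model.generate(**inputs, max_new_tokens=32)
        candidates.append(tokenizer.decode(out[0], skip_special_tokens=True))
    real = [c for c in candidates if c.strip() and c != no_answer]
    return real[0] if real else no_answer
```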

2.
Math Biosci Eng ; 21(1): 1342-1355, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303468

ABSTRACT

Extracting entity relations from unstructured Chinese electronic medical records is an important task in medical information extraction. However, Chinese electronic medical records are mostly document-length texts, and existing models are either unable to handle long text sequences or exhibit poor performance on them. This paper proposes a neural network based on feature augmentation and a cascade binary tagging framework. First, we use a pre-trained model to tokenize the original text and obtain word embedding vectors. Second, the word vectors are fed into the feature augmentation network and fused with the original features and position features. Finally, the cascade binary tagging decoder generates the results. In this work, we built a Chinese document-level electronic medical record dataset named VSCMeD, which contains 595 real electronic medical records from vascular surgery patients. The experimental results show that the model achieves a precision of 87.82% and a recall of 88.47%. It was also verified on another Chinese medical dataset, CMeIE-V2, where the model achieves a precision of 54.51% and a recall of 48.63%.
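
Cascade binary tagging decoders of this kind (in the spirit of CasRel) first tag subject start/end positions, then tag object spans per relation conditioned on a subject representation. The PyTorch sketch below shows that decoding structure only; layer sizes and the fusion step are simplified assumptions, not the paper's exact design.

```python
# Minimal cascade binary tagging sketch: stage 1 tags subjects, stage 2 tags
# objects per relation given one subject's representation.
import torch
import torch.nn as nn

class CascadeTagger(nn.Module):
    def __init__(self, hidden: int = 768, num_relations: int = 10):
        super().__init__()
        self.subj_heads = nn.Linear(hidden, 2)                 # subject start/end
        self.obj_heads = nn.Linear(hidden, num_relations * 2)  # per-relation start/end

    def forward(self, token_repr, subj_repr):
        # token_repr: (batch, seq_len, hidden) encoder output
        # subj_repr:  (batch, hidden) pooled vector of one decoded subject
        subj_probs = torch.sigmoid(self.subj_heads(token_repr))
        fused = token_repr + subj_repr.unsqueeze(1)  # simple additive fusion
        obj_probs = torch.sigmoid(self.obj_heads(fused))
        b, s, _ = obj_probs.shape
        return subj_probs, obj_probs.view(b, s, -1, 2)

tagger = CascadeTagger()
tokens = torch.randn(1, 128, 768)   # e.g., pre-trained encoder output
subject = tokens[:, 5, :]           # one subject's vector
subj_p, obj_p = tagger(tokens, subject)
print(subj_p.shape, obj_p.shape)    # (1, 128, 2), (1, 128, 10, 2)
```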


Subject(s)
Electronic Health Records , Neural Networks, Computer , Humans , Information Storage and Retrieval , China
3.
Sensors (Basel) ; 23(23)2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38067736

ABSTRACT

The rapid growth of electronic health records (EHRs) has produced unprecedented volumes of biomedical data. Clinician access to the latest patient information can improve the quality of healthcare; however, clinicians have difficulty finding information quickly and easily due to the sheer volume of data to mine. Biomedical information retrieval (BIR) systems can help clinicians find the information they require by automatically searching EHRs and returning relevant results. Traditional BIR systems, though, cannot understand the complex relationships between EHR entities. Transformers are a type of neural network that is very effective for natural language processing (NLP) tasks such as machine translation and text summarization. In this paper, we propose a new BIR system for EHRs that uses transformers to predict cancer treatment from EHRs. Our system can understand the complex relationships between the different entities in an EHR, which allows it to return more relevant results to clinicians. We evaluated our system on a dataset of EHRs and found that it outperformed state-of-the-art BIR systems on various tasks, including medical question answering and information extraction, reaching an accuracy of 86.46% and an F1-score of 0.8157. These results show that transformers are a promising approach for BIR in EHRs. We believe our system can help clinicians find the information they need more quickly and easily, leading to improved patient care.
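
A minimal sketch of transformer-based retrieval over EHR text: embed notes and a query with a pretrained encoder via mean pooling, then rank notes by cosine similarity. The "bert-base-uncased" checkpoint is a generic stand-in, not the system described in the abstract.

```python
# Hedged sketch: encode EHR notes and a query, rank by cosine similarity.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-uncased"  # stand-in encoder, not the authors' model
tok = AutoTokenizer.from_pretrained(MODEL)
enc = AutoModel.from_pretrained(MODEL)

def embed(texts):
    """Mean-pool last hidden states into one vector per text."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

notes = ["pt started on cisplatin for stage III NSCLC",
         "follow-up after knee arthroscopy, no complications"]
query = ["current cancer treatment plan"]
sims = torch.nn.functional.cosine_similarity(embed(query), embed(notes))
print(sims)  # higher score = more relevant note
```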


Subject(s)
Electronic Health Records , Neoplasms , Humans , Data Mining/methods , Natural Language Processing , Neural Networks, Computer , Information Systems , Neoplasms/therapy
4.
Front Res Metr Anal ; 8: 1250930, 2023.
Article in English | MEDLINE | ID: mdl-37841902

ABSTRACT

Biomedical experts face challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may return irrelevant documents or miss relevant ones due to homonyms, synonyms, abbreviations, or term mismatch. To address this, semantic search approaches employing predefined concepts with associated synonyms and relations have been used to expand query terms and improve information retrieval. The National Library of Medicine (NLM) plays a significant role in this area, indexing citations in the MEDLINE database with topic descriptors from the Medical Subject Headings (MeSH) thesaurus, enabling advanced semantic search strategies to retrieve relevant citations despite the synonymy and polysemy of biomedical terms. Over time, advancements in semantic indexing have been made, with machine learning facilitating the transition from manual to automatic semantic indexing of the biomedical literature. The paper traces the journey of this transition, starting with manual semantic indexing and the initial efforts toward automatic indexing. The BioASQ challenge has served as a catalyst in revolutionizing the domain of semantic indexing, further pushing the boundaries of efficient knowledge retrieval in the biomedical field.

5.
Ann Biomed Eng ; 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37855948

ABSTRACT

Large language models (LLMs) such as ChatGPT have recently attracted significant attention due to their impressive performance on many real-world tasks. These models have also demonstrated potential in facilitating various biomedical tasks. However, little is known about their potential in biomedical information retrieval, especially for identifying drug-disease associations. This study explores the potential of ChatGPT, a popular LLM, in discerning drug-disease associations. We collected 2694 true drug-disease associations and 5662 false drug-disease pairs. Our approach involved creating various prompts to instruct ChatGPT to identify these associations. Under varying prompt designs, ChatGPT identified drug-disease associations with an accuracy of 74.6-83.5% for the true pairs and 96.2-97.6% for the false pairs. This study shows that ChatGPT has potential for identifying drug-disease associations and may serve as a helpful tool when searching for pharmacy-related information. However, the accuracy of its insights warrants comprehensive examination before implementation in medical practice.
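
The prompting setup can be sketched as below. `ask_llm` is a hypothetical placeholder for whatever chat-completion client is used, and the prompt wording and yes/no parsing are assumptions rather than the authors' exact prompts.

```python
# Hedged sketch: prompt an LLM per (drug, disease) pair and score accuracy.
def build_prompt(drug: str, disease: str) -> str:
    return (f"Is the drug '{drug}' indicated for treating the disease "
            f"'{disease}'? Answer with exactly one word: yes or no.")

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: plug in your chat-completion client here.
    raise NotImplementedError

def classify_pairs(pairs):
    """Return predicted labels (True = associated) for (drug, disease) pairs."""
    preds = []
    for drug, disease in pairs:
        reply = ask_llm(build_prompt(drug, disease)).strip().lower()
        preds.append(reply.startswith("yes"))
    return preds

def accuracy(preds, golds):
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```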

6.
IEEE Access ; 11: 81563-81576, 2023.
Article in English | MEDLINE | ID: mdl-37691998

ABSTRACT

The fifth-generation (5G) cellular communication technology introduces technical advances that can expand medical device access to connectivity services. However, assessing the safety and effectiveness of emerging 5G-enabled medical devices is challenging because relevant evaluation methods have not yet been established. In this paper, we propose a design model for a 5G testbed as a regulatory science tool (TRUST) for assessing 5G connectivity enablers of medical device functions. Specifically, we first identify application-specific testing needs and general testing protocols. Next, we outline the selection and customization of key system components to create a 5G testbed. A TRUST demonstration is documented through a realistic 5G testbed implementation along with the deployment of a custom-built example use case for 5G-enabled medical extended reality (MXR). Detailed configurations, example collected data, and implementation challenges are presented. The openness of the TRUST design model allows a TRUST testbed to be easily extended and customized to incorporate available resources and address the evaluation needs of different stakeholders.

7.
BMC Bioinformatics ; 23(1): 549, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36536280

ABSTRACT

Extracting knowledge from heterogeneous data sources is fundamental for the construction of structured biomedical knowledge graphs (BKGs), where entities and relations are represented as nodes and edges in the graphs, respectively. Previous biomedical knowledge extraction methods considered only limited entity types and relations by using a task-specific training set, which is insufficient for developing large-scale BKGs and for downstream task applications in different scenarios. To alleviate this issue, we propose a joint continual learning biomedical information extraction (JCBIE) network to extract entities and relations from different biomedical information datasets. By empirically studying different joint learning and continual learning strategies, the proposed JCBIE can learn and expand different types of entities and relations across datasets. JCBIE uses two separate encoders for joint feature extraction and can therefore effectively avoid the feature confusion problem that arises with a single hard-parameter-sharing encoder. Specifically, this design allows us to adopt entity-augmented inputs to establish the interaction between named entity recognition and relation extraction. Finally, a novel evaluation mechanism is proposed for measuring cross-corpus generalization errors, which traditional evaluation methods ignore. Our empirical studies show that JCBIE achieves promising performance when a continual learning strategy is adopted with multiple corpora.
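
The two-encoder design can be sketched in PyTorch as follows: one encoder feeds a per-token NER head, a second encoder reads the entity-augmented input and feeds a relation head, with no hard parameter sharing between them. The LSTM encoders, dimensions, and marker scheme are illustrative assumptions.

```python
# Hedged sketch: separate encoders for NER and relation extraction.
import torch
import torch.nn as nn

class TwoEncoderIE(nn.Module):
    def __init__(self, vocab: int = 30522, hidden: int = 256,
                 num_ent: int = 5, num_rel: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.ner_enc = nn.LSTM(hidden, hidden, batch_first=True)  # encoder 1
        self.re_enc = nn.LSTM(hidden, hidden, batch_first=True)   # encoder 2
        self.ner_head = nn.Linear(hidden, num_ent)  # per-token entity tags
        self.re_head = nn.Linear(hidden, num_rel)   # sentence-level relation

    def forward(self, ner_ids, re_ids):
        # re_ids is the same sentence with entity-marker tokens inserted
        h_ner, _ = self.ner_enc(self.embed(ner_ids))
        h_re, _ = self.re_enc(self.embed(re_ids))
        return self.ner_head(h_ner), self.re_head(h_re.mean(dim=1))

model = TwoEncoderIE()
ner_logits, rel_logits = model(torch.randint(0, 30000, (2, 20)),
                               torch.randint(0, 30000, (2, 24)))
print(ner_logits.shape, rel_logits.shape)  # (2, 20, 5), (2, 8)
```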


Subject(s)
Biomedical Research , Data Mining , Data Mining/methods , Neural Networks, Computer , Knowledge , Longitudinal Studies
9.
Comput Methods Programs Biomed ; 211: 106433, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34614452

ABSTRACT

BACKGROUND AND OBJECTIVE: Major Depressive Disorder is a highly prevalent and disabling mental health condition. Numerous studies have explored multimodal fusion systems that combine visual, audio, and textual features via deep learning architectures for clinical depression recognition, yet no comparative analysis of multimodal depression analysis has been proposed in the literature. METHODS: In this paper, an up-to-date literature overview of multimodal depression recognition is presented and an extensive comparative analysis of different deep learning architectures for depression recognition is performed. First, audio-feature-based Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are studied. Then, early-level and model-level fusion of deep audio features with visual and textual features through LSTM and CNN architectures is investigated. RESULTS: The performance of the proposed architectures for binary and severity-level depression recognition is tested using a hold-out strategy on the DAIC-WOZ dataset (80% training, 10% validation, 10% test split). Using this strategy, a set of experiments demonstrated that: (1) LSTM-based audio features perform slightly better than CNN-based ones, with an accuracy of 66.25% versus 65.60% for binary depression classes; (2) model-level fusion of deep audio and visual features using an LSTM network performed best, with an accuracy of 77.16%, a precision of 53% for the depressed class, and a precision of 83% for the non-depressed class. This network obtained a normalized Root Mean Square Error (RMSE) of 0.15 for depression severity prediction. Using a Leave-One-Subject-Out strategy, the same network achieved an accuracy of 95.38% for binary depression detection and a normalized RMSE of 0.1476 for depression severity prediction. Our best-performing architecture outperforms all state-of-the-art approaches on the DAIC-WOZ dataset. CONCLUSIONS: The results show that the proposed LSTM-based architectures surpass the CNN-based ones, as they learn representations of the temporal dynamics of multimodal features. Furthermore, model-level fusion of audio and visual features using an LSTM network leads to the best performance. Our best-performing architecture successfully detects depression from a speech segment of less than 8 seconds, with an average prediction computation time of less than 6 ms, making it suitable for real-world clinical applications.
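
A minimal PyTorch sketch of the best-performing configuration reported above, model-level fusion of audio and visual streams via LSTMs: each stream is encoded separately and the final hidden states are concatenated before classification. Feature dimensions and layer sizes are assumptions, not the paper's exact settings.

```python
# Hedged sketch: model-level fusion of audio and visual LSTM encoders.
import torch
import torch.nn as nn

class FusionLSTM(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=68, hidden=64):
        super().__init__()
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.visual_lstm = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 2)  # depressed / not depressed

    def forward(self, audio, visual):
        _, (h_a, _) = self.audio_lstm(audio)    # last hidden state per stream
        _, (h_v, _) = self.visual_lstm(visual)
        fused = torch.cat([h_a[-1], h_v[-1]], dim=-1)  # model-level fusion
        return self.classifier(fused)

model = FusionLSTM()
logits = model(torch.randn(4, 200, 40), torch.randn(4, 200, 68))
print(logits.shape)  # (4, 2)
```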


Subject(s)
Depressive Disorder, Major , Depression/diagnosis , Depressive Disorder, Major/diagnosis , Humans , Neural Networks, Computer
10.
JMIR Med Inform ; 9(6): e28272, 2021 Jun 29.
Article in English | MEDLINE | ID: mdl-34185006

ABSTRACT

BACKGROUND: With the development of biomedicine, the number of biomedical documents has increased rapidly, posing a great challenge for researchers trying to retrieve the information they need. Information retrieval aims to meet this challenge by finding relevant documents among abundant documents based on a given query. However, in specific retrieval tasks the relevance of search results sometimes needs to be evaluated from multiple aspects, increasing the difficulty of biomedical information retrieval. OBJECTIVE: This study aimed to find a more systematic method for retrieving relevant scientific literature for a given patient. METHODS: In the initial retrieval stage, we supplemented query terms through query expansion strategies and applied query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employed a text classification model and a relevance matching model to evaluate documents from different dimensions, and then combined the outputs through logistic regression to re-rank all the documents from the initial ranking list. RESULTS: The proposed ensemble method contributed to the improvement of biomedical retrieval performance. Compared with existing deep learning-based methods, experimental results showed that our method achieved state-of-the-art performance on the data collection provided by the Text REtrieval Conference 2019 Precision Medicine Track. CONCLUSIONS: In this paper, we proposed a novel ensemble method based on deep learning. As the experiments show, the strategies we used in the initial retrieval phase, such as query expansion and query boosting, are effective. The application of the text classification model and the relevance matching model better captured semantic context information and improved retrieval performance.
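
The re-ranking step can be sketched with scikit-learn: a logistic regression learns to fuse a text-classification score and a relevance-matching score into one probability used to re-sort the initial list. The features and labels below are synthetic placeholders, not the paper's data.

```python
# Hedged sketch: fuse two per-document scores via logistic regression, re-rank.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((200, 2))                     # [clf_score, match_score]
y_train = (X_train.sum(axis=1) > 1.0).astype(int)  # synthetic relevance labels

fuser = LogisticRegression().fit(X_train, y_train)

docs = ["doc_a", "doc_b", "doc_c"]                 # initial ranking list
scores = np.array([[0.9, 0.2], [0.4, 0.8], [0.1, 0.1]])
fused = fuser.predict_proba(scores)[:, 1]
reranked = [d for _, d in sorted(zip(fused, docs), reverse=True)]
print(reranked)
```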

11.
J Biomed Inform ; 117: 103732, 2021 05.
Article in English | MEDLINE | ID: mdl-33737208

ABSTRACT

BACKGROUND: Understanding the relationships between genes, drugs, and disease states is at the core of pharmacogenomics. Two leading approaches for identifying these relationships in the medical literature are human-expert-led manual curation efforts and modern data-mining-based automated approaches. The former generates small amounts of high-quality data, while the latter offers large volumes of mixed-quality data. The algorithmically extracted relationships are often accompanied by supporting evidence, such as confidence scores, source articles, and surrounding contexts (excerpts) from the articles, that can be used as data quality indicators. Tools that can leverage these quality indicators to help the user gain access to larger and higher-quality data are needed. APPROACH: We introduce GeneDive, a web application for pharmacogenomics researchers and precision medicine practitioners that makes gene, disease, and drug interaction data easily accessible and usable. GeneDive is designed to meet three key objectives: (1) provide functionality to manage the information-overload problem and facilitate easy assimilation of supporting evidence; (2) support longitudinal and exploratory research investigations; and (3) offer integration of user-provided interaction data without requiring data sharing. RESULTS: GeneDive offers multiple search modalities, visualizations, and other features that guide the user efficiently to the information of their interest. To facilitate exploratory research, GeneDive makes the supporting evidence and context for each interaction readily available and allows the data quality threshold to be controlled by the user according to their risk tolerance. The interactive search-visualization loop enables relationship discoveries between diseases, genes, and drugs that might not be explicitly described in the literature but are emergent from the source medical corpus and deductive reasoning. The ability to use the user's own data, either in combination with the GeneDive native datasets or in isolation, promotes richer data-driven exploration and discovery. These functionalities, along with GeneDive's applicability to precision medicine (bringing the knowledge contained in the biomedical literature to bear on particular clinical situations and improving patient care), are illustrated through detailed use cases. CONCLUSION: GeneDive is a comprehensive, broad-use biological interactions browser. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net. A GeneDive Docker image is also available for download at this URL, allowing users to (1) import their own interaction data securely and privately and (2) generate and test hypotheses across their own and other datasets.


Subject(s)
Pharmaceutical Preparations , Precision Medicine , Data Mining , Humans , Pharmacogenetics , Software
12.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32591802

ABSTRACT

Biomedical information extraction (BioIE) is an important task that aims to analyze biomedical texts and extract structured information such as named entities and the semantic relations between them. In recent years, pre-trained language models have largely improved the performance of BioIE. However, they neglect to incorporate external structural knowledge, which can provide rich factual information to support the understanding and reasoning underlying biomedical information extraction. In this paper, we first evaluate current extraction methods, including vanilla neural networks, general language models, and pre-trained contextualized language models, on biomedical information extraction tasks: named entity recognition, relation extraction, and event extraction. We then propose to enrich a contextualized language model by integrating large-scale biomedical knowledge graphs (namely, BioKGLM). To encode knowledge effectively, we explore a three-stage training procedure and introduce different fusion strategies to facilitate knowledge injection. Experimental results on multiple tasks show that BioKGLM consistently outperforms state-of-the-art extraction models. Further analysis shows that BioKGLM can capture the underlying relations between biomedical knowledge concepts, which are crucial for BioIE.
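
One plausible fusion strategy for knowledge injection is gating a linked KG entity embedding into the contextual vectors of its mention span, as sketched below. The gating mechanism and dimensions are illustrative assumptions, not BioKGLM's actual design.

```python
# Hedged sketch: gated injection of a KG entity embedding into a mention span.
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    def __init__(self, hidden=768, kg_dim=128):
        super().__init__()
        self.project = nn.Linear(kg_dim, hidden)   # map KG space -> LM space
        self.gate = nn.Linear(2 * hidden, hidden)  # per-dimension gate

    def forward(self, token_repr, kg_emb, span):
        # token_repr: (seq, hidden); kg_emb: (kg_dim,); span: (start, end)
        k = self.project(kg_emb)
        s, e = span
        mention = token_repr[s:e]
        g = torch.sigmoid(self.gate(torch.cat(
            [mention, k.expand_as(mention)], dim=-1)))
        out = token_repr.clone()
        out[s:e] = g * mention + (1 - g) * k  # gated knowledge injection
        return out

fusion = KnowledgeFusion()
print(fusion(torch.randn(32, 768), torch.randn(128), (4, 7)).shape)  # (32, 768)
```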


Subject(s)
Data Mining , Natural Language Processing , Neural Networks, Computer , Semantics
13.
Rev Esp Patol ; 53(4): 226-231, 2020.
Article in English | MEDLINE | ID: mdl-33012492

ABSTRACT

A proposal for an updated system of the Organization of Scientific Biomedical Knowledge is presented, integrating the historical achievements in pathology from the 15th to the 21st century. Scientific understanding of disease (Human Biopathology) is acquired at consecutive levels: 1) etiopathogenic, 2) structural, 3) physiopathological, and 4) clinical. A complete spectrum of etiological factors is presented, along with a new organization of the structural basis of disease processes (Human Structural Biopathology). Two unique polar types of cellular pathology are proposed: cellular injury and cellular change. Translating these two types of cellular pathology to the integrative structural cytotissular (CT) level gives rise to only ten basic structural processes, which can be organized into three main CT structural complexes: 1) CT maldevelopment, which includes congenital malformation (1), tumoral maldevelopment (2), and hereditary non-malformative congenital organopathy (3); 2) the complex of CT injury, or non-hereditary organopathies (4), associating CT necrosis, inflammatory reaction, and repair; and 3) the complex of CT change: atrophy (5), hypertrophy (6), hyperplasia (7), metaplasia (8), dysplasia (9), and neoplasia (10). This system provides a precise basis for the organization of Human Biopathology, which could be applied to: 1) the development of a universal medical curriculum, 2) the departmental organization of a faculty of medicine, and 3) the development of a new global system for disease control. As we enter the era of Big Data, 5G, digitalization, and artificial intelligence, a rational, scientific, and efficient organization of biomedical information will be crucial in determining the success or failure of its applications to the health system.


Subject(s)
Artificial Intelligence , Big Data , Pathology , Curriculum , Humans , Inflammation
14.
BMC Bioinformatics ; 20(Suppl 16): 590, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787087

ABSTRACT

BACKGROUND: The number of biomedical research articles has increased exponentially with the advancement of biomedicine in recent years, making it difficult for researchers to obtain the information they need. Information retrieval technologies seek to tackle this problem. However, information needs cannot be completely satisfied by directly applying existing information retrieval techniques. Therefore, biomedical information retrieval not only focuses on the relevance of search results, but also aims to promote the completeness of the results, which is referred to as diversity-oriented retrieval. RESULTS: We address the diversity-oriented biomedical retrieval task using a supervised term ranking model. The model is learned through a supervised query expansion process for term refinement. Based on the model, the most relevant and diversified terms are selected to enrich the original query. The expanded query is then fed into a second retrieval to improve the relevance and diversity of search results. To this end, we propose three diversity-oriented optimization strategies in our model: a diversified term labeling strategy, biomedical resource-based term features, and a diversity-oriented group sampling learning method. Experimental results on TREC Genomics collections demonstrate the effectiveness of the proposed model in improving the relevance and the diversity of search results. CONCLUSIONS: The three proposed strategies jointly contribute to the improvement of biomedical retrieval performance. Our model yields more relevant and diversified results than state-of-the-art baseline models. Moreover, our method provides a general framework for improving biomedical retrieval performance and can serve as the basis for future work.
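
The supervised term-ranking idea can be sketched as follows: score candidate expansion terms with a learned model over term features, keep the top-k, and append them to the query before the second retrieval. The features and training labels below are synthetic stand-ins for the paper's diversity-oriented ones.

```python
# Hedged sketch: learned ranking of candidate expansion terms.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features per candidate term, e.g. [co-occurrence, idf, resource match]
X_train = np.array([[0.8, 2.1, 1], [0.1, 0.5, 0], [0.6, 1.7, 1], [0.2, 0.9, 0]])
y_train = np.array([1, 0, 1, 0])  # 1 = term improved relevance/diversity
ranker = LogisticRegression().fit(X_train, y_train)

def expand(query, candidates, feats, k=2):
    """Append the k highest-scoring candidate terms to the query."""
    scores = ranker.predict_proba(feats)[:, 1]
    top = [t for _, t in sorted(zip(scores, candidates), reverse=True)[:k]]
    return query + " " + " ".join(top)

cands = ["oncogene", "fishing", "mutation"]
feats = np.array([[0.7, 1.9, 1], [0.05, 0.3, 0], [0.9, 2.2, 1]])
print(expand("BRCA1 function", cands, feats))
```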


Subject(s)
Algorithms , Biomedical Research , Information Storage and Retrieval , Models, Theoretical , Genomics
15.
BMC Bioinformatics ; 20(1): 429, 2019 Aug 16.
Article in English | MEDLINE | ID: mdl-31419935

ABSTRACT

BACKGROUND: Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations with disease progression and possible interventions. Clinicians largely use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile. RESULTS: VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants, and drugs, and uses machine-learning-based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST's ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document's content. CONCLUSION: Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, which relies heavily on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains. VIST is freely available at https://vist.informatik.hu-berlin.de/.


Subject(s)
Neoplasms/pathology , Precision Medicine , Search Engine , Algorithms , Databases as Topic , Documentation , Humans , Internet , User-Computer Interface
16.
J Biomed Inform ; 95: 103224, 2019 07.
Article in English | MEDLINE | ID: mdl-31200123

ABSTRACT

BACKGROUND: Information curation and literature surveillance efforts that synthesize the current knowledge about the impact of genetic variability on disease states and drug responses are vitally important for the practice of evidence-based precision medicine. For these efforts, finding a relevant and comprehensive set of articles in the ever-growing scientific literature is a challenge. METHODS: We have designed and developed Article Retrieval for Precision Medicine (ARtPM), an end-to-end article retrieval system that employs a multi-stage architecture to retrieve and rank relevant articles for a given medical case summary (genetic variants, disease, demographics, and other medical conditions). We compared ARtPM with five baselines, including PubMed Best Match, the improved search functionality recently introduced by PubMed. RESULTS: The differences in performance between ARtPM and the five baselines were statistically significant for four metrics that quantify different aspects of search effectiveness (P-values for P@10, R-prec, infNDCG, and Recall@1000 were <.001, <.001, .003, and .009, respectively). Pairwise system comparisons show that ARtPM is comparable to or better than the best-performing baseline on three metrics (R-prec: 0.324 vs 0.299, P-value=.06; infNDCG: 0.556 vs 0.465, P-value=.08; R@1000: 0.665 vs 0.572, P-value=.007), but its P@10 (0.603 vs 0.630, P-value=.64) needs to improve. CONCLUSION: The recall-focused phase of ARtPM is effective at retrieving more relevant articles. The precision-focused ranking phase performs well at deeper ranks but needs further work on early ranks (e.g., a richer feature set). Overall, ARtPM effectively facilitates evidence-based precision medicine practice and provides a robust search framework for further work in this direction.
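
A hedged sketch of the recall-then-precision pattern: a BM25 stage retrieves a broad candidate set, and a second, precision-oriented scorer re-ranks it. The trivial term-overlap scorer below is a stand-in for ARtPM's learned ranking phase, and the documents are toy examples.

```python
# Hedged sketch: two-stage retrieve-then-rerank with BM25 recall stage.
from rank_bm25 import BM25Okapi

docs = ["BRAF V600E melanoma vemurafenib response",
        "knee pain physiotherapy outcomes",
        "EGFR mutation lung cancer erlotinib trial"]
bm25 = BM25Okapi([d.lower().split() for d in docs])

query = "EGFR lung cancer treatment".lower().split()
recall_scores = bm25.get_scores(query)                     # recall-focused stage
candidates = sorted(range(len(docs)), key=lambda i: -recall_scores[i])[:2]

def rerank_score(doc: str) -> float:
    # stand-in for the precision-focused, feature-based ranking phase
    return sum(term in doc.lower() for term in query)

final = sorted(candidates, key=lambda i: -rerank_score(docs[i]))
print([docs[i] for i in final])
```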


Subject(s)
Information Storage and Retrieval/methods , Precision Medicine , Biomedical Research , Data Curation , Databases, Factual , Humans , Periodicals as Topic
17.
Comput Med Imaging Graph ; 72: 34-46, 2019 03.
Article in English | MEDLINE | ID: mdl-30772074

ABSTRACT

BACKGROUND AND OBJECTIVE: Modern microscopes can acquire multi-channel, large histological data from human or animal tissues, containing rich biomedical information for disease diagnosis and biological feature analysis. However, due to the large data size, fuzzy tissue structure, and the complicated mixture of elements in the image color space, it remains a challenge for current software systems to effectively process histological data, show inner tissue structures, and unveil hidden biomedical information. We therefore developed new algorithms and a software platform to address this issue. METHODS: This paper presents a multi-channel biomedical data computing and visualization system that can efficiently process large 3D histological images acquired from high-resolution microscopes. A novelty of our system is that it can dynamically display a volume of interest and extract tissue information using a layer-based data navigation scheme. During data exploration, the actual resolution of the loaded data is dynamically determined and updated, and data rendering is synchronized across four display windows at each data layer, where 2D textures are extracted from the imaging volume and mapped onto the displayed clipping planes in 3D space. RESULTS: To test the efficiency and scalability of this system, we performed extensive evaluations using several different hardware systems and large histological color datasets acquired from a CryoViz 3D digital system. The experimental results demonstrated that our system delivers interactive data navigation speeds and displays detailed imaging information in real time, beyond the capability of commonly available biomedical data exploration software platforms. CONCLUSION: Taking advantage of both CPU (central processing unit) main memory and GPU (graphics processing unit) graphics memory, the presented software platform can efficiently compute, process, and visualize very large biomedical datasets and enhance their information. The system satisfactorily addresses the challenges of navigating and interrogating large volumetric multi-spectral histological images at multiple resolution levels.
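
At its core, the layer-based navigation idea amounts to extracting a 2D texture from a large 3D multi-channel volume along a chosen axis and resolution level, as in the numpy sketch below. Array shapes are illustrative, and real data would be memory-mapped from disk rather than generated.

```python
# Hedged sketch: pull a 2D texture out of a 3D RGB volume for a clipping plane.
import numpy as np

volume = np.random.randint(0, 255, size=(64, 512, 512, 3), dtype=np.uint8)

def extract_slice(vol, axis: int, index: int, downsample: int = 1):
    """Return a 2D RGB texture for one clipping plane of the volume."""
    tex = np.take(vol, index, axis=axis)    # (H, W, 3) slice along any axis
    return tex[::downsample, ::downsample]  # coarser resolution level on demand

plane = extract_slice(volume, axis=0, index=32, downsample=2)
print(plane.shape)  # (256, 256, 3), ready to map onto a clipping plane
```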


Subject(s)
Computer Graphics , Data Visualization , Databases, Factual , Diagnostic Imaging , Histology , Algorithms , Humans , Imaging, Three-Dimensional
18.
Open Life Sci ; 13: 355-373, 2018 Jan.
Article in English | MEDLINE | ID: mdl-33817104

ABSTRACT

With the rapid development of information technology and biomedical engineering, ever more information is available, and researchers have begun to study how to apply advanced techniques to biomedical data. The main aim of this paper is to optimize a machine learning method using particle swarm optimization (PSO) and apply it to the classification of biomedical data. To improve the performance of the classification model, we compared different inertia weight strategies and mutation strategies, and their combinations, within PSO, and obtained the best inertia weight strategy without mutation, the best mutation strategy without inertia weight, and the best combination of the two. We then used these three PSO variants to optimize the parameters of a support vector machine for the classification of biomedical data. We found that the PSO algorithm combining the inertia weight and mutation strategies, as well as the inertia weight strategy we proposed, could improve classification accuracy. This study provides a useful reference for the prediction of clinical diseases.
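
A compact sketch of the approach: PSO with a linearly decreasing inertia weight searches over SVM hyperparameters (C, gamma), scored by cross-validated accuracy. The swarm size, bounds, update coefficients, and the specific inertia schedule are assumptions, not the paper's tuned settings.

```python
# Hedged sketch: PSO over SVM hyperparameters with decreasing inertia weight.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(p):  # p = [log10(C), log10(gamma)]
    return cross_val_score(SVC(C=10 ** p[0], gamma=10 ** p[1]), X, y, cv=3).mean()

n, dims, iters = 8, 2, 10
pos = rng.uniform(-3, 3, (n, dims))
vel = np.zeros((n, dims))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()]

for t in range(iters):
    w = 0.9 - 0.5 * t / iters  # linearly decreasing inertia weight
    r1, r2 = rng.random((2, n, dims))
    vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    fits = np.array([fitness(p) for p in pos])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
    gbest = pbest[pbest_fit.argmax()]

print("best C=%.3g gamma=%.3g acc=%.3f"
      % (10 ** gbest[0], 10 ** gbest[1], pbest_fit.max()))
```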

19.
J Biomed Inform ; 63: 379-389, 2016 10.
Article in English | MEDLINE | ID: mdl-27593166

ABSTRACT

In the era of digitalization, information retrieval (IR), which retrieves and ranks documents from large collections according to users' search queries, has been widely applied in the biomedical domain. Building patient cohorts using electronic health records (EHRs) and searching the literature for topics of interest are typical IR use cases. Meanwhile, natural language processing (NLP) techniques, such as tokenization and Part-Of-Speech (POS) tagging, have been developed for processing clinical documents and biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into bag-of-words (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are iteratively calculated using a cyclic coordinate method, in which a golden section line search is applied along each coordinate to optimize an objective function defined by mean average precision (MAP). In the empirical experiments, we used the datasets from the Medical Records track in the Text REtrieval Conference (TREC) 2011 and 2012 and the Genomics track in TREC 2004. The evaluation on the TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates for the IR evaluation metrics MAP, bpref, and P@10 are 10.88%, 4.54%, and 3.82% over the BoW models; for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78% over the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to simple heuristic and frequency-based weighting approaches, and we validate our POS category selection. Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks.
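
The inner optimization step, a golden section line search along one term-weight coordinate, can be sketched as below. The objective here is a toy unimodal stand-in for MAP, since computing real MAP would require a full retrieval run per weight value.

```python
# Hedged sketch: golden section line search along one POS-weight coordinate.
import math

def golden_section_max(f, lo, hi, tol=1e-5):
    """Maximize a unimodal f on [lo, hi] via golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def toy_map(w):
    # pretend MAP peaks when this POS category's weight is 0.63
    return -(w - 0.63) ** 2 + 0.31

print(golden_section_max(toy_map, 0.0, 2.0))  # ~0.63
```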


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Natural Language Processing , Algorithms , Humans , Linguistics
20.
BMC Bioinformatics ; 17 Suppl 7: 238, 2016 Jul 25.
Article in English | MEDLINE | ID: mdl-27455377

ABSTRACT

BACKGROUND: Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information retrieval (IR) programs scour unstructured materials, such as text documents, in large data stores. IR concerns the representation, storage, and organization of information items, as well as access to them. One of the main problems in IR is determining which documents are relevant to the user's needs and which are not. Currently, users cannot construct queries precisely enough to retrieve particular pieces of data from large data stores, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique that refines IR searches to better represent the user's information need and thereby enhance retrieval performance: we apply different query expansion techniques and combine them linearly, where each combination fuses two expansion results at a time. Query expansions expand the search query, for example by finding synonyms and reweighting original terms, and they provide significantly more focused, particularized search results than basic search queries do. RESULTS: Retrieval performance is measured by variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results enhances the retrieved documents, outperforming our baseline by 21.06% and a previous study by 7.12%. CONCLUSIONS: We propose several query expansion techniques and their linear combinations to make user queries more intelligible to search engines and to produce higher-quality search results.
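
The linear combination of two expansion runs can be sketched directly: scale and sum per-document scores from two strategies, two at a time, then re-sort. The lambda weight and the scores below are illustrative assumptions.

```python
# Hedged sketch: linearly fuse two query-expansion result lists.
def combine(run_a, run_b, lam=0.6):
    """Fuse two {doc_id: score} runs as lam*a + (1-lam)*b, return ranked ids."""
    docs = set(run_a) | set(run_b)
    fused = {d: lam * run_a.get(d, 0.0) + (1 - lam) * run_b.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

synonym_run = {"d1": 0.9, "d2": 0.4, "d3": 0.2}   # synonym-based expansion
reweight_run = {"d1": 0.3, "d2": 0.8, "d4": 0.5}  # term-reweighting expansion
print(combine(synonym_run, reweight_run))
```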


Subject(s)
Algorithms , Information Systems/standards , Semantics