Results 1 - 20 of 11,075
1.
Syst Rev ; 13(1): 135, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755704

ABSTRACT

We aimed to compare the concordance of information extracted, and the time taken, between a large language model (OpenAI's GPT-3.5 Turbo via API) and conventional human extraction when retrieving information from scientific articles on diabetic retinopathy (DR). The extraction was performed with GPT-3.5 Turbo as of October 2023. GPT-3.5 Turbo significantly reduced the time taken for extraction. Concordance was highest at 100% for extraction of the country of study, followed by 64.7% for significant risk factors of DR, 47.1% for exclusion and inclusion criteria, and 41.2% for odds ratios (ORs) and 95% confidence intervals (CIs). The concordance levels appeared to reflect the complexity of each prompt. This suggests that GPT-3.5 Turbo may be adopted to extract simple information that is easily located in the text, leaving more complex information to be extracted by the researcher. It is important to note that the underlying foundation models are improving rapidly, with new versions released frequently. Subsequent work can focus on retrieval-augmented generation (RAG), embeddings, chunking PDFs into useful sections, and prompt design to improve extraction accuracy.
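To make the workflow concrete, here is a minimal sketch of what an extraction call to the OpenAI API might look like. The prompt wording, field list, and JSON schema are illustrative assumptions; the study does not publish its actual prompts.

```python
# A minimal sketch of LLM-based data extraction in the spirit of the study.
# Prompt text and output fields are assumptions, not the authors' prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACTION_PROMPT = (
    "From the following article text, extract: (1) country of study, "
    "(2) significant risk factors for diabetic retinopathy, "
    "(3) inclusion and exclusion criteria, (4) odds ratios with 95% CIs. "
    "Answer as JSON with keys: country, risk_factors, criteria, odds_ratios."
)

def extract(article_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic output makes concordance checks easier
        messages=[
            {"role": "system", "content": "You extract data from scientific articles."},
            {"role": "user", "content": EXTRACTION_PROMPT + "\n\n" + article_text},
        ],
    )
    return response.choices[0].message.content
```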


Subject(s)
Diabetic Retinopathy , Humans , Information Storage and Retrieval/methods , Natural Language Processing , Data Mining/methods
2.
Med Ref Serv Q ; 43(2): 130-151, 2024.
Article in English | MEDLINE | ID: mdl-38722608

ABSTRACT

While LibGuides are widely used in libraries to curate resources for users, they come with a number of common problems, including maintenance, design and layout, and curating relevant and concise content. One health sciences library sought to improve its LibGuides, consulting usage statistics, user feedback, and recommendations from the literature to inform decision-making. Our team recommended a number of changes to make LibGuides more usable, including creating robust maintenance and content guidelines, scheduling regular updates, and making various changes to the format of the guides themselves to render them more user-friendly.


Subject(s)
Libraries, Medical , Organizational Case Studies , Libraries, Medical/organization & administration , Humans , Information Storage and Retrieval/methods
3.
Med Ref Serv Q ; 43(2): 196-202, 2024.
Article in English | MEDLINE | ID: mdl-38722609

ABSTRACT

Named entity recognition (NER) encompasses computational techniques, developed since the early 1990s, for extracting structured information from raw text. With rapid advances in AI and computing, NER models have gained significant attention and now serve as foundational tools across numerous professional domains for organizing unstructured data for research and practical applications. This is particularly evident in medicine and healthcare, where NER models are essential for efficiently extracting critical information from complex documents that are challenging to review manually. Despite these successes, NER still has limitations in fully comprehending the nuances of natural language. However, the development of more advanced and user-friendly models promises to significantly improve the work experience of professional users.
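As a concrete illustration of the technique, below is a minimal NER example using spaCy's general-purpose English model. It is not tied to any system discussed in the article; a clinical deployment would use a domain-specific model (e.g., scispaCy) instead.

```python
# Small NER illustration on clinical-style text with spaCy's general model.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The patient was prescribed 20 mg of atorvastatin "
          "at Mayo Clinic on 12 March 2024.")

for ent in doc.ents:
    print(ent.text, ent.label_)  # prints detected entities, e.g. ORG and DATE spans
```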


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Information Storage and Retrieval/methods , Humans , Artificial Intelligence
5.
Med Ref Serv Q ; 43(2): 182-190, 2024.
Article in English | MEDLINE | ID: mdl-38722607

ABSTRACT

Created by the NIH in 2015, the Common Data Elements (CDE) Repository provides free online access for searching and using Common Data Elements. This tool helps ensure consistent data collection, saves time and resources, and ultimately improves the accuracy of and interoperability among datasets. The purpose of this column is to provide an overview of the database, discuss why it is important for researchers and relevant for health sciences librarians, and review the basic layout of the website, including sample searches that demonstrate how it can be used.


Subject(s)
Common Data Elements , United States , Humans , Databases, Factual , Information Storage and Retrieval/methods , National Institutes of Health (U.S.)
6.
J Med Internet Res ; 26: e52655, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38814687

ABSTRACT

BACKGROUND: Since the beginning of the COVID-19 pandemic, >1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their related abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support to the task and is not suited to expressing complex queries or compensating for missing specifications. OBJECTIVE: This study aims to consider small graphs of concepts and exploit them for expressing graph searches over existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. METHODS: We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths. RESULTS: We built a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships; the GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries; it produces a bibliography of publications, which are globally ranked; and each publication is further associated with the specific parts of the query that it explains, thereby allowing the user to understand each aspect of the matching. CONCLUSIONS: Our approach supports the process of query formulation and evidence search over a large text corpus; it can be reapplied to any scientific domain where document corpora and curated ontologies are made available.
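The construction of such a co-occurrence network can be sketched in a few lines: connect two concepts when their co-occurrence across abstracts is more frequent than chance, as measured by pointwise mutual information (PMI). The threshold and data layout below are assumptions for illustration, not the authors' exact procedure.

```python
# Sketch: build a concept co-occurrence network, keeping an edge only when
# the pair's pointwise mutual information exceeds a threshold (assumed value).
import math
from collections import Counter
from itertools import combinations

def build_network(annotated_abstracts, pmi_threshold=1.0):
    """annotated_abstracts: list of sets of concept IDs, one set per abstract."""
    n_docs = len(annotated_abstracts)
    concept_counts = Counter()
    pair_counts = Counter()
    for concepts in annotated_abstracts:
        concept_counts.update(concepts)
        pair_counts.update(combinations(sorted(concepts), 2))

    edges = {}
    for (a, b), n_ab in pair_counts.items():
        pmi = math.log((n_ab / n_docs) /
                       ((concept_counts[a] / n_docs) * (concept_counts[b] / n_docs)))
        if pmi >= pmi_threshold:  # keep only informative co-occurrences
            edges[(a, b)] = pmi
    return edges
```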


Subject(s)
Algorithms , COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Humans , Pandemics , Information Storage and Retrieval/methods , Biomedical Research/methods , Unified Medical Language System , Search Engine
7.
BMC Med Inform Decis Mak ; 24(1): 109, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664792

ABSTRACT

BACKGROUND: A blockchain can be described as a distributed ledger database where, under a consensus mechanism, data are permanently stored in records, called blocks, linked together with cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data, which are permanently stored in thousands of nodes and never altered. This provides a potential real-world application for generating a permanent, decentralized record of scientific data, taking advantage of blockchain features such as timestamping and immutability. IMPLEMENTATION: Here, we propose INNBC DApp, a Web3 decentralized application providing a simple front-end user interface connected with a smart contract for recording scientific data on a modern, proof-of-stake (PoS) blockchain such as BNB Smart Chain. Unlike previously proposed blockchain tools that store only a hash of the data on-chain, here the data are stored fully on-chain within the transaction itself as "transaction input data", providing a truly decentralized storage solution. In addition to plain text, the DApp can record various types of files, such as documents, images, audio, and video, by using Base64 encoding. In this study, we describe how to use the DApp and perform real-world transactions storing different kinds of data from previously published research articles, describing the advantages and limitations of such a technology, analyzing the cost in terms of transaction fees, and discussing possible use cases. RESULTS: We were able to store several different types of data on the BNB Smart Chain: raw text, documents, images, audio, and video. Notably, we stored several complete research articles at a reasonable cost. We found a limit of 95 KB for each single file upload; since Base64 encoding increases file size by approximately 33%, this corresponds to a theoretical on-chain limit of about 126 KB. We successfully overcame this limitation by splitting larger files into smaller chunks and uploading them as multi-volume archives. Additionally, we propose AES encryption to protect sensitive data. Accordingly, we show that it is possible to include enough data to make storing and sharing scientific documents and images on the blockchain useful, at a reasonable cost for users. CONCLUSION: INNBC DApp represents a real use case for blockchain technology in decentralizing biomedical data storage and sharing, providing features such as immutability, timestamping, and identity that can be used to ensure permanent availability of the data, provide proof of existence, and protect authorship. It is a freely available decentralized science (DeSci) tool that aims to help bring about mass adoption of blockchain technology within the scientific community.
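The chunking-and-encoding step described in the results can be sketched as follows. The 95 KB figure is the limit reported above; the function name and chunk handling are illustrative assumptions, not the DApp's actual code.

```python
# Sketch: split a file into chunks within the reported ~95 KB per-upload limit
# and Base64-encode each chunk for inclusion as transaction input data.
import base64

MAX_FILE = 95 * 1024  # reported per-upload raw limit; ~126 KB once Base64-encoded

def prepare_chunks(path: str) -> list[str]:
    """Return Base64-encoded chunks ready to be sent as separate transactions."""
    with open(path, "rb") as f:
        raw = f.read()
    return [
        base64.b64encode(raw[i:i + MAX_FILE]).decode("ascii")
        for i in range(0, len(raw), MAX_FILE)
    ]
```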


Subject(s)
Blockchain , Humans , Information Storage and Retrieval/methods , Computer Security/standards
8.
J Biomed Semantics ; 15(1): 3, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654304

ABSTRACT

BACKGROUND: Systematic reviews of Randomized Controlled Trials (RCTs) are an important part of the evidence-based medicine paradigm. However, the creation of such systematic reviews by clinical experts is costly and time-consuming, and results can quickly become outdated after publication. Most RCTs are structured according to the Patient, Intervention, Comparison, Outcomes (PICO) framework, and many approaches exist that aim to extract PICO elements automatically. The automatic extraction of PICO information from RCTs has the potential to significantly speed up the creation of systematic reviews and thereby benefit the field of evidence-based medicine. RESULTS: Previous work has addressed the extraction of PICO elements as the task of identifying relevant text spans or sentences, but without populating a structured representation of a trial. In contrast, in this work, we treat PICO elements as structured templates with slots, to do justice to the complex nature of the information they represent. We present two different approaches to extract this structured information from the abstracts of RCTs. The first is an extractive approach based on our previous work, extended to capture full document representations and with a clustering step to infer the number of instances of each template type. The second is a generative approach based on a seq2seq model that encodes the abstract describing the RCT and uses a decoder to infer a structured representation of a trial, including its arms, treatments, endpoints, and outcomes. Both approaches are evaluated with different base models on an existing dataset comprising 211 manually annotated clinical trial abstracts on type 2 diabetes and glaucoma. For both diseases, the extractive approach (with flan-t5-base) reached the best F1 score, i.e., 0.547 (±0.006) for type 2 diabetes and 0.636 (±0.006) for glaucoma. Generally, the F1 scores were higher for glaucoma than for type 2 diabetes, and the standard deviation was higher for the generative approach. CONCLUSION: In our experiments, both approaches show promising performance in extracting structured PICO information from RCTs, especially considering that most related work focuses on the far easier task of predicting less structured objects. In our experimental results, the extractive approach performs best in both cases, although its lead is greater for glaucoma than for type 2 diabetes. For future work, it remains to be investigated how base model size affects the comparative performance of the two approaches. Although the extractive approach currently leaves more room for direct improvements, the generative approach might benefit from larger models.
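For orientation, the generative (seq2seq) approach might look roughly like the sketch below: a flan-t5 model encodes the abstract and decodes a linearized template. The instruction wording and the sample abstract are illustrative assumptions, not the paper's training setup.

```python
# Hedged sketch of seq2seq PICO extraction with flan-t5 (not the paper's code).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

abstract = ("Patients with type 2 diabetes were randomized to metformin "
            "or placebo; the primary endpoint was HbA1c at 24 weeks.")
prompt = ("Extract the trial arms, treatments, endpoints and outcomes "
          "as structured slots: " + abstract)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```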


Subject(s)
Abstracting and Indexing , Randomized Controlled Trials as Topic , Humans , Natural Language Processing , Information Storage and Retrieval/methods
9.
PLoS One ; 19(4): e0301277, 2024.
Article in English | MEDLINE | ID: mdl-38662720

ABSTRACT

Outsourcing data to remote cloud providers is becoming increasingly popular among organizations and individuals. Searchable Symmetric Encryption (SSE) allows an encrypted database hosted on a semi-trusted server to be searched while keeping the leaked search information within acceptable levels. A dynamic SSE (DSSE) scheme additionally enables adding and removing documents through update queries, where some information is leaked to the server each time a record is added or removed. The complexity of the structures and cryptographic primitives in most existing DSSE schemes makes them inefficient in terms of storage, and query generation imposes overhead costs on the Smart Device Client (SDC) side. Achieving constant storage cost for SDCs enhances the viability, efficiency, and user experience of smart devices, promoting their widespread adoption in various applications while upholding robust privacy and security standards. DSSE schemes must address two important privacy requirements: forward and backward privacy. In existing schemes, client-side storage cost grows linearly with the number of keywords. This article introduces an innovative, secure, and lightweight DSSE scheme that ensures Type-II backward and forward privacy without incurring ongoing storage costs or high-cost query generation for the SDC. The proposed scheme, based on an inverted index structure, merges the hash table with linked nodes, linking encrypted keywords across all hash tables. Achieving a one-time O(1) storage cost without keyword counters on the SDC side, the scheme enhances security by generating a fresh key for each update. Experimental results show low-cost query generation on the SDC side (6,460 nanoseconds), making the scheme compatible with resource-limited devices, and server-side search costs are significantly lower than in existing schemes.
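To give a feel for the inverted-index-with-linked-nodes idea, here is a deliberately simplified toy: each keyword has a pseudorandom head slot pointing at the newest node, and each node stores an encrypted document ID plus a pointer to the previous node. The toy keeps only a single master key on the client, but it leaks update patterns and omits the fresh-key-per-update and backward-privacy machinery of the actual scheme; it illustrates the data structure only, not the paper's construction.

```python
# Toy inverted index with linked nodes (NOT the paper's scheme).
import hashlib
import hmac
import secrets

MASTER = secrets.token_bytes(32)   # the only client-side secret
table = {}                         # stands in for the server-side hash table

def prf(msg: bytes) -> bytes:
    return hmac.new(MASTER, msg, hashlib.sha256).digest()

def add(keyword: str, doc_id: str):
    """Insert doc_id (<= 32 bytes) at the head of the keyword's chain."""
    head_slot = prf(b"head|" + keyword.encode())
    node_addr = secrets.token_bytes(32)
    mask = prf(b"mask|" + keyword.encode() + node_addr)
    enc_id = bytes(a ^ b for a, b in zip(doc_id.encode().ljust(32, b"\0"), mask))
    table[node_addr] = (enc_id, table.get(head_slot, b""))  # link to previous head
    table[head_slot] = node_addr

def search(keyword: str) -> list[str]:
    """Walk the chain from the head slot, unmasking each document ID."""
    addr = table.get(prf(b"head|" + keyword.encode()), b"")
    out = []
    while addr:
        enc_id, prev = table[addr]
        mask = prf(b"mask|" + keyword.encode() + addr)
        out.append(bytes(a ^ b for a, b in zip(enc_id, mask)).rstrip(b"\0").decode())
        addr = prev
    return out

add("nephropathy", "doc-17")
add("nephropathy", "doc-42")
print(search("nephropathy"))   # ['doc-42', 'doc-17'], newest first
```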


Subject(s)
Computer Security , Humans , Cloud Computing , Information Storage and Retrieval/methods , Algorithms , Privacy
10.
Stud Health Technol Inform ; 313: 74-80, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38682508

ABSTRACT

While adherence to clinical guidelines improves the quality and consistency of care, personalized healthcare also requires a deep understanding of individual disease models and treatment plans. The structured preparation of routine medical data for a given clinical context, e.g., a treatment pathway outlined in a medical guideline, is currently a challenging task. Medical data are often stored in diverse formats and systems, and the relevant clinical knowledge defining the context is not available in machine-readable form. We present an approach that extracts information from free-text medical documentation by using structured clinical knowledge to guide the extraction into a structured and encoded format, overcoming known challenges for natural language processing algorithms. Preliminary results are encouraging: one of our methods extracted 100% of the targeted data points, with 85% accuracy at the detail level. These advancements show the potential of our approach to effectively use unstructured clinical data to raise the quality of patient care and reduce the workload of medical personnel.
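A toy version of knowledge-guided extraction might look as follows: a structured specification of an expected data point drives both the search pattern and the encoded output. The field names, pattern, and target code are invented for illustration; the authors' actual method is not detailed in this abstract.

```python
# Sketch: a structured data-point spec (standing in for guideline knowledge)
# guides extraction from free text into an encoded result. All names invented.
import re

DATA_POINT = {
    "name": "ejection_fraction",
    "pattern": r"(?:LVEF|ejection fraction)\D{0,20}(\d{1,2})\s*%",
    "code": "EF-01",   # hypothetical target code for the encoded result
    "unit": "%",
}

def extract(note: str, spec=DATA_POINT):
    m = re.search(spec["pattern"], note, flags=re.IGNORECASE)
    if m:
        return {"code": spec["code"], "value": int(m.group(1)), "unit": spec["unit"]}
    return None

print(extract("Echo today: LVEF estimated at 55 %."))
# {'code': 'EF-01', 'value': 55, 'unit': '%'}
```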


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Data Mining/methods , Information Storage and Retrieval/methods , Algorithms
11.
Stud Health Technol Inform ; 313: 198-202, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38682530

ABSTRACT

Secondary use of clinical health data requires prior integration of mostly heterogeneous and multidimensional data sets. A clinical data warehouse addresses the technological and organizational framework conditions required for this by making any data available for analysis. However, users of a data warehouse often lack a comprehensive overview of all available data and know only about their own data in their own systems, a situation also referred to as a 'data siloed state'. This problem can be addressed, and ultimately solved, by implementing a data catalog. Its core function is a search engine that allows users to search the metadata collected from different data sources and thereby access all the data there is. With this in mind, we conducted an exploratory online market survey followed by a vendor comparison as a prerequisite for selecting a data catalog system. Vendor performance was assessed against seven predetermined and weighted selection criteria. Although three vendors achieved the highest scores, the results were close together. Detailed investigations and test installations are needed to further narrow down the selection.
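Weighted-criteria scoring of this kind reduces to a simple weighted sum; the sketch below shows the mechanics with placeholder criteria, weights, and scores, which are not those used in the study.

```python
# Sketch of weighted vendor scoring; criteria, weights, and scores are placeholders.
weights = {"search": 0.25, "metadata_harvest": 0.20, "integration": 0.15,
           "usability": 0.15, "governance": 0.10, "cost": 0.10, "support": 0.05}

vendors = {
    "Vendor A": {"search": 4, "metadata_harvest": 5, "integration": 3,
                 "usability": 4, "governance": 3, "cost": 2, "support": 4},
    "Vendor B": {"search": 5, "metadata_harvest": 3, "integration": 4,
                 "usability": 3, "governance": 4, "cost": 3, "support": 3},
}

for name, scores in vendors.items():
    total = sum(weights[c] * scores[c] for c in weights)  # weighted sum per vendor
    print(f"{name}: {total:.2f}")
```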


Subject(s)
Data Warehousing , Electronic Health Records , Search Engine , Humans , Information Storage and Retrieval/methods , Metadata
12.
Stud Health Technol Inform ; 313: 215-220, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38682533

ABSTRACT

BACKGROUND: Tele-ophthalmology is gaining recognition for its role in improving eye care accessibility via cloud-based solutions. The Google Cloud Platform (GCP) Healthcare API enables secure and efficient management of medical image data such as high-resolution ophthalmic images. OBJECTIVES: This study investigates the effectiveness of cloud-based solutions in tele-ophthalmology, with a focus on GCP's role in data management, annotation, and integration for a novel imaging device. METHODS: Leveraging the Integrating the Healthcare Enterprise (IHE) Eye Care profile, the cloud platform was used as a PACS and integrated with the Open Health Imaging Foundation (OHIF) Viewer to provide display and annotation capabilities for ophthalmic images. RESULTS: The setup of a GCP DICOM store together with the OHIF Viewer facilitated remote image data analytics. Prolonged loading times and relatively large individual image files pointed to system challenges. CONCLUSION: Cloud platforms have the potential to ease distributed data analytics, as needed for efficient tele-ophthalmology scenarios in research and clinical practice, by providing scalable and secure image management solutions.
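For orientation, queries against a GCP Healthcare API DICOM store go through the standard DICOMweb (QIDO-RS) interface; a minimal sketch follows. The project, location, dataset, and store names are placeholders, and the snippet assumes application-default credentials are configured.

```python
# Sketch: QIDO-RS study search against a GCP Healthcare API DICOM store.
# Resource names in BASE are placeholders, not the study's deployment.
import google.auth
import google.auth.transport.requests
import requests

BASE = ("https://healthcare.googleapis.com/v1/projects/my-project/"
        "locations/us-central1/datasets/eye-care/dicomStores/oct-images/dicomWeb")

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

resp = requests.get(
    f"{BASE}/studies",
    headers={"Authorization": f"Bearer {credentials.token}",
             "Accept": "application/dicom+json"},
    params={"PatientID": "example-patient"},  # standard QIDO-RS search parameter
)
resp.raise_for_status()
print(resp.json())
```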


Subject(s)
Cloud Computing , Ophthalmology , Telemedicine , Humans , Radiology Information Systems , Information Storage and Retrieval/methods
13.
Biotechniques ; 76(5): 203-215, 2024 May.
Article in English | MEDLINE | ID: mdl-38573592

ABSTRACT

In the absence of a DNA template, the ab initio production of long double-stranded DNA molecules of predefined sequence is particularly challenging. The DNA synthesis step remains a bottleneck for many applications, such as functional assessment of ancestral genes, analysis of alternative splicing, or DNA-based data storage. In this report, we propose a fully in vitro protocol to generate very long double-stranded DNA molecules from commercially available short DNA blocks in less than 3 days using Golden Gate assembly. This innovative application allowed us to streamline the production of a 24 kb-long DNA molecule storing part of the Declaration of the Rights of Man and of the Citizen of 1789. The DNA molecule produced can be readily cloned into a suitable host/vector system for amplification and selection.
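The information-encoding side of such a project can be illustrated with the simplest possible scheme, two bits per nucleotide. Real DNA-storage designs add error correction and avoid problematic motifs such as homopolymers, which this toy ignores; the snippet and sample phrase are illustrative, not the authors' encoding.

```python
# Toy text-to-DNA codec: 2 bits per base, no error correction or constraints.
BASES = "ACGT"

def text_to_dna(text: str) -> str:
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))

def dna_to_text(seq: str) -> str:
    bits = "".join(f"{BASES.index(b):02b}" for b in seq)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

s = text_to_dna("Les hommes naissent libres")  # sample phrase, for illustration
assert dna_to_text(s) == "Les hommes naissent libres"
```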


Subject(s)
DNA , DNA/genetics , DNA/chemistry , Information Storage and Retrieval/methods , Humans , Base Sequence/genetics , Cloning, Molecular/methods
14.
Med Image Anal ; 95: 103163, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38626665

ABSTRACT

Analysis of large-scale digital whole slide image (WSI) datasets has gained significant attention in computer-aided cancer diagnosis. Content-based histopathological image retrieval (CBHIR) is a technique that searches a large database for data samples matching input objects in both details and semantics, offering relevant diagnostic information to pathologists. However, current methods are limited by the difficulty of handling gigapixel images, the variable size of WSIs, and the dependence on manual annotations. In this work, we propose a novel histopathology language-image representation learning framework for fine-grained digital pathology cross-modal retrieval, which utilizes paired diagnosis reports to learn fine-grained semantics from the WSI. An anchor-based WSI encoder is built to extract hierarchical region features, and a prompt-based text encoder is introduced to learn fine-grained semantics from the diagnosis reports. The proposed framework is trained with a multivariate cross-modal loss function to learn semantic information from the diagnosis report at both the instance level and the region level. After training, it can perform four types of retrieval tasks over the multi-modal database to support diagnostic requirements. We conducted experiments on an in-house dataset and a public dataset to evaluate the proposed method. Extensive experiments have demonstrated the effectiveness of the proposed method and its advantages over existing histopathology retrieval methods. The code is available at https://github.com/hudingyi/FGCR.
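At the heart of most cross-modal retrieval training is a contrastive objective that aligns paired image and text embeddings; a generic InfoNCE-style formulation is sketched below. This is the standard formulation, not necessarily the paper's exact multivariate loss (see the linked repository for that).

```python
# Generic symmetric contrastive loss for paired image/text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature  # pairwise cosine similarities
    targets = torch.arange(len(logits))            # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```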


Subject(s)
Semantics , Humans , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Machine Learning , Databases, Factual , Algorithms , Diagnosis, Computer-Assisted/methods
15.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38648049

ABSTRACT

MOTIVATION: As data storage challenges grow and existing technologies approach their limits, synthetic DNA emerges as a promising storage solution due to its remarkable density and durability advantages. While cost remains a concern, emerging sequencing and synthesis technologies aim to mitigate it, yet they introduce challenges such as errors in the storage and retrieval process. One crucial task in a DNA storage system is clustering numerous DNA reads into groups that represent the original input strands. RESULTS: In this paper, we review different methods for evaluating clustering algorithms and introduce a novel clustering algorithm for DNA storage systems, named Gradual Hash-based Clustering (GradHC). The primary strength of GradHC lies in its ability to cluster, with excellent accuracy, diverse types of designs, including varying strand lengths, cluster sizes (including extremely small clusters), and different error ranges. Benchmark analysis demonstrates that GradHC is significantly more stable and robust than other clustering algorithms previously proposed for DNA storage, while also producing highly reliable clustering results. AVAILABILITY AND IMPLEMENTATION: https://github.com/bensdvir/GradHC.
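The hash-based idea can be illustrated in miniature: bucket reads by a k-mer signature so that noisy copies of the same strand tend to collide. This shows the general principle only; GradHC's gradual, multi-round algorithm is in the linked repository.

```python
# Miniature hash-based read clustering (illustrative only, not GradHC itself).
from collections import defaultdict

def signature(read: str, k: int = 12) -> str:
    """Minimizer-style signature: the lexicographically smallest k-mer.
    Assumes len(read) >= k."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))

def hash_cluster(reads: list[str], k: int = 12) -> dict[str, list[str]]:
    buckets = defaultdict(list)
    for read in reads:
        buckets[signature(read, k)].append(read)  # similar reads tend to collide
    return dict(buckets)
```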


Subject(s)
Algorithms , DNA , Sequence Analysis, DNA , DNA/chemistry , Cluster Analysis , Sequence Analysis, DNA/methods , Software , Information Storage and Retrieval/methods
16.
Pol Arch Intern Med ; 134(5)2024 May 28.
Article in English | MEDLINE | ID: mdl-38501989

ABSTRACT

INTRODUCTION: Electronic health records (EHRs) contain data valuable for clinical research. However, they are in textual format and require manual encoding into databases, which is a lengthy and costly process. Natural language processing (NLP) is a computational technique that allows for text analysis. OBJECTIVES: Our study aimed to demonstrate a practical use case of NLP for characterizing a large retrospective study cohort, with comparison against human retrieval. PATIENTS AND METHODS: Anonymized discharge documentation of 10,314 patients from a cardiology tertiary care department was analyzed for inclusion in the CRAFT registry (Multicenter Experience in Atrial Fibrillation Patients Treated with Oral Anticoagulants; NCT02987062). Extensive clinical characteristics regarding concomitant diseases, medications, daily drug dosages, and echocardiography were collected manually and through NLP. RESULTS: There were 3030 and 3029 patients identified by the human and NLP-based approaches, respectively, reflecting 99.93% accuracy of NLP in detecting AF. Comprehensive baseline patient characterization by NLP was faster than human analysis (3 h 15 min vs 71 h 12 min). The CHA2DS2-VASc and HAS-BLED scores calculated from the two methods did not differ (human vs NLP; median [interquartile range], 3 [2-5] vs 3 [2-5]; P = 0.74 and 1 [1-2] vs 1 [1-2]; P = 0.63, respectively). For most data, an almost perfect agreement between NLP- and human-retrieved characteristics was found; daily dosage identification was the least accurate NLP feature. Similar conclusions on cohort characteristics would be drawn from either method; however, daily dosage detection for some drug groups would require additional human validation in the NLP-based cohort. CONCLUSIONS: NLP utilization in EHRs may accelerate data acquisition and provide accurate information for retrospective studies.
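Once the individual risk factors are extracted, computing the CHA2DS2-VASc score itself is simple arithmetic; a sketch follows. The field names are illustrative, while the scoring weights are the standard published ones.

```python
# CHA2DS2-VASc from extracted characteristics (standard weights; field names assumed).
def cha2ds2_vasc(age: int, female: bool, chf: bool, hypertension: bool,
                 diabetes: bool, stroke_tia: bool, vascular: bool) -> int:
    score = 2 if age >= 75 else (1 if 65 <= age < 75 else 0)  # A2 / A
    score += 1 if female else 0        # Sc: sex category
    score += 1 if chf else 0           # C: congestive heart failure
    score += 1 if hypertension else 0  # H
    score += 1 if diabetes else 0      # D
    score += 2 if stroke_tia else 0    # S2: prior stroke/TIA
    score += 1 if vascular else 0      # V: vascular disease
    return score

print(cha2ds2_vasc(age=70, female=True, chf=False, hypertension=True,
                   diabetes=False, stroke_tia=False, vascular=False))  # -> 3
```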


Subject(s)
Atrial Fibrillation , Electronic Health Records , Natural Language Processing , Humans , Female , Male , Aged , Retrospective Studies , Middle Aged , Atrial Fibrillation/drug therapy , Information Storage and Retrieval/methods , Anticoagulants/therapeutic use
17.
J Am Med Inform Assoc ; 31(6): 1356-1366, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38447590

ABSTRACT

OBJECTIVE: This study evaluates an AI assistant developed using OpenAI's GPT-4 for interpreting pharmacogenomic (PGx) testing results, aiming to improve decision-making and knowledge sharing in clinical genetics and to enhance patient care with equitable access. MATERIALS AND METHODS: The AI assistant employs retrieval-augmented generation (RAG), which combines retrieval and generative techniques, by harnessing a knowledge base (KB) that comprises data from the Clinical Pharmacogenetics Implementation Consortium (CPIC). It uses context-aware GPT-4 to generate tailored responses to user queries from this KB, further refined through prompt engineering and guardrails. RESULTS: Evaluated against a specialized PGx question catalog, the AI assistant showed high efficacy in addressing user queries. Compared with OpenAI's ChatGPT 3.5, it demonstrated better performance, especially in provider-specific queries requiring specialized data and citations. Key areas for improvement include enhancing accuracy, relevancy, and representative language in responses. DISCUSSION: The integration of context-aware GPT-4 with RAG significantly enhanced the AI assistant's utility. RAG's ability to incorporate domain-specific CPIC data, including recent literature, proved beneficial. Challenges persist, such as the need for specialized genetic/PGx models to improve accuracy and relevancy and addressing ethical, regulatory, and safety concerns. CONCLUSION: This study underscores generative AI's potential for transforming healthcare provider support and patient accessibility to complex pharmacogenomic information. While careful implementation of large language models like GPT-4 is necessary, it is clear that they can substantially improve understanding of pharmacogenomic data. With further development, these tools could augment healthcare expertise, provider productivity, and the delivery of equitable, patient-centered healthcare services.
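As a rough illustration of the RAG pattern described above, the sketch below retrieves the most similar knowledge-base passages by embedding similarity and then asks the chat model to answer from that context only. The embedding model, prompts, and ranking are assumptions; the actual assistant's CPIC content, prompt engineering, and guardrails are not public in this abstract.

```python
# Minimal RAG sketch: embed, rank by similarity, answer from retrieved context.
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in out.data]

def answer(question: str, kb_passages: list[str], top_k: int = 3) -> str:
    q_vec = embed([question])[0]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))  # embeddings are unit-length
    ranked = sorted(zip(embed(kb_passages), kb_passages),
                    key=lambda t: dot(q_vec, t[0]), reverse=True)
    context = "\n---\n".join(p for _, p in ranked[:top_k])
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided CPIC context and cite it."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```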


Subject(s)
Pharmacogenetics , Precision Medicine , Humans , Artificial Intelligence , Knowledge Bases , Information Storage and Retrieval/methods , Pharmacogenomic Testing
18.
J Am Med Inform Assoc ; 31(6): 1380-1387, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38531680

ABSTRACT

OBJECTIVES: This study focuses on refining temporal relation extraction from medical documents by introducing an innovative bimodal architecture. The overarching goal is to enhance our understanding of narrative processes in the medical domain, particularly through the analysis of extensive reports and notes concerning patient experiences. MATERIALS AND METHODS: Our approach involves the development of a bimodal architecture that seamlessly integrates information from both text documents and knowledge graphs. This integration serves to infuse common knowledge about events into the temporal relation extraction process. Rigorous testing was conducted on diverse clinical datasets, emulating real-world scenarios where the extraction of temporal relationships is paramount. RESULTS: The performance of our proposed bimodal architecture was thoroughly evaluated across multiple clinical datasets. Comparative analyses demonstrated its superiority over existing methods that rely solely on textual information for temporal relation extraction. Notably, the model remained effective even in scenarios where it was not provided with additional information. DISCUSSION: The amalgamation of textual data and knowledge graph information in our bimodal architecture represents a notable advancement in the field of temporal relation extraction. This approach addresses the critical need for a more profound understanding of narrative processes in medical contexts. CONCLUSION: In conclusion, our study introduces a pioneering bimodal architecture that harnesses the synergy of text and knowledge graph data, exhibiting superior performance in temporal relation extraction from medical documents. This advancement holds significant promise for improving the comprehension of patients' healthcare journeys and enhancing the overall effectiveness of extracting temporal relationships from complex medical narratives.


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Data Mining/methods , Narration , Machine Learning , Datasets as Topic , Information Storage and Retrieval/methods
19.
Cell Rep ; 43(4): 113699, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38517891

ABSTRACT

Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.


Subject(s)
DNA , DNA/metabolism , Information Storage and Retrieval/methods , Humans
20.
J Clin Epidemiol ; 169: 111300, 2024 May.
Article in English | MEDLINE | ID: mdl-38402998

ABSTRACT

OBJECTIVES: To determine whether clinical trial register (CTR) searches can accurately identify a greater number of completed randomized clinical trials (RCTs) than electronic bibliographic database (EBD) searches for systematic reviews of interventions, and to quantify the number of eligible ongoing trials. STUDY DESIGN AND SETTING: We performed an evaluation study and based our search for RCTs on the eligibility criteria of a systematic review that focused on the underrepresentation of people with chronic kidney disease in cardiovascular RCTs. We conducted a combined search of ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform through the Cochrane Central Register of Controlled Trials to identify eligible RCTs registered up to June 1, 2023. We searched Cochrane Central Register of Controlled Trials, EMBASE, and MEDLINE for publications of eligible RCTs published up to June 5, 2023. Finally, we compared the search results to determine the extent to which the two sources identified the same RCTs. RESULTS: We included 92 completed RCTs. Of these, 81 had results available. Sixty-six completed RCTs with available results were identified by both sources (81% agreement [95% CI: 71-88]). We identified seven completed RCTs with results exclusively by CTR search (9% [95% CI: 4-17]) and eight exclusively by EBD search (10% [95% CI: 5-18]). Eleven RCTs were completed but lacked results (four identified by both sources (36% [95% CI: 15-65]), one exclusively by EBD search (9% [95% CI: 1-38]), and six exclusively by CTR search (55% [95% CI: 28-79])). Also, we identified 42 eligible ongoing RCTs: 16 by both sources (38% [95% CI: 25-53]) and 26 exclusively by CTR search (62% [95% CI: 47-75]). Lastly, we identified four RCTs of unknown status by both sources. CONCLUSION: CTR searches identify a greater number of completed RCTs than EBD searches. Both searches missed some included RCTs. Based on our case study, researchers (eg, information specialists, systematic reviewers) aiming to identify all available RCTs should continue to search both sources. Once the barriers to performing CTR searches alone are targeted, CTR searches may be a suitable alternative.


Subject(s)
Databases, Bibliographic , Randomized Controlled Trials as Topic , Registries , Systematic Reviews as Topic , Randomized Controlled Trials as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/standards , Randomized Controlled Trials as Topic/methods , Humans , Systematic Reviews as Topic/methods , Databases, Bibliographic/statistics & numerical data , Registries/statistics & numerical data , Information Storage and Retrieval/methods , Information Storage and Retrieval/statistics & numerical data