Results 1 - 8 of 8
1.
J Med Internet Res ; 24(5): e32845, 2022 05 11.
Article in English | MEDLINE | ID: mdl-35544299

ABSTRACT

Organizational, administrative, and educational challenges in establishing and sustaining biomedical data science infrastructures lead to the inefficient use of Research Patient Data Repositories (RPDRs). These challenges, including but not limited to deployment, sustainability, cost optimization, collaboration, governance, security, rapid response, reliability, stability, scalability, and convenience, constrain one another and may not be alleviated by traditional hardware upgrades or protocol enhancements alone. This article attempts to borrow data science thinking and practices from the business realm, which we call the data industry viewpoint, to improve RPDRs.


Subject(s)
Databases as Topic , Humans
2.
Int J Qual Health Care ; 33(3)2021 Sep 25.
Article in English | MEDLINE | ID: mdl-34508642

ABSTRACT

Big data epidemiology facilitates pandemic response by providing data-driven insights through big data tools that differ from traditional methods. 'Garbage in, garbage out' problems, such as insufficient data, inaccessible data, missing data, uncertainty in handling data, and bias in analysis or common findings, can be addressed by combining techniques across disciplines.


Subject(s)
COVID-19 , Pandemics , Big Data , Epidemiologic Studies , Humans , SARS-CoV-2
3.
Eur Radiol ; 31(6): 3864-3873, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33372243

ABSTRACT

OBJECTIVES: Based on the current clinical routine, we aimed to develop a novel deep learning model to distinguish coronavirus disease 2019 (COVID-19) pneumonia from other types of pneumonia and validate it with a real-world dataset (RWD). METHODS: A total of 563 chest CT scans of 380 patients (227/380 were diagnosed with COVID-19 pneumonia) from 5 hospitals were collected to train our deep learning (DL) model. Lung regions were extracted by U-net, then transformed and fed to the pre-trained ResNet-50-based IDANNet (Identification and Analysis of New covid-19 Net) to produce a diagnostic probability. Fivefold cross-validation was employed to validate the application of our model. Another 318 scans of 316 patients (243/316 were diagnosed with COVID-19 pneumonia) from 2 other hospitals were enrolled prospectively as the RWD to test our DL model's performance and compare it with that of 3 experienced radiologists. RESULTS: A three-dimensional DL model was successfully established. The diagnostic threshold to differentiate COVID-19 and non-COVID-19 pneumonia was 0.685 with an AUC of 0.906 (95% CI: 0.886-0.913) in the internal validation group. In the RWD cohort, our model achieved an AUC of 0.868 (95% CI: 0.851-0.876) with a sensitivity of 0.811 and a specificity of 0.822, non-inferior to the performance of 3 experienced radiologists, suggesting promising practical clinical usage. CONCLUSIONS: The established DL model was able to accurately distinguish COVID-19 pneumonia from other suspected pneumonias in a real-world situation and could become a reliable tool in clinical routine. KEY POINTS:
• In an internal validation set, our DL model achieved the best performance to differentiate COVID-19 from non-COVID-19 pneumonia with a sensitivity of 0.836, a specificity of 0.800, and an AUC of 0.906 (95% CI: 0.886-0.913) when the threshold was set at 0.685.
• In the prospective RWD cohort, our DL diagnostic model achieved a sensitivity of 0.811, a specificity of 0.822, and an AUC of 0.868 (95% CI: 0.851-0.876), non-inferior to the performance of 3 experienced radiologists.
• The attention heatmaps were fully generated by the model without additional manual annotation, and the attention regions were highly aligned with the ROIs acquired by human radiologists for diagnosis.
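
The sensitivity, specificity, and AUC figures quoted above all derive from a list of per-scan diagnostic probabilities and reference labels. The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. The probabilities and labels are made up for illustration, the function names are hypothetical, and treating a probability equal to the threshold as positive is an assumption the abstract does not state.

```python
def confusion_counts(probs, labels, threshold):
    """Count (TP, FP, TN, FN), calling a scan positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold  # assumption: ties count as positive
        if predicted_positive and y:
            tp += 1
        elif predicted_positive and not y:
            fp += 1
        elif not predicted_positive and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def auc(probs, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney formulation).
    """
    positives = [p for p, y in zip(probs, labels) if y]
    negatives = [p for p, y in zip(probs, labels) if not y]
    if not positives or not negatives:
        return float("nan")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up data for illustration only (1 = COVID-19 pneumonia, 0 = other pneumonia).
toy_probs = [0.92, 0.71, 0.40, 0.69, 0.10, 0.55, 0.80, 0.30]
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]

toy_sens, toy_spec = sensitivity_specificity(toy_probs, toy_labels, threshold=0.685)
toy_auc = auc(toy_probs, toy_labels)
```

Moving the threshold trades sensitivity against specificity, while the AUC summarizes ranking quality across all thresholds, which is why the abstract reports both.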


Subject(s)
COVID-19 , Deep Learning , Pneumonia, Viral , Humans , Neural Networks, Computer , Pneumonia, Viral/diagnostic imaging , Prospective Studies , SARS-CoV-2 , Tomography, X-Ray Computed
4.
J Am Med Inform Assoc ; 27(7): 1139-1141, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32311047

ABSTRACT

Data change the game in terms of how we respond to pandemics. Global data on disease trajectories and the effectiveness and economic impact of different social distancing measures are essential to facilitate effective local responses to pandemics. COVID-19 data flowing across geographic borders are extremely useful to public health professionals for many purposes such as accelerating the pharmaceutical development pipeline, and for making vital decisions about intensive care unit rooms, where to build temporary hospitals, or where to boost supplies of personal protection equipment, ventilators, or diagnostic tests. Sharing data enables quicker dissemination and validation of pharmaceutical innovations, as well as improved knowledge of what prevention and mitigation measures work. Even if physical borders around the globe are closed, it is crucial that data continues to transparently flow across borders to enable a data economy to thrive, which will promote global public health through global cooperation and solidarity.


Subject(s)
Betacoronavirus , Coronavirus Infections/epidemiology , Health Information Interoperability , Information Dissemination , Pandemics/statistics & numerical data , Pneumonia, Viral/epidemiology , COVID-19 , Humans , Internationality , SARS-CoV-2
5.
BMC Med Genomics ; 12(Suppl 10): 186, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31865913

ABSTRACT

BACKGROUND: Identifying the complex biological mechanisms of various diseases is important in biomedical research. Recently, the growing generation of tremendous amounts of data in genomics, epigenomics, metagenomics, proteomics, metabolomics, nutriomics, etc., has resulted in the rise of systematic biological means of exploring complex diseases. However, the gap between the production of these data and our capability to analyze them has gradually broadened. Furthermore, we observe that networks can represent many of the above-mentioned data and that, based on the vector representations learned by network embedding methods, entities which are in close proximity but do not at present possess direct links are very likely to be related; they are therefore promising candidate subjects for biological investigation. RESULTS: We incorporate six public biological databases to construct a heterogeneous biological network containing three categories of entities (i.e., genes, diseases, miRNAs) and multiple types of edges (i.e., the known relationships). To tackle the inherent heterogeneity, we develop a heterogeneous network embedding model for mapping the network into a low-dimensional vector space in which the relationships between entities are well preserved. To assess the effectiveness of our method, we conduct gene-disease and miRNA-disease association predictions, the results of which show the superiority of our novel method over several state-of-the-art methods. Furthermore, many associations predicted by our method are verified in the latest real-world dataset. CONCLUSIONS: We propose a novel heterogeneous network embedding method which can adequately take advantage of the abundant contextual information and structures of heterogeneous networks. Moreover, we illustrate the performance of the proposed method in directing studies in biology, which can assist in identifying new hypotheses in biological investigation.
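
The idea that entities lying close together in the embedding space, but not yet linked, are candidate associations can be sketched as a ranking step. This is only an illustration of that downstream step under stated assumptions: it does not implement the heterogeneous embedding model itself, cosine similarity is one possible proximity measure (the abstract does not name one), and the two-dimensional vectors, entity names, and function names are all hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidate_links(embeddings, sources, targets, known_edges):
    """Score every (source, target) pair that is not already a known edge.

    Returns (score, source, target) tuples, highest similarity first.
    """
    candidates = []
    for s, t in product(sources, targets):
        if (s, t) in known_edges:
            continue  # already in the network, so not a new prediction
        candidates.append((cosine(embeddings[s], embeddings[t]), s, t))
    return sorted(candidates, reverse=True)


# Hypothetical embeddings; a real model would learn these from the network.
toy_embeddings = {
    "gene_A": [1.0, 0.0],
    "gene_B": [0.0, 1.0],
    "disease_X": [0.9, 0.1],
    "disease_Y": [0.1, 0.9],
}
known = {("gene_A", "disease_X")}

ranked = rank_candidate_links(
    toy_embeddings,
    sources=["gene_A", "gene_B"],
    targets=["disease_X", "disease_Y"],
    known_edges=known,
)
top_score, top_gene, top_disease = ranked[0]
```

In this toy setup the unlinked pair whose vectors point in nearly the same direction is ranked first, which is the kind of pair the abstract describes as a promising candidate for investigation.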


Subject(s)
Computational Biology/methods , Disease , Humans
6.
Bioinformatics ; 24(20): 2416-7, 2008 Oct 15.
Article in English | MEDLINE | ID: mdl-18713790

ABSTRACT

Investigation of transcription factors (TFs) and their downstream regulated genes (targets) is a significant issue in the post-genome era and can provide a brand new view of some vital biological processes. However, information on TFs and their targets in mammals is far from sufficient. Here, we developed an integrated TF platform (ITFP), which includes abundant mammalian TFs and their targets. In the current release, ITFP includes 4105 putative TFs and 69 496 potential TF-target pairs for human, 3134 putative TFs and 37 040 potential TF-target pairs for mouse, and 1114 putative TFs and 18 055 potential TF-target pairs for rat. In short, ITFP will serve as an important resource for the transcription research community and provide strong support for regulatory network studies.


Subject(s)
Computational Biology/methods , Transcription Factors/metabolism , Animals , Databases, Protein , Gene Regulatory Networks , Genome , Humans , Mammals/genetics , Mice , Rabbits , Transcription Factors/chemistry , Transcription Factors/genetics , User-Computer Interface
7.
BMC Bioinformatics ; 9: 282, 2008 Jun 16.
Article in English | MEDLINE | ID: mdl-18554421

ABSTRACT

BACKGROUND: Transcription factors (TFs) are core functional proteins which play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. In order to save time and reduce costs, many computational methods have been developed to identify TFs from new proteins and to classify the resulting TFs. Though these methods have facilitated screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs from new proteins and classifying the resulting TFs are in high demand. RESULTS: The support vector machine (SVM) algorithm was utilized to construct an automatic detector for TF identification, where protein domains and functional sites were employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL). CONCLUSION: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implies that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
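
The ECOC idea can be illustrated independently of the SVMs: each class is assigned a binary codeword, one binary classifier is trained per codeword bit, and a new sample is assigned to the class whose codeword is nearest in Hamming distance to the predicted bits. The sketch below covers only the encoding and decoding steps. The four class names and the 7-bit code matrix are illustrative assumptions, not the ones used in the paper, and the "predicted bits" stand in for the outputs of the per-bit binary classifiers (SVMs in the abstract).

```python
# Illustrative 7-bit code matrix for four hypothetical classes.
# Any two codewords differ in 4 positions, so one wrong bit can be corrected.
CODE_MATRIX = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def binary_targets(labels, bit, code_matrix=CODE_MATRIX):
    """Relabel multi-class labels as 0/1 targets for the classifier of one bit."""
    return [code_matrix[label][bit] for label in labels]


def ecoc_decode(predicted_bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(predicted_bits, code_matrix[label]))


# Suppose the seven binary classifiers output class_2's codeword
# except that the first classifier is wrong.
noisy_bits = (1, 0, 0, 0, 1, 1, 1)
decoded = ecoc_decode(noisy_bits)
```

Because a single misfiring binary classifier still leaves the predicted bits closer to the correct codeword than to any other, the combined decision can be more robust than any individual classifier, which is consistent with the accuracy gain the abstract attributes to combining ECOC with SVM.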


Subject(s)
Algorithms , Artificial Intelligence , Open Reading Frames/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Transcription Factors/chemistry , Transcription Factors/genetics , Base Sequence , Molecular Sequence Data
8.
Acta Biochim Biophys Sin (Shanghai) ; 36(5): 365-70, 2004 May.
Article in English | MEDLINE | ID: mdl-15156279

ABSTRACT

Semantic search is a key issue in the integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table, which correlates Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table, which records similarity scores derived from any pair of GO terms. Based on the two tables, multiple ways of performing semantic search are provided, and the corresponding entries in heterogeneous biological databases can be expediently searched in semantic terms.
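
The two-table design described above can be sketched with in-memory structures: one table maps database entries to GO terms, the other stores a similarity score per pair of GO terms, and a search joins them. This is a hedged illustration, not the BioDW schema. The database names, entry identifiers, GO term placeholders, scores, and function names are all hypothetical, and the rule that a term has similarity 1.0 with itself is an assumption.

```python
# Stand-in for the DB2GO table: (source database, entry id, GO term).
DB2GO = [
    ("db_one", "entry_1", "GO:TERM_A"),
    ("db_two", "entry_2", "GO:TERM_B"),
    ("db_two", "entry_3", "GO:TERM_C"),
]

# Stand-in for the semantic similarity table: one score per unordered term pair.
SIMILARITY = {
    frozenset(("GO:TERM_A", "GO:TERM_B")): 0.8,
    frozenset(("GO:TERM_A", "GO:TERM_C")): 0.2,
    frozenset(("GO:TERM_B", "GO:TERM_C")): 0.3,
}


def similarity(term_1, term_2):
    """Look up the stored score; identical terms score 1.0, unknown pairs 0.0."""
    if term_1 == term_2:
        return 1.0
    return SIMILARITY.get(frozenset((term_1, term_2)), 0.0)


def semantic_search(query_term, min_score):
    """Return entries annotated with terms at least min_score similar to the query.

    Results are (score, database, entry, term) tuples, best match first.
    """
    hits = []
    for database, entry, term in DB2GO:
        score = similarity(query_term, term)
        if score >= min_score:
            hits.append((score, database, entry, term))
    return sorted(hits, reverse=True)


results = semantic_search("GO:TERM_A", min_score=0.5)
```

Precomputing the pairwise scores turns each search into table lookups, so entries from different source databases can be retrieved together when their annotations are semantically close rather than textually identical.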


Subject(s)
Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Natural Language Processing , Semantics , Vocabulary, Controlled , Databases, Factual , Documentation , Phylogeny , Systems Integration , Terminology as Topic