Results 1 - 20 of 206
1.
Sci Rep ; 12(1): 13878, 2022 08 16.
Article in English | MEDLINE | ID: mdl-35974033

ABSTRACT

Compound mixtures represent an alternative, additional approach to DNA and synthetic sequence-defined macromolecules in the field of non-conventional molecular data storage, which may be useful depending on the target application. Here, we report a fast and efficient method for information storage in molecular mixtures that directly uses commercially available chemicals, so no synthetic steps are required. As a proof of principle, a binary coding language is used to encode words in ASCII or the black and white pixels of a bitmap. In this way, we stored a 25 × 25-pixel QR code (625 bits) and a picture of the same size. The written information is decoded by spectroscopic (1H NMR) or chromatographic (gas chromatography) analysis. In addition, for faster and automated read-out of the data, we developed decoding software, which also orders the data sets according to an internal "ordering" standard. Molecular keys and anticounterfeiting are possible application areas for information-containing compound mixtures.
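The binary scheme can be illustrated with a minimal sketch, assuming one compound per bit position of an 8-bit ASCII code, where a compound present in the mixture encodes 1 and an absent compound encodes 0. The library names are hypothetical placeholders and the 1H NMR or GC read-out is reduced to a set of detected compound names; the authors' actual compound assignment and decoding software are not reproduced here.

```python
# Hypothetical library: one commercially available compound per bit position.
LIBRARY = [f"compound_{i}" for i in range(8)]


def encode_char(ch: str) -> set[str]:
    """Return the compounds to mix so that their presence spells ch in 8-bit ASCII."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    bits = format(code, "08b")  # most significant bit first
    return {LIBRARY[i] for i, bit in enumerate(bits) if bit == "1"}


def decode_mixture(detected: set[str]) -> str:
    """Rebuild one character from the compounds detected in a mixture (e.g. by NMR or GC)."""
    bits = "".join("1" if name in detected else "0" for name in LIBRARY)
    return chr(int(bits, 2))


def encode_word(word: str) -> list[set[str]]:
    # One mixture per character; list order is a stand-in for the paper's ordering standard.
    return [encode_char(ch) for ch in word]


def decode_word(mixtures: list[set[str]]) -> str:
    return "".join(decode_mixture(m) for m in mixtures)


if __name__ == "__main__":
    mixtures = encode_word("Hi")
    print(sorted(mixtures[0]))  # compounds encoding "H" (0b01001000)
    print(decode_word(mixtures))  # Hi
```

Keeping the mapping as a plain list makes the bit order explicit, which is the property any real decoder would also have to fix in advance.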


Subject(s)
Information Storage and Retrieval , Software , DNA/genetics , Datasets as Topic/statistics & numerical data , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Magnetic Resonance Spectroscopy
2.
Genome Med ; 14(1): 18, 2022 02 21.
Article in English | MEDLINE | ID: mdl-35184750

ABSTRACT

BACKGROUND: Measuring host gene expression is a promising diagnostic strategy to discriminate bacterial and viral infections. Multiple signatures of varying size, complexity, and target populations have been described. However, there is little information on how the performance of various published signatures compares. METHODS: This systematic comparison of host gene expression signatures evaluated the performance of 28 signatures, validating them in 4589 subjects from 51 publicly available datasets. Thirteen COVID-specific datasets with 1416 subjects were included in a separate analysis. Individual signature performance was evaluated using the area under the receiver operating characteristic curve (AUC) value. Overall signature performance was evaluated using median AUCs and accuracies. RESULTS: Signature performance varied widely, with median AUCs ranging from 0.55 to 0.96 for bacterial classification and 0.69-0.97 for viral classification. Signature size varied (1-398 genes), with smaller signatures generally performing more poorly (P < 0.04). Viral infection was easier to diagnose than bacterial infection (84% vs. 79% overall accuracy, respectively; P < .001). Host gene expression classifiers performed more poorly in some pediatric populations (3 months-1 year and 2-11 years) compared to the adult population for both bacterial infection (73% and 70% vs. 82%, respectively; P < .001) and viral infection (80% and 79% vs. 88%, respectively; P < .001). We did not observe classification differences based on illness severity as defined by ICU admission for bacterial or viral infections. The median AUC across all signatures for COVID-19 classification was 0.80 compared to 0.83 for viral classification in the same datasets. CONCLUSIONS: In this systematic comparison of 28 host gene expression signatures, we observed differences based on a signature's size and characteristics of the validation population, including age and infection type. However, populations used for signature discovery did not impact performance, underscoring the redundancy among many of these signatures. Furthermore, differential performance in specific populations may only be observable through this type of large-scale validation.
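The per-signature AUC and the median AUC across signatures can be sketched with the rank (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The signature names, labels and scores below are made-up illustrative inputs, not data from the study.

```python
from statistics import median


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive (label 1) and negative (label 0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def median_auc(signature_scores, labels):
    """Median AUC across signatures, each given as a list of per-subject scores."""
    return median(roc_auc(s, labels) for s in signature_scores.values())


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # hypothetical: 1 = bacterial, 0 = non-bacterial
    sigs = {"sig_a": [0.1, 0.4, 0.35, 0.8], "sig_b": [0.2, 0.3, 0.6, 0.9]}
    print(roc_auc(sigs["sig_a"], labels))  # 0.75
    print(median_auc(sigs, labels))  # 0.875
```

The pairwise loop is quadratic in the number of subjects; it is chosen here for readability, and a sort-based rank computation gives the same value in O(n log n).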


Subject(s)
Bacterial Infections/diagnosis , Datasets as Topic/statistics & numerical data , Host-Pathogen Interactions/genetics , Transcriptome , Virus Diseases/diagnosis , Adult , Bacterial Infections/epidemiology , Bacterial Infections/genetics , Biomarkers/analysis , COVID-19/diagnosis , COVID-19/genetics , Child , Cohort Studies , Diagnosis, Differential , Gene Expression Profiling/statistics & numerical data , Genetic Association Studies/statistics & numerical data , Humans , Publications/statistics & numerical data , SARS-CoV-2/pathogenicity , Validation Studies as Topic , Virus Diseases/epidemiology , Virus Diseases/genetics
3.
PLoS One ; 17(2): e0262992, 2022.
Article in English | MEDLINE | ID: mdl-35139109

ABSTRACT

This paper presents a study on the dynamics of sentiment polarisation in the active online discussion communities formed around a controversial topic: immigration. Using a collection of Swedish-language tweets from 2012 to 2019, we track the development of the communities and their sentiment polarisation trajectories over time and in the context of an exogenous shock represented by the European refugee crisis in 2015. To achieve the goal of the study, we apply methods of network and sentiment analysis to map users' interactions in the network communities and quantify users' sentiment polarities. The results of the analysis provide little evidence of user polarisation in the network and its communities and suggest that the crisis had a limited effect on polarisation dynamics on this social media platform. Yet, we notice a shift towards a more negative tone in users' sentiments after the crisis and discuss possible explanations for these observations.


Subject(s)
Aggression/psychology , Attitude , Refugees/psychology , Social Media/statistics & numerical data , Attitude/ethnology , Datasets as Topic/statistics & numerical data , Emigration and Immigration/statistics & numerical data , Group Processes , History, 20th Century , History, 21st Century , Humans , Public Opinion , Sentiment Analysis , Social Identification , Social Network Analysis , Sweden/epidemiology
4.
PLoS One ; 17(1): e0262463, 2022.
Article in English | MEDLINE | ID: mdl-35015791

ABSTRACT

We propose a simple anomaly detection method that is applicable to unlabeled time series data and is sufficiently tractable, even for non-technical entities, by using density ratio estimation based on the state space model. Our detection rule is based on the ratio of log-likelihoods estimated by the dynamic linear model, i.e., the ratio of the log-likelihood in our model to that in an over-dispersed model that we call the NULL model. Using the Yahoo S5 data set and the Numenta Anomaly Benchmark data set, two publicly available and commonly used benchmark data sets, we find that our method achieves performance better than or comparable to that of existing methods. This result implies that incorporating information specific to the time series data into the model is essential in time series anomaly detection. In addition, we apply the proposed method to unlabeled Web time series data, specifically daily page views and average session duration on an electronic commerce site that deals in insurance products, to show the applicability of our method to unlabeled real-world data. We find that the increase in page views caused by e-mail newsletter deliveries is less likely to contribute to completing an insurance contract. The result also suggests the importance of monitoring more than one time series simultaneously.
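A log-likelihood-ratio rule of this kind can be sketched under several assumptions: the dynamic linear model is taken to be the simplest one (a local level model filtered with the Kalman filter), and the NULL model is assumed to share its one-step forecast mean while inflating the forecast variance by a fixed factor. The score is the difference of the two log-densities, positive when the DLM explains a point better and negative when the over-dispersed NULL model does. The variances, the inflation factor of 100 and the threshold of 0 are illustrative choices, not the paper's settings.

```python
import math


def normal_logpdf(y, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def llr_anomalies(y, obs_var=1.0, state_noise_var=0.1, inflation=100.0, threshold=0.0):
    """Flag points where an over-dispersed NULL model beats a local level DLM.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    level, filtered_var = y[0], obs_var  # initialise the filtered level at the first observation
    flagged, scores = [], []
    for t in range(1, len(y)):
        prior_var = filtered_var + state_noise_var  # predict step for a random-walk level
        forecast_var = prior_var + obs_var  # one-step-ahead predictive variance
        # Log density ratio: DLM forecast vs. NULL forecast with inflated variance.
        score = (normal_logpdf(y[t], level, forecast_var)
                 - normal_logpdf(y[t], level, inflation * forecast_var))
        scores.append(score)
        if score < threshold:
            flagged.append(t)
        gain = prior_var / forecast_var  # Kalman gain
        level += gain * (y[t] - level)  # update step
        filtered_var = prior_var * (1 - gain)
    return flagged, scores


if __name__ == "__main__":
    series = [10.0] * 20
    series[10] = 50.0  # injected spike
    flagged, _ = llr_anomalies(series)
    print(flagged)
```

Points right after a spike can also be flagged, because the filtered level is pulled towards the outlier; this is a property of the toy filter, not a claim about the published method.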


Subject(s)
Algorithms , Datasets as Topic/statistics & numerical data , Internet , Neural Networks, Computer , Statistics as Topic/methods , Humans , Search Engine , Time Factors
5.
BMJ Open ; 11(12): e054832, 2021 12 17.
Article in English | MEDLINE | ID: mdl-34921086

ABSTRACT

OBJECTIVE: Chronic cough (CC) is a debilitating respiratory symptom, now increasingly recognised as a discrete disease entity. This study evaluated the burden of CC in a primary care setting. DESIGN: Cross-sectional, retrospective cohort study. SETTING: Discover dataset from North West London, which links coded data from primary and secondary care. The index date marked CC persisting for ≥8 weeks and was taken as a surrogate for the date of CC diagnosis. PARTICIPANTS: Data were extracted for individuals aged ≥18 years with a cough persisting ≥8 weeks or a cough remedy prescription, between Jan 2015 and Sep 2019. MAIN OUTCOME MEASURES: Demographic characteristics, comorbidities and service utilisation costs, including investigations performed and treatments prescribed, were determined. RESULTS: CC was identified in 43 453 patients from a total cohort of 2 109 430 (2%). Median (IQR) age was 64 years (41-87). Among the cohort, 31% had no recorded comorbidities, 26% had been given a diagnosis of asthma, 17% chronic obstructive pulmonary disease, 12% rhinitis and 15% reflux. Prevalence of CC was greater in women (57%) and highest in the 65-74 year age range. The number of all investigations performed increased between the 12 months before and the 12 months after the index date of CC diagnosis, in particular primary care chest X-ray and spirometry, which increased from 6535 to 12 880 and from 5791 to 8720, respectively. This was accompanied by an increase in CC-associated healthcare utilisation costs. CONCLUSION: One-third of individuals had CC in the absence of associated comorbidities, highlighting the importance of recognising CC as a condition in its own right. Overall outpatient costs increased in the year after the CC index date for all comorbidities, but varied significantly with age. Linked primary-care datasets may enable earlier detection of individuals with CC for specialist clinic referral and targeted treatment.


Subject(s)
Cough , Primary Health Care , Adult , Aged , Aged, 80 and over , Chronic Disease , Comorbidity , Cost of Illness , Cough/diagnosis , Cough/epidemiology , Cross-Sectional Studies , Datasets as Topic/statistics & numerical data , Female , Humans , London/epidemiology , Male , Middle Aged , Primary Health Care/statistics & numerical data , Retrospective Studies , United Kingdom/epidemiology
6.
Genes (Basel) ; 12(11)2021 10 21.
Article in English | MEDLINE | ID: mdl-34828267

ABSTRACT

The Alzheimer's Disease Neuroimaging Initiative (ADNI) contains extensive patient measurements (e.g., magnetic resonance imaging [MRI], biometrics, RNA expression, etc.) from Alzheimer's disease (AD) cases and controls that have recently been used by machine learning algorithms to evaluate AD onset and progression. While using a variety of biomarkers is essential to AD research, highly correlated input features can significantly decrease machine learning model generalizability and performance. Additionally, redundant features unnecessarily increase the computational time and resources needed to train predictive models. Therefore, we used 49,288 biomarkers and 793,600 extracted MRI features to assess feature correlation within the ADNI dataset and determine the extent to which this issue might affect large-scale analyses using these data. We found that 93.457% of biomarkers, 92.549% of the gene expression values, and 100% of MRI features were strongly correlated with at least one other feature in ADNI based on our Bonferroni-corrected α (p-value ≤ 1.40754 × 10⁻¹³). We provide a comprehensive mapping of all ADNI biomarkers to highly correlated features within the dataset. Additionally, we show that significant correlation within the ADNI dataset should be resolved before performing bulk data analyses, and we provide recommendations to address these issues. We anticipate that these recommendations and resources will help guide researchers utilizing the ADNI dataset to increase model performance and reduce the cost and complexity of their analyses.


Subject(s)
Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Genetic Association Studies , Neuroimaging , Transcriptome , Alzheimer Disease/epidemiology , Alzheimer Disease/therapy , Biomarkers/analysis , Datasets as Topic/statistics & numerical data , Genetic Association Studies/statistics & numerical data , Humans , Machine Learning , Magnetic Resonance Imaging/methods , Neuroimaging/methods , Neuroimaging/statistics & numerical data
7.
BMC Med Imaging ; 21(1): 174, 2021 11 22.
Article in English | MEDLINE | ID: mdl-34809589

ABSTRACT

BACKGROUND: With the rapid spread of COVID-19 worldwide, quick screening for possible COVID-19 patients has become a focus of international research. Recently, many deep learning-based fast screening models using computed tomography (CT) or X-ray images of potential COVID-19 patients have been proposed. However, the existing models still have two main problems. First, most existing supervised models rely on pre-trained model parameters. The pre-trained model must be built on a dataset with features similar to those of COVID-19 X-ray images, which limits the construction and use of the model. Second, the number of samples per category in X-ray datasets of COVID-19 and other pneumonia patients is usually imbalanced. In addition, image quality is difficult to distinguish, leading to non-ideal results with the existing models in the multi-class COVID-19 recognition task. Moreover, no researchers have proposed a COVID-19 X-ray image learning model based on unsupervised meta-learning. METHODS: This paper is the first to construct an unsupervised meta-learning model for fast screening of COVID-19 patients (UMLF-COVID). This model does not require a pre-trained model, which removes that constraint on model construction, and the proposed unsupervised meta-learning framework solves the problems of sample imbalance and sample quality. RESULTS: The UMLF-COVID model is tested on two real datasets, on each of which a three-category and a four-category model are built. The experimental results show that the accuracy of the UMLF-COVID model is 3-10% higher than that of the existing models. CONCLUSION: In summary, we believe that the UMLF-COVID model is a good complement to existing COVID-19 X-ray fast screening models.


Subject(s)
COVID-19/diagnostic imaging , Deep Learning , Tomography, X-Ray Computed/methods , Algorithms , Datasets as Topic/statistics & numerical data , Humans , Image Processing, Computer-Assisted , SARS-CoV-2
8.
Adv Sci (Weinh) ; 8(24): e2102092, 2021 12.
Article in English | MEDLINE | ID: mdl-34723439

ABSTRACT

Combination therapy has long been used in cancer treatment to overcome the drug resistance associated with monotherapy. Increased pharmacological data and the rapid development of deep learning methods have enabled the construction of models to predict and screen drug pairs. However, the drug libraries screened have been restricted to hundreds to thousands of compounds. The ScaffComb framework, proposed here, aims to bridge this gap in the virtual screening of drug combinations in large-scale databases. Inspired by phenotype-based drug design, ScaffComb integrates phenotypic information into molecular scaffolds, which can be used to screen the drug library and identify potent drug combinations. First, ScaffComb is validated using the US Food and Drug Administration (FDA) dataset, successfully reidentifying known drug combinations. Then, ScaffComb is applied to screen the ZINC and ChEMBL databases, yielding novel drug combinations and revealing an ability to discover new synergistic mechanisms. To our knowledge, ScaffComb is the first method to use phenotype-based virtual screening of drug combinations in large-scale chemical datasets.


Subject(s)
Antineoplastic Agents/therapeutic use , Datasets as Topic/statistics & numerical data , Drug Evaluation, Preclinical/methods , Neoplasms/drug therapy , Cell Line, Tumor , Drug Combinations , Drug Design , Humans , Phenotype
9.
J Clin Epidemiol ; 136: 136-145, 2021 08.
Article in English | MEDLINE | ID: mdl-33932483

ABSTRACT

BACKGROUND: Probabilistic linkage can link patients from different clinical databases without the need for personal information. If accurate linkage can be achieved, it would accelerate the use of linked datasets to address important clinical and public health questions. OBJECTIVE: We developed a step-by-step process for probabilistic linkage of national clinical and administrative datasets without personal information, and validated it against deterministic linkage using patient identifiers. STUDY DESIGN AND SETTING: We used electronic health records from the National Bowel Cancer Audit and Hospital Episode Statistics databases for 10,566 bowel cancer patients undergoing emergency surgery in the English National Health Service. RESULTS: Probabilistic linkage linked 81.4% of National Bowel Cancer Audit records to Hospital Episode Statistics, vs. 82.8% using deterministic linkage. No systematic differences were seen between patients who were and were not linked, and regression models for mortality and length of hospital stay according to patient and tumour characteristics were not sensitive to the linkage approach. CONCLUSION: Probabilistic linkage was successful in linking national clinical and administrative datasets for patients undergoing a major surgical procedure. It allows analysts outside highly secure data environments to undertake linkage while minimizing costs and delays, protecting data security, and maintaining linkage quality.
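The abstract does not detail the authors' step-by-step process. As a hedged illustration of probabilistic linkage in general, the classic Fellegi-Sunter approach scores a record pair by adding log2(m/u) for each agreeing field and log2((1-m)/(1-u)) for each disagreeing field, where m and u are the probabilities that a field agrees among true matches and among non-matches. The field names, m/u values and decision thresholds below are hypothetical.

```python
import math

# Hypothetical non-identifying comparison fields with (m, u) probabilities:
# m = P(field agrees | same patient), u = P(field agrees | different patients).
FIELD_PROBS = {
    "surgery_date": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "birth_year": (0.97, 0.02),
    "hospital_code": (0.96, 0.05),
}


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights over the comparison fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing values contribute no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=10.0, lower=0.0):
    # Hypothetical thresholds; in practice they are tuned to acceptable error rates.
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


if __name__ == "__main__":
    audit = {"surgery_date": "2015-03-02", "sex": "F", "birth_year": 1950, "hospital_code": "H1"}
    hes = dict(audit, surgery_date="2015-03-03")  # one field disagrees
    w = match_weight(audit, hes)
    print(round(w, 2), classify(w))
```

Rare agreements (such as an exact surgery date) carry far more weight than common ones (such as sex), which is what lets this approach work without names or identifiers.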


Subject(s)
Data Management/methods , Data Management/statistics & numerical data , Datasets as Topic/standards , Electronic Health Records/statistics & numerical data , Electronic Health Records/standards , Intestinal Neoplasms/epidemiology , Medical Record Linkage/methods , Datasets as Topic/statistics & numerical data , Humans , Intestinal Neoplasms/mortality , Intestinal Neoplasms/surgery , Models, Statistical , Reproducibility of Results , State Medicine , United Kingdom
10.
Methods Mol Biol ; 2284: 147-179, 2021.
Article in English | MEDLINE | ID: mdl-33835442

ABSTRACT

The main purpose of pathway or gene set analysis methods is to provide mechanistic insight into the large amount of data produced in high-throughput studies. These tools were developed for gene expression analyses, but they have been rapidly adopted for other high-throughput techniques, becoming one of the foremost tools of omics research. Currently, depending on the biological question and the data, we can choose among a vast plethora of methods and databases. Here we use two published examples of RNAseq datasets to approach multiple analyses of gene sets, networks and pathways using freely available and frequently updated software. Finally, we conclude this chapter by presenting a survival pathway analysis of a multiomics dataset. Throughout this overview of different methods, we focus on visualization, which is a fundamental but challenging step in this computational field.
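Over-representation analysis, one of the simplest gene set methods, can be sketched as a one-sided hypergeometric test: given a universe of N genes, a pathway of K genes and n selected (for example differentially expressed) genes, it asks how likely an overlap of at least k would be by chance. This illustrates the method family in general, not the specific tools used in the chapter, and the gene counts in the usage line are invented.

```python
from math import comb


def ora_p_value(universe_size, pathway_size, selected_size, overlap):
    """One-sided hypergeometric P(X >= overlap) for over-representation of a pathway."""
    numerator = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, min(pathway_size, selected_size) + 1)
    )
    # Integer arithmetic is exact up to this single division.
    return numerator / comb(universe_size, selected_size)


if __name__ == "__main__":
    # Hypothetical: 20-gene universe, 5-gene pathway, 4 selected genes, all 4 in the pathway.
    print(ora_p_value(20, 5, 4, 4))  # 5 / 4845
```

When many pathways are tested, the resulting p-values would still need a multiple-testing correction.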


Subject(s)
Computational Biology/methods , Datasets as Topic/statistics & numerical data , RNA-Seq/statistics & numerical data , Animals , Computational Biology/statistics & numerical data , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways/genetics , RNA-Seq/methods , Software , Systems Integration , Transcriptome , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data
11.
Methods Mol Biol ; 2284: 181-192, 2021.
Article in English | MEDLINE | ID: mdl-33835443

ABSTRACT

Analysis of circular RNA (circRNA) expression from RNA-Seq data can be performed with different algorithms and analysis pipelines, which extract heterogeneous information on the expression of this novel class of RNAs. Computational pipelines were developed to facilitate the analysis of circRNA expression by combining different public tools into easy-to-use workflows. This chapter describes the complete workflow for a computationally reproducible analysis of circRNA expression starting from a public RNA-Seq experiment. The main steps of circRNA prediction, annotation, classification, sequence reconstruction, quantification, and differential expression are illustrated.


Subject(s)
Computational Biology/methods , RNA, Circular/analysis , RNA-Seq/methods , Algorithms , Datasets as Topic/statistics & numerical data , Humans , RNA, Circular/chemistry , RNA, Circular/genetics , RNA, Untranslated/analysis , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA-Seq/statistics & numerical data , Sequence Analysis, RNA , Software , Transcriptome
12.
Methods Mol Biol ; 2284: 331-342, 2021.
Article in English | MEDLINE | ID: mdl-33835451

ABSTRACT

Dimensionality reduction is a crucial step in essentially every single-cell RNA-sequencing (scRNA-seq) analysis. In this chapter, we describe the typical dimensionality reduction workflow used for scRNA-seq datasets, specifically highlighting the roles of principal component analysis, t-distributed stochastic neighbor embedding, and uniform manifold approximation and projection in this setting. We particularly emphasize efficient computation; the software implementations used in this chapter can scale to datasets with millions of cells.
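The PCA step of this workflow can be sketched in plain Python under stated assumptions: counts are scaled to 10,000 per cell and log1p-transformed, genes are centered, and only the leading principal component is extracted by power iteration. This is a toy illustration, not the chapter's software; scalable tools use sparse, truncated or randomized SVD, and t-SNE or UMAP would then be run on several leading components.

```python
import math
import random


def log_normalize(counts, scale=10_000):
    """Scale each cell (row) to `scale` total counts, then apply log1p."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


def center_columns(matrix):
    """Subtract each gene's (column's) mean across cells."""
    n_cells = len(matrix)
    means = [sum(col) / n_cells for col in zip(*matrix)]
    return [[x - mu for x, mu in zip(row, means)] for row in matrix]


def first_principal_component(matrix, iterations=200, seed=0):
    """Leading PC of a centered cells-by-genes matrix via power iteration on X^T X."""
    rng = random.Random(seed)
    v = [rng.random() for _ in range(len(matrix[0]))]
    for _ in range(iterations):
        xv = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # X v
        w = [sum(row[j] * s for row, s in zip(matrix, xv)) for j in range(len(v))]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # cell coordinates on PC1
    return v, scores


if __name__ == "__main__":
    # Hypothetical 4 cells x 3 genes: two cells high in gene 0, two high in gene 1.
    counts = [[100, 1, 10], [90, 2, 10], [1, 100, 10], [2, 95, 10]]
    loadings, scores = first_principal_component(center_columns(log_normalize(counts)))
    print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so only the separation of the two cell groups, not which side is positive, is meaningful.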


Subject(s)
Computational Biology/methods , RNA-Seq , Single-Cell Analysis , Algorithms , Animals , Data Analysis , Datasets as Topic/statistics & numerical data , Humans , Principal Component Analysis , RNA-Seq/methods , RNA-Seq/standards , RNA-Seq/statistics & numerical data , Single-Cell Analysis/methods , Single-Cell Analysis/standards , Single-Cell Analysis/statistics & numerical data , Software
13.
Methods Mol Biol ; 2284: 367-392, 2021.
Article in English | MEDLINE | ID: mdl-33835453

ABSTRACT

A complete RNA-Seq analysis involves the use of several different tools, with substantial software and computational requirements. The Galaxy platform simplifies the execution of such bioinformatics analyses by embedding the needed tools in its web interface, while also providing reproducibility. Here, we describe how to perform a reference-based RNA-Seq analysis using Galaxy, from data upload to visualization and functional enrichment analysis of differentially expressed genes.


Subject(s)
RNA-Seq/methods , Software , Animals , Computational Biology/methods , Data Analysis , Datasets as Topic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Reproducibility of Results , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data
14.
J Clin Epidemiol ; 137: 83-91, 2021 09.
Article in English | MEDLINE | ID: mdl-33836256

ABSTRACT

OBJECTIVE: To illustrate how to evaluate the need for complex strategies when developing generalizable prediction models in large clustered datasets. STUDY DESIGN AND SETTING: We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models' generalizability across the included general practices. RESULTS: Among 871,687 individuals from 225 general practices, 43,987 (5.5%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was not much improved by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interaction slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke. CONCLUSION: In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models and to identify promising modelling strategies.
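Internal-external cross-validation can be sketched as a leave-one-cluster-out loop: each general practice is held out in turn, the model is fitted on all other practices, and performance is measured on the held-out practice. To keep the sketch short, a hypothetical overall event-rate "model" stands in for the Cox regressions, and the only metric computed is the observed/expected ratio; practice labels and outcomes are invented.

```python
from collections import defaultdict


def internal_external_cv(records, fit, evaluate):
    """Leave one cluster out at a time: fit on the other clusters, evaluate on the held-out one.

    records: iterable of (cluster_id, features, outcome) tuples.
    """
    clusters = defaultdict(list)
    for record in records:
        clusters[record[0]].append(record)
    results = {}
    for held_out in sorted(clusters):
        train = [r for cid, rs in clusters.items() if cid != held_out for r in rs]
        model = fit(train)
        results[held_out] = evaluate(model, clusters[held_out])
    return results


def fit_event_rate(train):
    # Hypothetical stand-in for a Cox model: overall event proportion in the training clusters.
    return sum(outcome for _, _, outcome in train) / len(train)


def observed_expected(rate, test):
    """Observed/expected event ratio in the held-out cluster (1.0 = perfect calibration-in-the-large)."""
    observed = sum(outcome for _, _, outcome in test)
    return observed / (rate * len(test))


if __name__ == "__main__":
    records = ([("A", None, 1)] + [("A", None, 0)] * 9
               + [("B", None, 1)] + [("B", None, 0)] * 9
               + [("C", None, 1)] * 2 + [("C", None, 0)] * 8)
    print(internal_external_cv(records, fit_event_rate, observed_expected))
```

The spread of the per-practice results, rather than a single pooled number, is what reveals between-practice heterogeneity.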


Subject(s)
Cluster Analysis , Datasets as Topic/statistics & numerical data , Forecasting , Models, Statistical , Humans
15.
Nat Med ; 27(3): 546-559, 2021 03.
Article in English | MEDLINE | ID: mdl-33654293

ABSTRACT

Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial-macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.


Subject(s)
COVID-19/epidemiology , COVID-19/genetics , Host-Pathogen Interactions/genetics , SARS-CoV-2/physiology , Sequence Analysis, RNA/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Virus Internalization , Adult , Aged , Aged, 80 and over , Alveolar Epithelial Cells/metabolism , Alveolar Epithelial Cells/virology , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/pathology , COVID-19/virology , Cathepsin L/genetics , Cathepsin L/metabolism , Datasets as Topic/statistics & numerical data , Demography , Female , Gene Expression Profiling/statistics & numerical data , Humans , Lung/metabolism , Lung/virology , Male , Middle Aged , Organ Specificity/genetics , Respiratory System/metabolism , Respiratory System/virology , Sequence Analysis, RNA/methods , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Single-Cell Analysis/methods
16.
J Dev Behav Pediatr ; 42(4): 322-330, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33560045

ABSTRACT

Secondary analysis of existing large, national data sets is a powerful method to address many of the complex, key research questions in developmental behavioral pediatrics (DBP). Major advantages include decreasing the time needed to complete a study and reducing expenses associated with research by eliminating the need to collect primary data. It can also increase the generalizability of research and, with some data sets, provide national estimates that may form the basis for developing policy. However, few resources are available to direct researchers who seek to develop expertise in this area. This study aims to guide investigators with limited experience in this area who wish to improve their skills in performing secondary analysis of existing large data sets. This study provides direction on the steps to perform secondary analysis of existing data sets. It describes where and how data sets can be identified to answer questions of interest to DBP. Finally, it offers an overview of a number of data sets relevant to DBP.


Subject(s)
Child Behavior , Child Development , Datasets as Topic/statistics & numerical data , Child , Humans , Pediatrics
17.
Am J Hematol ; 96(5): E168-E171, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33580969
18.
J Hum Genet ; 66(3): 297-306, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32948839

ABSTRACT

Metabolic syndrome is a cluster of symptoms, including excessive body fat and insulin resistance, which may lead to obesity and type 2 diabetes (T2D). The physiological and pathological cross-talk between T2D and obesity is crucial and complex; meanwhile, the genetic connection between T2D and obesity is largely unknown. The purpose of this study is to identify pleiotropic SNPs and genes shared by these two associated conditions by applying genetic analysis incorporating pleiotropy and annotation (GPA) to two large genome-wide association study (GWAS) data sets: a body mass index (BMI) data set containing 339,224 subjects and a T2D data set containing 110,452 subjects. In all, 5182 SNPs showed pleiotropy in both T2D and obesity. After further prioritization based on local false discovery rates (FDR) suggested by the GPA model, 2146 SNPs corresponding to 217 unique genes are significantly associated with both traits (FDR < 0.2), among which 187 are newly identified pleiotropic genes compared with the original GWAS of the individual traits. Subsequently, gene enrichment and pathway analyses highlighted several pleiotropic SNPs, including rs849135 (FDR = 0.0002), rs2119812 (FDR = 0.0018), rs4506565 (FDR = 1.23E-08) and rs1558902 (FDR = 7.23E-10), and the corresponding genes JAZF1, SYN2, TCF7L2 and FTO, which may play crucial roles in the etiology of both T2D and obesity. Additional evidence from expression data analysis of pleiotropic genes strongly supports that the pleiotropic genes, including JAZF1 (p = 1.39E-05 and p = 2.13E-05), SYN2 (p = 5.49E-03 and p = 5.27E-04), CDKN2C (p = 1.99E-12 and p = 6.27E-11), RABGAP1 (p = 3.08E-03 and p = 7.46E-03), and UBE2E2 (p = 1.83E-04 and p = 8.22E-03), play crucial roles in both obesity and T2D pathogenesis. Pleiotropic analysis integrated with functional network analysis identified several novel and causal SNPs and genes involved in both BMI and T2D that may be missed in single-trait GWAS.
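Given per-SNP local FDR values from a model such as GPA, one general way to turn them into a list with a target overall FDR is to rank SNPs by local FDR and keep the longest prefix whose mean local FDR stays at or below the target, since that mean estimates the FDR of the selected set. This technique is shown here as an assumption; it is not necessarily how the study applied its FDR < 0.2 cut-off, and the SNP identifiers and values are hypothetical.

```python
def select_by_local_fdr(lfdr, target=0.2):
    """Keep the lowest-lfdr SNPs while their mean local FDR (estimated FDR of the set) <= target."""
    ranked = sorted(lfdr.items(), key=lambda item: item[1])
    selected, running_sum = [], 0.0
    for snp, value in ranked:
        if (running_sum + value) / (len(selected) + 1) > target:
            break  # values are sorted, so the running mean can only grow from here
        running_sum += value
        selected.append(snp)
    estimated_fdr = running_sum / len(selected) if selected else 0.0
    return selected, estimated_fdr


if __name__ == "__main__":
    # Hypothetical local FDR values for five SNPs.
    lfdr = {"snp_1": 0.01, "snp_2": 0.05, "snp_3": 0.3, "snp_4": 0.5, "snp_5": 0.9}
    print(select_by_local_fdr(lfdr))
```

Averaging lets a few SNPs with individually high local FDR enter the list when they are offset by many near-certain ones, which a plain per-SNP cut-off would not allow.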


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Pleiotropy , Genome-Wide Association Study , Obesity/genetics , Polymorphism, Single Nucleotide , Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics , Body Mass Index , Comorbidity , Datasets as Topic/statistics & numerical data , Diabetes Mellitus, Type 2/epidemiology , Gene Expression , Gene Regulatory Networks , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Meta-Analysis as Topic , Metabolic Syndrome/epidemiology , Metabolic Syndrome/genetics , Molecular Sequence Annotation , Obesity/epidemiology , Prevalence , Transcription Factor 7-Like 2 Protein/genetics
19.
Transfusion ; 61(2): 423-434, 2021 02.
Article in English | MEDLINE | ID: mdl-33305364

ABSTRACT

BACKGROUND: Maternal hemorrhage protocols involve risk screening. These protocols prepare clinicians for potential hemorrhage and transfusion in individual patients. Patient-specific estimation and stratification of risk may improve maternal outcomes. STUDY DESIGN AND METHODS: Prediction models for hemorrhage and transfusion were trained and tested in a data set of 74 variables from 63 973 deliveries (97.6% of the source population of 65 560 deliveries included in a perinatal database from an academic urban delivery center) with sufficient data at pertinent time points: antepartum, peripartum, and postpartum. Hemorrhage and transfusion were present in 6% and 1.6% of deliveries, respectively. Model performance was evaluated with receiver operating characteristic (ROC) and precision-recall curves and the Hosmer-Lemeshow calibration statistic. RESULTS: For hemorrhage risk prediction, logistic regression models achieved areas under the ROC curve (AUCs) of 0.633, 0.643, and 0.661 for the antepartum, peripartum, and postpartum models, respectively. These improve upon the California Maternal Quality Care Collaborative (CMQCC) accuracy of 0.613 for hemorrhage. Transfusion prediction yielded AUCs of 0.806, 0.822, and 0.854 for the antepartum, peripartum, and postpartum models, respectively. Previously described and new risk factors were identified. Models were not well calibrated (Hosmer-Lemeshow P values between .001 and .6). CONCLUSIONS: Our models improve on existing risk assessment; however, further enhancement might require the inclusion of more granular, dynamic data. With the goal of increasing translatability, this work was distilled into an online open-source repository, including a form that accepts risk factor inputs and outputs the CMQCC risk alongside our numerical risk estimation and stratification of hemorrhage and transfusion.
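The Hosmer-Lemeshow calibration statistic can be sketched as follows: sort patients by predicted risk, split them into (by default) 10 equal-sized groups, and sum (O - E)^2 / (E(1 - E/n)) over groups, where O and E are observed and expected event counts and n is the group size. The p-value comes from a chi-square distribution with groups - 2 degrees of freedom; to stay dependency-free, this sketch uses the closed-form chi-square tail that holds only for even degrees of freedom, which covers the usual 10 groups (8 df). The calibrated example data are synthetic.

```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic with equal-sized groups formed after sorting by predicted risk."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / n_g
        # (O - E)^2 / (E * (1 - E / n_g)), written with the group's mean predicted risk
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    probs, outcomes = [], []
    for k in range(10):  # synthetic, perfectly calibrated groups of 20 patients
        p = 0.05 + 0.1 * k
        events = round(20 * p)
        probs += [p] * 20
        outcomes += [1] * events + [0] * (20 - events)
    hl = hosmer_lemeshow(probs, outcomes)
    print(hl, chi2_sf_even_df(hl, 10 - 2))
```

A large statistic, and therefore a small P value, indicates that predicted and observed event counts diverge in at least some risk groups.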


Subject(s)
Blood Transfusion/statistics & numerical data , Logistic Models , Postpartum Hemorrhage/epidemiology , Pregnancy Complications, Hematologic/epidemiology , ROC Curve , Risk Assessment/methods , Uterine Hemorrhage/epidemiology , Adult , Cesarean Section/statistics & numerical data , Databases, Factual/statistics & numerical data , Datasets as Topic/statistics & numerical data , Delivery, Obstetric/methods , Female , Humans , Peripartum Period , Postpartum Hemorrhage/therapy , Pregnancy , Pregnancy Complications/epidemiology , Pregnancy Complications, Hematologic/therapy , Procedures and Techniques Utilization/statistics & numerical data , Risk Assessment/statistics & numerical data , Risk Factors , Smoking/epidemiology , Uterine Hemorrhage/therapy
20.
Front Public Health ; 8: 611325, 2020.
Article in English | MEDLINE | ID: mdl-33363099

ABSTRACT

This paper introduces a health index for measuring the health level of societies during the lockdown era, i.e., for the period from March 21, 2020 to April 7, 2020. For this purpose, individual-level survey data from the Global Behaviors and Perceptions in the COVID-19 Pandemic dataset are considered. We focus on cases in the United States and the United Kingdom, and the data come from 11,270 and 11,459 respondents, respectively. We then use unit root tests with structural breaks to examine whether COVID-19-related economic shocks significantly affect the health levels of the United States and the United Kingdom. The empirical results indicate that the health levels in the United States and the United Kingdom are not significantly affected by the COVID-19-related economic shocks. The evidence shows that government directives (such as lockdowns) did not significantly change the health levels of these societies.
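The core of a unit root test can be sketched with the plain Dickey-Fuller regression dy_t = alpha + rho * y_{t-1} + e_t: a t statistic for rho below the critical value rejects the unit root. This is a simplified stand-in, assuming no lagged differences and no structural breaks, so it is not the break-robust test used in the paper; -2.86 is the standard asymptotic 5% Dickey-Fuller critical value for a model with a constant and no trend. The series are simulated.

```python
import math
import random

# Asymptotic 5% critical value of the Dickey-Fuller t statistic (constant, no trend).
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_t(y):
    """t statistic for rho in dy_t = alpha + rho * y_{t-1} + e_t (no lags, no breaks)."""
    x = y[:-1]
    dy = [curr - prev for prev, curr in zip(y[:-1], y[1:])]
    m = len(dy)
    if m < 3:
        raise ValueError("series too short")
    mean_x, mean_dy = sum(x) / m, sum(dy) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    rho = sxy / sxx  # OLS slope
    alpha = mean_dy - rho * mean_x  # OLS intercept
    sse = sum((di - alpha - rho * xi) ** 2 for xi, di in zip(x, dy))
    se_rho = math.sqrt(sse / (m - 2) / sxx)
    return rho / se_rho


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stationary series
    walk = [0.0]
    for e in noise[1:]:
        walk.append(walk[-1] + e)  # random walk with a unit root
    for name, series in (("noise", noise), ("walk", walk)):
        t = dickey_fuller_t(series)
        print(name, round(t, 2), "reject unit root" if t < DF_CRITICAL_5PCT else "cannot reject")
```

Break-robust variants repeat a regression of this kind with added break dummies at candidate dates and use their own critical values, which is why a plain test like this one can be misleading when a shock shifts the series.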


Subject(s)
COVID-19/economics , Economic Factors , Health Status , Physical Distancing , Datasets as Topic/statistics & numerical data , Humans , SARS-CoV-2 , United Kingdom , United States