ABSTRACT
Statistical procedures are fundamental to the proper interpretation of a data set, and a test that fits the selected data poorly can lead to inadequate conclusions. The choice between parametric and non-parametric tests for paired data should therefore take the normality of the data into account. Applying the Pearson correlation coefficient (a parametric test) to data that do not meet parametric assumptions increases the chance of spurious associations (arising by chance or from systematic error), which result in Type I errors. Young researchers and editors of scientific journals may be guided by positive results, and it is common for editors to select articles for publication based on a p value < 0.05. However, it would also be important to select papers according to whether the assumptions for parametric and non-parametric tests have been met. The aim of the present study was therefore to address the Pearson and Spearman correlation coefficients and to suggest recommendations for practitioners of statistics in the Health Sciences for the safe and adequate use of data prior to publication.
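The contrast between the two coefficients can be illustrated with a minimal sketch. It is not taken from the study: the function names and the sample data are hypothetical, and the normality check that the abstract recommends before choosing a test is deliberately left out. Spearman's coefficient is computed here as Pearson's coefficient applied to average ranks.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired data: perfectly monotonic, but far from linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

print("Pearson: ", pearson(x, y))
print("Spearman:", spearman(x, y))
```

On these illustrative data the rank-based coefficient reflects the perfect monotonic relationship, while Pearson's coefficient is pulled down by the skew. This is one reason the distribution of the data matters when choosing between them.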
Subject(s)
Humans , Bias , Health , Data Interpretation, Statistical , Statistics , Correlation of Data , Publications , Statistics as Topic , Test Taking Skills , Dataset , Hypertension
ABSTRACT
Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variability remains an unresolved issue. The recent introduction of digital scanners has made many whole slide images (WSIs) accessible to researchers, allowing deep learning to be used in the analysis of tissue images. In the present study, we investigated the possibility of a deep learning-based, fully automated, computer-aided diagnosis system with WSIs from a stomach adenocarcinoma dataset. Three different convolutional neural network architectures were tested to determine the best architecture for the tissue classifier. Each network was trained to classify small tissue patches as normal or tumor. Based on the patch-level classification, tumor probability heatmaps can be overlaid on tissue images. We observed three different tissue patterns: clear normal, clear tumor, and ambiguous cases. We suggest that longer inspection time can be assigned to ambiguous cases than to clear normal cases, increasing the accuracy and efficiency of histopathologic diagnosis by pre-evaluating the status of the WSIs. When the classifier was tested with a completely different WSI dataset, the performance was not optimal because of the different tissue preparation quality. Including a small amount of data from the new dataset in training substantially improved performance on the new dataset. These results indicate that a WSI dataset should include tissues prepared under many different conditions in order to construct a generalized tissue classifier. Thus, multi-national, multi-center datasets should be built for the application of deep learning in real-world medical practice.
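The pre-evaluation idea described above can be sketched as follows. This is an illustration only, not the authors' method: the abstract does not say how the three patterns were defined, so the function name `triage_slide` and every threshold below are assumptions. The input is a grid of patch-level tumor probabilities, as a classifier might produce for one WSI.

```python
def triage_slide(patch_probs, low=0.2, high=0.8,
                 max_uncertain=0.1, min_tumor=0.05):
    """Label a slide from a 2D grid of patch-level tumor probabilities.

    All thresholds are hypothetical placeholders:
      low / high     -- probabilities between these count as uncertain
      max_uncertain  -- uncertain-patch fraction above which the slide
                        is flagged as ambiguous
      min_tumor      -- confident-tumor fraction needed for "clear tumor"
    """
    flat = [p for row in patch_probs for p in row]
    n = len(flat)
    uncertain_fraction = sum(low <= p <= high for p in flat) / n
    tumor_fraction = sum(p > high for p in flat) / n

    if uncertain_fraction > max_uncertain:
        return "ambiguous"
    if tumor_fraction >= min_tumor:
        return "clear tumor"
    return "clear normal"


# Hypothetical 2x2 heatmaps, one per pattern.
normal_slide = [[0.01, 0.02], [0.05, 0.03]]
tumor_slide = [[0.95, 0.90], [0.02, 0.99]]
mixed_slide = [[0.50, 0.40], [0.10, 0.90]]

for name, grid in [("normal", normal_slide),
                   ("tumor", tumor_slide),
                   ("mixed", mixed_slide)]:
    print(name, "->", triage_slide(grid))
```

Slides labelled "ambiguous" would be the ones given longer inspection time. In practice the thresholds would need to be tuned on validated slides.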
Subject(s)
Adenocarcinoma , Classification , Dataset , Diagnosis , Learning , Observer Variation , Stomach
ABSTRACT
10% of labeled tumor cells) of TNF receptor 1 (TNFR1), the protein product of the TNFRSF1A gene, was correlated with sarcomatoid dedifferentiation and was an independent predictive factor of clinically unfavorable response and shorter survival in a separate TKI-treated ccRCC cohort. CONCLUSION: TNF-α signaling may play a role in TKI resistance, and TNFR1 expression may serve as a predictive biomarker for clinically unfavorable TKI responses in ccRCC.
Subject(s)
Biomarkers , Carcinoma, Renal Cell , Cohort Studies , Dataset , Drug Resistance , Gene Expression , Gene Expression Profiling , Heterografts , Humans , Immunohistochemistry , Protein-Tyrosine Kinases , Receptors, Tumor Necrosis Factor , Receptors, Tumor Necrosis Factor, Type I , Tumor Necrosis Factor-alpha
ABSTRACT
Subject(s)
Bays , Chronic Disease , Cohort Studies , Comorbidity , Dataset , Decision Trees , Florida , Forests , Hospitalization , Humans , Inpatients , International Classification of Diseases , Learning , Length of Stay , Machine Learning , Masks , Patient Discharge , Resource Allocation , Sensitivity and Specificity , Socioeconomic Factors , Supervised Machine Learning , Support Vector Machine
ABSTRACT
Subject(s)
Area Under Curve , Computing Methodologies , Databases, Pharmaceutical , Dataset , Dermatoglyphics , Drug Interactions , Drug-Related Side Effects and Adverse Reactions
ABSTRACT
Subject(s)
Anti-Bacterial Agents , Benchmarking , Dataset , Delivery of Health Care , Drug Prescriptions , Electronic Health Records , Electronic Prescribing , Health Care Evaluation Mechanisms , Hospitals, General , Humans , Korea , National Health Programs , Polypharmacy , Prescriptions , Primary Health Care , Quality of Health Care , Surveys and Questionnaires
ABSTRACT
Subject(s)
Abdominal Muscles , Adipose Tissue , Artificial Intelligence , Dataset , Intra-Abdominal Fat , Learning , Muscle, Skeletal , Muscles , Sarcopenia , Spine , Subcutaneous Fat , Tomography, X-Ray Computed
ABSTRACT
Subject(s)
Biopsy , Classification , Dataset , Sensitivity and Specificity , Thyroid Gland , Thyroid Neoplasms , Thyroid Nodule , Ultrasonography
ABSTRACT
Subject(s)
Alzheimer Disease , Amyloid , Amyloid beta-Peptides , Brain , Cognition , Dataset , Dementia , Humans , Magnetic Resonance Imaging , Mass Screening , Methods , Positron-Emission Tomography , Prospective Studies , Sensitivity and Specificity , Seoul
ABSTRACT
Post-transcriptional regulation of mRNA transcripts, such as alternative splicing and alternative polyadenylation, can affect the expression of genes without changing transcript levels. Recent studies have demonstrated that these post-transcriptional events can have significant physiological impacts on various biological systems and play important roles in the pathogenesis of a number of diseases, including cancers. Nevertheless, how cellular signaling pathways control these post-transcriptional processes is not yet well explored. The mammalian target of rapamycin complex 1 (mTORC1) pathway plays a key role in sensing cellular nutrient and energy status and in regulating the proliferation and growth of cells by controlling various anabolic and catabolic processes. Dysregulation of the mTORC1 pathway can tip the metabolic balance of cells and is associated with a number of pathological conditions, including various types of cancers, diabetes, and cardiovascular diseases. Numerous reports have shown that mTORC1 controls its downstream pathways through translational and/or transcriptional regulation of the expression of key downstream effectors. Recent studies have also shown that mTORC1 can control downstream pathways via post-transcriptional regulation. In this review, we discuss the roles of post-transcriptional processes in the regulation of gene expression and how mTORC1-mediated post-transcriptional regulation contributes to cellular physiological changes. We highlight post-transcriptional regulation as an additional layer of gene expression control by which mTORC1 steers cellular biology. This emphasizes the importance of studying post-transcriptional events in transcriptome datasets to gain a fuller understanding of gene expression regulation in the biological systems of interest.
Subject(s)
Alternative Splicing , Cardiovascular Diseases , Dataset , Gene Expression , Polyadenylation , RNA, Messenger , Sirolimus , Social Control, Formal , Transcriptome
ABSTRACT
PURPOSE: We determined whether elevated serum alkaline phosphatase (ALP) was related to prevalence, location, type, length, and recurrence of pterygium in a population from the Republic of Korea. METHODS: A nationwide cross-sectional dataset, the Korean National Health and Nutrition Examination Survey (2008–2011), was used in this study. All participants were > 30 years of age and underwent the ALP test and ophthalmic evaluation (n = 22,359). One-way analysis of variance, the chi-square test, and Fisher's exact test were used to compare characteristics and outcomes among participants. Multivariable logistic regression was used to examine the possible associations between serum ALP levels and various types of pterygium. Data were adjusted for known risk factors for development of pterygium and ALP elevation (age, sex, residence, sunlight exposure, drinking, smoking, hypertension, diabetes, BMI, AST, ALT, vitamin D, and HDL). RESULTS: The overall prevalence of pterygium was 8.1%, and participants with pterygium had higher levels of serum ALP (p < 0.001). Participants with higher serum ALP had a significantly higher prevalence of all types of pterygium than those in the lower serum ALP quartiles. After adjusting for potential confounding factors, multivariate logistic regression analysis revealed that ALP was associated with the prevalence of pterygium (odds ratio [OR], 1.001; p = 0.038). Trend analysis between the OR and ALP quartiles revealed a linear trend in overall prevalence and in the intermediate type of pterygium. Subgroup analysis revealed a stronger correlation in participants > 50 years of age. One-way analysis of variance revealed an association between the size of pterygium and serum ALP quartile levels. Serum ALP was not associated with recurrence of pterygium. CONCLUSIONS: Increased serum ALP was associated with the prevalence and size of pterygium.
Subject(s)
Alkaline Phosphatase , Cross-Sectional Studies , Dataset , Drinking , Hypertension , Korea , Logistic Models , Nutrition Surveys , Prevalence , Pterygium , Recurrence , Republic of Korea , Risk Factors , Smoke , Smoking , Sunlight , Vitamin D
ABSTRACT
OBJECTIVE: To investigate pathologic discrepancies between colposcopy-directed biopsy (CDB) of the cervix and loop electrosurgical excision procedure (LEEP) in women with cytologic high-grade squamous intraepithelial lesions (HSILs). METHODS: We retrospectively identified 297 patients who underwent both CDB and LEEP for HSILs in cervical cytology between 2015 and 2018, and compared their pathologic results. Considering the LEEP to be the gold standard, we evaluated the diagnostic performance of CDB for identifying cervical intraepithelial neoplasia (CIN) grades 2 and 3, adenocarcinoma in situ, and cancer (HSIL+). We also performed age subgroup analyses. RESULTS: Among the study population, 90.9% (270/297) had pathologic HSIL+ using the LEEP. The diagnostic performance of CDB for identifying HSIL+ was as follows: sensitivity, 87.8%; specificity, 59.3%; balanced accuracy, 73.6%; positive predictive value, 95.6%; and negative predictive value, 32.7%. Thirty-three false negative cases of CDB included CIN2,3 (n=29) and cervical cancer (n=4). The pathologic HSIL+ rate in patients with HSIL− by CDB was 67.3% (33/49). CDB exhibited a significant difference in the diagnosis of HSIL+ compared to LEEP in all patients (p<0.001). In age subgroup analyses, age groups <35 years and 35–50 years showed good agreement with the entire data set (p=0.496 and p=0.406, respectively), while age group ≥50 years did not (p=0.036). CONCLUSION: A significant pathologic discrepancy was observed between CDB and LEEP results in women with cytologic HSILs. The diagnostic inaccuracy of CDB increased in those ≥50 years of age.
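The reported performance figures can be checked with a short worked example. The abstract gives no 2×2 table, so the counts below are back-calculated from the numbers it does give (297 patients, 270 HSIL+ by LEEP, 33 false negatives, 49 HSIL− by CDB) and are a reconstruction. The function and variable names are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics, returned as fractions."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Figures stated in the abstract.
total_patients = 297
leep_positive = 270      # HSIL+ on LEEP (reference standard)
false_negatives = 33     # HSIL+ on LEEP but HSIL- on CDB
cdb_negative = 49        # HSIL- on CDB

# Counts derived from those figures (a reconstruction, not reported data).
tp = leep_positive - false_negatives          # CDB+ and LEEP+
tn = cdb_negative - false_negatives           # CDB- and LEEP-
fp = (total_patients - leep_positive) - tn    # CDB+ but LEEP-

metrics = diagnostic_metrics(tp, fp, false_negatives, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under this reconstruction, sensitivity, specificity, and both predictive values round to the reported percentages. Balanced accuracy comes out at about 73.5%, slightly below the reported 73.6%. That small gap is consistent with averaging the already-rounded sensitivity and specificity.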
Subject(s)
Adenocarcinoma in Situ , Biopsy , Uterine Cervical Dysplasia , Cervix Uteri , Colposcopy , Conization , Dataset , Diagnosis , Early Detection of Cancer , Female , Humans , Papanicolaou Test , Retrospective Studies , Sensitivity and Specificity , Squamous Intraepithelial Lesions of the Cervix , Uterine Cervical Neoplasms
ABSTRACT
BACKGROUND: Atopic dermatitis (AD) is recognized as a common inflammatory skin disease that occurs frequently in Asian and Black individuals. OBJECTIVE: Because datasets on severe human AD are limited, this study aimed to screen for potential novel biomarkers involved in mild AD. METHODS: Expression profile data (GSE75890) were obtained from the Gene Expression Omnibus database. Using the limma package, the differentially expressed genes (DEGs) between samples from AD patients and healthy controls were selected, and functional analysis was conducted. The protein-protein interaction (PPI) network and the transcription factor (TF)-miRNA-target regulatory network were then constructed. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to validate the expression patterns of key genes. RESULTS: In total, 285 DEGs, including 214 upregulated and 71 downregulated genes, were identified between samples from the two groups. The upregulated DEGs were mainly involved in nine pathways, such as hematopoietic cell lineage, pertussis, p53 signaling pathway, Staphylococcus aureus infection, and cell cycle, while tight junction was the only pathway enriched by the downregulated DEGs. Cyclin B (CCNB)1, CCNB2, cyclin A (CCNA)2, C-X-C motif chemokine ligand (CXCL)10, and CXCL9 were key nodes in the PPI network. The TF-miRNA-target gene regulatory network focused on miRNAs such as miR-106b, miR-106a, and miR-17; TFs such as nuclear factor kappa B subunit 1, RELA proto-oncogene, and Sp1 transcription factor; and genes such as matrix metallopeptidase 9, peroxisome proliferator activated receptor gamma, and serpin family E member 1. Moreover, the upregulation of CCNB1, CCNB2, CCNA2, CXCL10, and CXCL9 was confirmed by qRT-PCR. CONCLUSION: CCNB1, CCNB2, CCNA2, and CXCL9 might be novel markers of mild AD. miR-106b and miR-17 may be involved in the regulation of the immune response in AD patients.
Subject(s)
Asian People , Biomarkers , Cell Cycle , Cell Lineage , Computational Biology , Cyclin A , Cyclin B , Dataset , Dermatitis , Dermatitis, Atopic , Gene Expression , Gene Regulatory Networks , Humans , MicroRNAs , NF-kappa B , PPAR gamma , Proto-Oncogenes , Real-Time Polymerase Chain Reaction , Skin Diseases , Sp1 Transcription Factor , Staphylococcus aureus , Tight Junctions , Transcription Factors , Up-Regulation , Whooping Cough
ABSTRACT
Identification of fusion genes is of great importance in cancer research because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identification of fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false positives, making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we have implemented various mapping and filtering strategies to remove false positives without discarding genuine fusions. In the performance test using three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall rates of 60% and 79%, respectively. A simulation test also demonstrated that FusionScan recovered most true positives without producing an overwhelming number of false positives, regardless of sequencing depth and read length. The computation time was comparable to that of other leading tools. We also provide several curative means to help users investigate the details of fusion candidates easily. We believe that FusionScan would be a reliable, efficient and convenient program for detecting fusion transcripts that meets the requirements of the clinical and experimental community. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
Subject(s)
Cell Line , Dataset , Exons , Gene Fusion , Sequence Analysis, RNA , Translocation, Genetic
ABSTRACT
The Wellcome Trust Case Control Consortium (WTCCC) study was a large genome-wide association study that aimed to identify common variants associated with seven diseases. That study combined two control datasets (58C and UK Blood Services) as shared controls. Prior to using the combined controls, the WTCCC performed analyses to show that the genomic content of the control datasets was not significantly different. Recently, the analysis of human leukocyte antigen (HLA) genes has become prevalent due to the development of HLA imputation technology. In this project, we extended the between-control homogeneity analysis of the WTCCC to HLA. We imputed HLA information in the WTCCC control dataset and showed that the HLA content was not significantly different between the two control datasets, suggesting that the combined controls can be used as controls for HLA fine-mapping analysis based on HLA imputation.
Subject(s)
Case-Control Studies , Dataset , Genome-Wide Association Study , Humans , Leukocytes
ABSTRACT
Neuroblastoma is a major cause of cancer death in early childhood, and its timely and correct diagnosis is critical. Gene expression datasets have recently been considered as a powerful tool for cancer diagnosis and subtype classification. However, no attempts have yet been made to apply deep learning using gene expression to neuroblastoma classification, although deep learning has been applied to cancer diagnosis using image data. Taking the International Neuroblastoma Staging System stages as multiple classes, we designed a deep neural network using the gene expression patterns and stages of neuroblastoma patients. Despite a small patient population (n = 280), stage 1 and 4 patients were well distinguished. If it is possible to replicate this approach in a larger population, deep learning could play an important role in neuroblastoma staging.
Subject(s)
Classification , Dataset , Diagnosis , Gene Expression , Humans , Learning , Neuroblastoma
ABSTRACT
Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins and phenotypic traits, in order to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Because few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.
Subject(s)
Benchmarking , Biology , Data Mining , Dataset , Machine Learning , Methods , Molecular Biology , Natural Language Processing , Oryza , Plants
ABSTRACT
Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality of the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
Subject(s)
Dataset , Information Storage and Retrieval , Methods , Semantics , Vocabulary
ABSTRACT
Alveolar type II cells constitute a small fraction of the total lung cell mass. However, they play an important role in many cellular processes, including trans-differentiation into type I cells as well as repair of lung injury in response to toxic chemicals and respiratory pathogens. Transcription factors are regulatory proteins that dynamically modulate DNA structure and gene expression. Transcription factor profiling in microarray datasets revealed that several members of the AP1, ATF, NF-kB, and C/EBP families involved in diverse responses were expressed in mouse lung type II cells. A transcription factor signature consisting of Cebpa, Srebf1, Stat3, Klf5, and Elf3 was identified in lung type II cells, in Sox9+ pluripotent lung stem cells, and in mouse lung development. Identification of the transcription factor profile in mouse lung type II cells will serve as a useful resource and facilitate the integrated analysis of signal transduction pathways and specific gene targets in a variety of physiological conditions.
Subject(s)
Animals , Dataset , DNA , Gene Expression , Humans , Lung Injury , Lung , Mice , NF-kappa B , Signal Transduction , Stem Cells , Transcription Factors , Transcriptome
ABSTRACT
OBJECTIVE: Myeloproliferative neoplasm (MPN) is considered one of the risk factors for ischemic stroke, and some MPN patients manifest stroke as their first symptom. Our purpose was to assess the diagnostic rate of MPN in patients with newly diagnosed acute ischemic stroke. METHODS: This study was performed using the National Health Insurance Service Ilsan Hospital dataset. Data were retrieved by identifying patients coded with acute ischemic stroke from January 2013 to June 2017. We selected only patients who had undergone brain magnetic resonance imaging and a complete blood cell count (CBC) in the emergency room or on admission. Among the CBC findings, hemoglobin and platelet count were analyzed. Erythrocytosis was defined as hemoglobin >16.5 g/dL (male) or >16 g/dL (female), according to the revised World Health Organization (WHO) criteria for polycythemia vera (PV). Thrombocytosis was defined as a platelet count >450,000/µL, according to the revised WHO criteria for essential thrombocythemia (ET). RESULTS: A total of 1,613 patients had newly diagnosed acute ischemic stroke. Seven patients (0.43%) were diagnosed with MPN (ET=2, PV=5) after ischemic stroke. Thrombocytosis was found in 18 patients and erythrocytosis in 105. Three patients with thrombocytosis were diagnosed with MPN (ET=2, PV=1). Two patients with erythrocytosis were diagnosed with MPN (PV=2). Two patients had both thrombocytosis and erythrocytosis, and both were diagnosed with PV. In 71 patients with erythrocytosis, values normalized during the follow-up period. Six patients with thrombocytosis and 30 patients with erythrocytosis were not evaluated further. CONCLUSION: CBC results should be read carefully, because they can raise suspicion of MPN. The diagnosis must be confirmed by a hematologist to initiate appropriate treatment. It is important to recognize patients with suspected MPN in order to prevent stroke.
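The CBC cut-offs stated in the methods can be written as a small screening rule. The sketch below is illustrative and is not the study's code: the function name `flag_cbc`, its parameters, and the sample values are hypothetical. Only the thresholds come from the abstract. A flag marks a result for follow-up and is not a diagnosis.

```python
PLATELET_CUTOFF_PER_UL = 450_000
HEMOGLOBIN_CUTOFF_G_DL = {"male": 16.5, "female": 16.0}


def flag_cbc(hemoglobin_g_dl, platelets_per_ul, sex):
    """Flag erythrocytosis and thrombocytosis using the abstract's cut-offs.

    `sex` must be "male" or "female"; anything else raises ValueError.
    """
    if sex not in HEMOGLOBIN_CUTOFF_G_DL:
        raise ValueError(f"unsupported sex value: {sex!r}")
    return {
        "erythrocytosis": hemoglobin_g_dl > HEMOGLOBIN_CUTOFF_G_DL[sex],
        "thrombocytosis": platelets_per_ul > PLATELET_CUTOFF_PER_UL,
    }


# Hypothetical CBC results.
print(flag_cbc(17.2, 310_000, "male"))
print(flag_cbc(16.2, 520_000, "female"))
print(flag_cbc(14.0, 250_000, "female"))

# Diagnostic rate reported in the abstract: 7 MPN cases among 1,613 patients.
print(f"MPN diagnostic rate: {7 / 1613:.2%}")
```

Both comparisons are strict, matching the ">" in the stated criteria, so a value exactly at a cut-off is not flagged.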