1.
Transl Oncol ; 29: 101629, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36689862

ABSTRACT

TP53 is the most frequently mutated gene in muscle invasive bladder cancer (MIBC) and there are two gene signatures regarding TP53 developed for MIBC prognosis. However, they are limited to immune genes only and unable to be used individually across platforms due to their quantitative manners. We used 827 gene expression profiles from seven MIBC cohorts with varied platforms to build a pairwise TP53-derived transcriptome signature, 13 gene pairs (13-GPs). Since the 13-GPs model is a single sample prognostic predictor, it can be applied individually in practice and is applicable to any gene-expression platforms without specific normalization requirements. Survival difference between high-risk and low-risk patients stratified by the 13-GPs test was statistically significant (HR range: 2.26-2.76, all P < .0001). Discovery and validation sets showed that the 13-GPs was an independent prognostic factor after adjusting other clinical features (HR range: 2.21-2.82, all P < .05). Moreover, it was a potential supplement to the consensus molecular classification of MIBC to further stratify the LumP subtype (patients with better prognoses). High- and low-risk patients by the 13-GPs model presented distinct immune microenvironment and DDR mutation rates, suggesting that it might have the potential for immunotherapy. Being a general approach to other cancer types, this study demonstrated how we integrated gene variants with pairwise gene panels to build a single sample prognostic test in translational oncology.
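
A pairwise signature of this kind can be sketched as a simple within-sample vote. The sketch below is illustrative only: the gene names are hypothetical placeholders (the abstract does not list the 13 pairs), and the majority-vote threshold is an assumption, not the published decision rule.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gene pairs; the real 13 pairs are not given in the abstract.
GENE_PAIRS: List[Tuple[str, str]] = [
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
]


def count_votes(expr: Dict[str, float], pairs: List[Tuple[str, str]]) -> int:
    """Count the pairs whose first gene is expressed above the second in this sample."""
    return sum(1 for first, second in pairs if expr[first] > expr[second])


def classify(expr: Dict[str, float],
             pairs: List[Tuple[str, str]],
             threshold: Optional[float] = None) -> str:
    """Label one sample from its own expression values only.

    Assumption: a sample is called high-risk when more than half of the
    pairs vote for it. The published cutoff may differ.
    """
    if threshold is None:
        threshold = len(pairs) / 2
    return "high-risk" if count_votes(expr, pairs) > threshold else "low-risk"


# Made-up expression values for a single sample (any unit, any platform).
sample = {
    "GENE_A": 8.1, "GENE_B": 5.3,
    "GENE_C": 2.0, "GENE_D": 6.7,
    "GENE_E": 9.4, "GENE_F": 1.2,
}

votes = count_votes(sample, GENE_PAIRS)
label = classify(sample, GENE_PAIRS)
print(votes, label)
```

Because the rule only compares genes within one sample, no cohort-level normalization is needed before calling it, which is the property the abstract emphasizes.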

2.
Comput Struct Biotechnol J ; 20: 2672-2679, 2022.
Article in English | MEDLINE | ID: mdl-35685355

ABSTRACT

There is a growing need to build a model that uses single cell RNA-seq (scRNA-seq) to separate malignant cells from nonmalignant cells and to identify tumor of origin of single cells and/or circulating tumor cells (CTCs). Currently, it is infeasible to build a tumor of origin model learnt from scRNA-seq by machine learning (ML). We then wondered if an ML model learnt from bulk transcriptomes is applicable to scRNA-seq to infer single cells' tumor presence and further indicate their tumor of origin. We used k-nearest neighbors, one-versus-all support vector machine, one-versus-one support vector machine, random forest and introduced scTumorTrace to conduct a pioneering experiment containing leukocytes and seven major cancer types where bulk RNA-seq and scRNA-seq data were available. 13 ML models learnt from bulk RNA-seq were all reliable to use (F-score > 96%) shown by a validation set of bulk transcriptomes, but none of them was applicable to scRNA-seq except scTumorTrace. Making inferences from bulk RNA-seq to scRNA-seq was impaired by feature selection and improved by log2-transformed TPM units. scTumorTrace with transcriptome-wide 2-tuples showed F-score beyond 98.74 and 94.29% in inferring tumor presence and tumor of origin at single-cell resolution and correctly identified 45 single candidate prostate CTCs but lineage-confirmed non-CTCs as leukocytes. We concluded that modern ML techniques are quantitative and could hardly address the raised questions. scTumorTrace with transcriptome-wide 2-tuples is qualitative, standardization-free and not subject to log2-transformed quantities, enabling us to infer tumor presence of single cell transcriptomes and their tumor of origin from bulk transcriptomes.
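
The claim that a qualitative 2-tuple representation is standardization-free can be illustrated with a small check. This is not scTumorTrace itself, only a sketch of the underlying idea under stated assumptions: the TPM values are made up, and `pair_orderings` is a hypothetical helper name.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pair_orderings(values: List[float]) -> Dict[Tuple[int, int], bool]:
    """For every gene pair (i, j) with i < j, record whether gene i is above gene j."""
    return {(i, j): values[i] > values[j]
            for i, j in combinations(range(len(values)), 2)}


# Made-up TPM values for five genes in one cell.
tpm = [0.0, 12.5, 340.0, 3.2, 87.0]

# Two strictly increasing transformations of the same profile.
log_tpm = [math.log2(v + 1.0) for v in tpm]
scaled = [v * 1e-3 for v in tpm]

same_after_log = pair_orderings(tpm) == pair_orderings(log_tpm)
same_after_scale = pair_orderings(tpm) == pair_orderings(scaled)
print(same_after_log, same_after_scale)
```

Any strictly increasing transformation leaves every pairwise ordering unchanged, so a model built on orderings sees the same input whether the profile is in raw or log2-transformed units.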

3.
J Biomed Inform ; 131: 104112, 2022 07.
Article in English | MEDLINE | ID: mdl-35680073

ABSTRACT

Extended endocrine therapy beyond 5 years is of major concern to ER+ breast cancer survivors. However, it might be unsuitable to apply routinely used genomic tests designed for early recurrence risks to distant recurrence within 10 years in extended treatment context. These tests initially aim at high sensitivities with Type I errors much higher than Type II. Having lower positive predictive values (PPVs), these tests can bring many false positives who might not need further treatment options to avoid adversely affecting quality of life. Alternatively, we proposed a top-down approach to the raised issues. We built 149 targeted genes from four genomic tests upon 381 ER-positive node-negative patients with either metastasis free beyond 10 years (n = 202) or metastasis within 10 years (n = 179). By a basket of SVM-wrapped length-constraint feature selection (LCFS), we discovered four genomic SVMs that traded off Type I against Type II errors. Two independent cohorts were used to validate disease outcome predictions. A 36-gene SVM balanced sensitivities with PPVs at good levels: 74% vs 76% on 10-fold cross validation (n = 347) and 75% vs 71% on a test set (n = 34). Neither Oncotype DX RS (cutoff = 18, 31, 60.97) nor PAM50 ROR-S (cutoff = 29, 53, 61.18) could. Independent cohorts showed the 36-gene SVM predicted disease free survival (n = 136, HR = 2.59; 95% CI, 1.4-4.8) and disease specific survival (n = 127, HR = 4.06; 95% CI, 1.63-10.11) better than RS (DFS, HR = 2.15; DSS, HR = 3.86) and ROR-S (DFS, HR = 2.29; DSS, HR = 2.76). The case study demonstrated how we identified a genomic test to balance Type I against Type II errors for risk stratification. The top-down approach centered around the LCFS-metaheuristics basket is a generic methodology for clinical decision-making and quality of life using targeted profiling data where the number of dimensions (p) is smaller than the number of samples (n).


Subject(s)
Breast Neoplasms , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Female , Humans , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Predictive Value of Tests , Prognosis , Quality of Life
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32607548

ABSTRACT

The accuracy of prostate-specific antigen or clinical examination in prostate cancer (PCa) screening is in question, and circulating microRNAs (miRNAs) can be alternatives to PCa diagnosis. However, recent circulating miRNA biomarkers either are identified upon small sample sizes or cannot have robust diagnostic performance in every aspect of performance indicators. These may decrease applicability of potential biomarkers for the early detection of PCa. We reviewed recent studies on blood-derived miRNAs for prostate cancer diagnosis and carried out a large case study to understand whether circulating miRNA pairs, rather than single circulating miRNAs, could contribute to a more robust diagnostic model to significantly improve PCa diagnosis. We used 1231 high-throughput miRNA-profiled serum samples from two cohorts to design and verify a model based on class separability miRNA pairs (cs-miRPs). The pairwise model was composed of five circulating miRNAs coupled to miR-5100 and miR-1290 (i.e. five miRNA pairs, 5-cs-miRPs), reaching approximately 99% diagnostic performance in almost all indicators (sensitivity = 98.96%, specificity = 100%, accuracy = 99.17%, PPV = 100%, NPV = 96.15%) shown by a test set (n = 484: PCa = 384, negative prostate biopsies = 100). The nearly 99% diagnostic performance was also verified by an additional validation set (n = 140: PCa = 40, healthy controls = 100). Overall, the 5-cs-miRP model had 1 false positive and 7 false negatives among the 1231 serum samples and was superior to a recent 2-miRNA model (so far the best for PCa diagnosis) with 18 false positives and 80 false negatives. The present large case study demonstrated that circulating miRNA pairs could potentially bring more benefits to PCa early diagnosis for clinical practice.
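
The test-set indicators quoted above can be reproduced from a confusion matrix. The counts below are inferred from the reported percentages and group sizes (384 PCa, 100 negative biopsies); they are an assumption, since the abstract does not state them directly.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Return the standard diagnostic indicators as percentages rounded to 2 decimals."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": round(100 * tp / (tp + fn), 2),
        "specificity": round(100 * tn / (tn + fp), 2),
        "accuracy": round(100 * (tp + tn) / total, 2),
        "ppv": round(100 * tp / (tp + fp), 2),
        "npv": round(100 * tn / (tn + fn), 2),
    }


# Inferred counts (assumption): 380 of 384 PCa samples detected,
# all 100 negative biopsies correctly called negative.
metrics = diagnostic_metrics(tp=380, fn=4, tn=100, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value}%")
```

With these counts the five indicators come out as 98.96%, 100%, 99.17%, 100% and 96.15%, matching the figures in the abstract.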


Subject(s)
Biomarkers, Tumor , Circulating MicroRNA , Early Detection of Cancer , Prostatic Neoplasms , RNA, Neoplasm , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Biopsy , Circulating MicroRNA/blood , Circulating MicroRNA/genetics , Humans , Male , Prostatic Neoplasms/blood , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , RNA, Neoplasm/blood , RNA, Neoplasm/genetics
5.
Front Oncol ; 9: 629, 2019.
Article in English | MEDLINE | ID: mdl-31355144

ABSTRACT

Background: Previously reported transcriptional signatures for predicting the prognosis of stage I-III bladder cancer (BLCA) patients after surgical resection are commonly based on risk scores summarized from quantitative measurements of gene expression levels, which are highly sensitive to the measurement variation and sample quality and thus hardly applicable under clinical settings. It is necessary to develop a signature which can robustly predict recurrence risk of BLCA patients after surgical resection. Methods: The signature is developed based on the within-sample relative expression orderings (REOs) of genes, which are qualitative transcriptional characteristics of the samples. Results: A signature consisting of 12 gene pairs (12-GPS) was identified in training data with 158 samples. In the first validation dataset with 114 samples, the low-risk group of 54 patients had a significantly better overall survival than the high-risk group of 60 patients (HR = 3.59, 95% CI: 1.34-9.62, p = 6.61 × 10^-3). The signature was also validated in the second validation dataset with 57 samples (HR = 2.75 × 10^8, 95% CI: 0-Inf, p = 0.05). Comparison analysis showed that the transcriptional differences between the low- and high-risk groups were highly reproducible and significantly concordant with DNA methylation differences between the two groups. Conclusions: The 12-GPS signature can robustly predict the recurrence risk of stage I-III BLCA patients after surgical resection. It can also aid the identification of reproducible transcriptional and epigenomic features characterizing BLCA metastasis.

6.
J Transl Med ; 17(1): 63, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30819200

ABSTRACT

BACKGROUND: Currently, pathological examination of gastroscopy biopsy specimens is the gold standard for gastric cancer (GC) diagnosis. However, it has a false-negative rate of 10-20% due to inaccurate sampling locations and/or insufficient sampling amount. A signature should be developed to aid the early diagnosis of GC using biopsy specimens even when they are sampled from inaccurate locations. METHODS: We extracted a robust qualitative transcriptional signature, based on the within-sample relative expression orderings (REOs) of gene pairs, to discriminate both GC tissues and adjacent-normal tissues from non-GC gastritis, intestinal metaplasia and normal gastric tissues. RESULTS: A signature consisting of two gene pairs for GC diagnosis was identified and validated in data of both biopsy specimens and surgical resection specimens pooled from publicly available datasets measured by different laboratories with different platforms. For gastroscopy biopsy specimens, 96.20% of 79 non-GC tissues were correctly identified as non-GC, and 96.84% of 158 GC tissues and six of seven adjacent-normal tissues were correctly identified as GC. For surgical resection specimens, 98.37% of 2560 GC tissues and 97.28% of 221 adjacent-normal tissues were correctly identified as GC. Especially, 97.67% of the 257 GC patients at stage I were exactly diagnosed as GC. We additionally measured 21 GC tissues from seven different GC patients, each with three specimens sampled from three tumor locations with different proportions of the tumor epithelial cell. All these GC tissues were correctly identified as GC, even when the proportion of the tumor epithelial cell was as low as 14%. CONCLUSIONS: The qualitative transcriptional signature can distinguish both GC and adjacent-normal tissues from normal, gastritis and intestinal metaplasia tissues of non-GC patients even using inaccurately sampled biopsy specimens, which can be applied robustly at the individual level to aid the early GC diagnosis.


Subject(s)
Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Transcriptome/genetics , Databases, Genetic , Epithelial Cells/metabolism , Epithelial Cells/pathology , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , ROC Curve , Reproducibility of Results , Stomach Neoplasms/diagnosis
7.
J Vis Exp ; (136)2018 06 19.
Article in English | MEDLINE | ID: mdl-29985320

ABSTRACT

The maintenance of the genome and its faithful replication is paramount for conserving genetic information. To assess high fidelity replication, we have developed a simple non-labeled and non-radio-isotopic method using a matrix-assisted laser desorption ionization with time-of-flight (MALDI-TOF) mass spectrometry (MS) analysis for a proofreading study. Here, a DNA polymerase [e.g., the Klenow fragment (KF) of Escherichia coli DNA polymerase I (pol I) in this study] in the presence of all four dideoxyribonucleotide triphosphates is used to process a mismatched primer-template duplex. The mismatched primer is then proofread/extended and subjected to MALDI-TOF MS. The products are distinguished by the mass change of the primer down to single nucleotide variations. Importantly, proofreading can also be determined for internal single mismatches, albeit at different efficiencies. Mismatches located 2-4 nucleotides (nt) from the 3' end were efficiently proofread by pol I, and a mismatch 5 nt from the primer terminus showed only a partial correction. No proofreading occurred for internal mismatches located 6-9 nt from the primer 3' end. This method can also be applied to DNA repair assays (e.g., assessing a base-lesion repair of substrates for the endo V repair pathway). Primers containing 3' penultimate deoxyinosine (dI) lesions could be corrected by pol I. Indeed, penultimate T-I, G-I, and A-I substrates had their last 2 dI-containing nucleotides excised by pol I before adding a correct ddN 5'-monophosphate (ddNMP) while penultimate C-I mismatches were tolerated by pol I, allowing the primer to be extended without repair, demonstrating the sensitivity and resolution of the MS assay to measure DNA repair.


Subject(s)
DNA Repair/genetics , DNA Replication/genetics , Nucleotides/metabolism , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Humans
8.
DNA Repair (Amst) ; 61: 63-75, 2018 01.
Article in English | MEDLINE | ID: mdl-29223016

ABSTRACT

Proofreading and DNA repair are important factors in maintaining the high fidelity of genetic information during DNA replication. Herein, we designed a simple non-labeled and non-radio-isotopic method to measure proofreading. An oligonucleotide primer is annealed to a template DNA forming a mismatched site and is proofread by Klenow fragment of Escherichia coli DNA polymerase I (pol I) in the presence of all four dideoxyribonucleotide triphosphates. The proofreading excision products and re-synthesis products of single nucleotide extension are subjected to MALDI-TOF mass spectrometry (MS). The proofreading at the mismatched site is identified by the mass change of the primer. We examined proofreading of Klenow fragment with DNAs containing various base mismatches. Single mismatches at the primer terminus can be proofread efficiently. Internal single mismatches can also be proofread at different efficiencies, with the best correction for mismatches located 2-4 nucleotides from the primer terminus. For mismatches located 5 nucleotides from the primer terminus there was partial correction and extension. No significant proofreading was observed for mismatches located 6-9 nucleotides from the primer terminus. We also subjected primers containing 3' penultimate deoxyinosine (dI) lesions, which mimic endonuclease V nicked repair intermediates, to pol I repair assay. The results showed that T-I was a better substrate than G-I and A-I, whereas C-I was refractory to repair. The high resolution of MS results clearly demonstrated that all the penultimate T-I, G-I and A-I substrates had their last 2 dI-containing nucleotides excised by pol I before adding a correct ddNMP; however, pol I proofreading exonuclease tolerated the penultimate C-I mismatch, allowing the primer to be extended by polymerase activity.


Subject(s)
DNA Repair , DNA Replication , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , DNA Polymerase I/metabolism , Templates, Genetic
9.
BMC Genomics ; 16: 1041, 2015 Dec 09.
Article in English | MEDLINE | ID: mdl-26647162

ABSTRACT

BACKGROUND: Gene expression profiling using high-throughput screening (HTS) technologies allows clinical researchers to find prognosis gene signatures that could better discriminate between different phenotypes and serve as potential biological markers in disease diagnoses. In recent years, many feature selection methods have been devised for finding such discriminative genes, and more recently information theoretic filters have also been introduced for capturing feature-to-class relevance and feature-to-feature correlations in microarray-based classification. METHODS: In this paper, we present and fully formulate a new multivariate filter, iRDA, for the discovery of HTS gene-expression candidate genes. The filter constitutes a four-step framework and includes feature relevance, feature redundancy, and feature interdependence in the context of feature-pairs. The method is based upon approximate Markov blankets, information theory, several heuristic search strategies with forward, backward and insertion phases, and the method is aiming at higher order gene interactions. RESULTS: To show the strengths of iRDA, three performance measures, two evaluation schemes, two stability index sets, and the gene set enrichment analysis (GSEA) are all employed in our experimental studies. Its effectiveness has been validated by using seven well-known cancer gene-expression benchmarks and four other disease experiments, including a comparison to three popular information theoretic filters. In terms of classification performance, candidate genes selected by iRDA perform better than the sets discovered by the other three filters. Two stability measures indicate that iRDA is the most robust with the least variance. GSEA shows that iRDA produces more statistically enriched gene sets on five out of the six benchmark datasets. 
CONCLUSIONS: Through the classification performance, the stability performance, and the enrichment analysis, iRDA is a promising filter to find predictive, stable, and enriched gene-expression candidate genes.
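
The feature-to-class relevance that information theoretic filters build on can be illustrated with mutual information on discretized expression. This is not iRDA, only a minimal sketch of one ingredient; the toy labels and discretized gene values are made up, and `mutual_information` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information I(X; Y) in bits, estimated from paired discrete observations."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), joint_count in count_xy.items():
        p_joint = joint_count / n
        p_independent = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


# Toy data: eight samples, one class label each.
labels = ["tumor"] * 4 + ["normal"] * 4

# A gene whose discretized level tracks the class exactly.
gene_informative = ["high"] * 4 + ["low"] * 4

# A gene whose level is balanced within each class, so it carries no class signal.
gene_uninformative = ["high", "low"] * 4

mi_informative = mutual_information(gene_informative, labels)
mi_uninformative = mutual_information(gene_uninformative, labels)
print(mi_informative, mi_uninformative)
```

A filter would rank genes by a score of this kind; handling redundancy and interdependence between feature pairs, as the abstract describes, requires additional terms that this sketch does not attempt.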


Subject(s)
Computational Biology/methods , Algorithms , Computational Biology/standards , Gene Expression , Genetic Association Studies/methods , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis
10.
Comput Biol Chem ; 57: 54-60, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25748535

ABSTRACT

Gene expression profiles based on high-throughput technologies contribute to molecular classifications of different cell lines and consequently to clinical diagnostic tests for cancer types and other diseases. Statistical techniques and dimension reduction methods have been devised for identifying minimal gene subset with maximal discriminative power. For sets of in silico candidate genes, assuming a unique gene signature or performing a parsimonious signature evaluation seems to be too restrictive in the context of in vitro signature validation. This is mainly due to the high complexity of largely correlated expression measurements and the existence of various oncogenic pathways. Consequently, it might be more advantageous to identify and evaluate multiple gene signatures with a similar good predictive power, which are referred to as near-optimal signatures, to be made available for biological validation. For this purpose we propose the bead-chain-plot approach originating from swarm intelligence techniques, and a small scale computational experiment is conducted in order to convey our vision. We simulate the acquisition of candidate genes by using a small pool of differentially expressed genes derived from microarray-based CNS tumour data. The application of the bead-chain-plot provides experimental evidence for improved classifications by using near-optimal signatures in validation procedures.


Subject(s)
Transcriptome , Algorithms , Central Nervous System , Humans , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Reproducibility of Results
11.
Microarrays (Basel) ; 3(1): 1-23, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-27605027

ABSTRACT

Genomic DNA-based probe selection by using high density oligonucleotide arrays has recently been applied to heterologous species (Xspecies). With the advent of this new approach, researchers are able to study the genome and transcriptome of a non-model or an underutilised crop species through current state-of-the-art microarray platforms. However, a software package with a graphical user interface (GUI) to analyse and parse the oligonucleotide probe pair level data is still lacking when an experiment is designed on the basis of this cross species approach. A novel computer program called Pigeons has been developed for customised array data analysis to allow the user to import and analyse Affymetrix GeneChip® probe level data through XSpecies. One can determine empirical boundaries for removing poor probes based on genomic hybridisation of the test species to the Xspecies array, followed by making a species-specific Chip Description File (CDF) for transcriptomics in the heterologous species, or Pigeons can be used to examine an experimental design to identify potential Single-Feature Polymorphisms (SFPs) at the DNA or RNA level. Pigeons is also focused around visualization and interactive analysis of the datasets. The software with its manual (the current release number version 1.2.1) is freely available at the website of the Nottingham Arabidopsis Stock Centre (NASC).
