Results 1 - 14 of 14
1.
BMC Cancer ; 18(1): 603, 2018 May 29.
Article in English | MEDLINE | ID: mdl-29843660

ABSTRACT

BACKGROUND: Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death in the world, with a five-year survival rate of less than 5%. Not all PDACs are the same: there is heterogeneity among PDAC tumors, which poses a great challenge to personalized treatment of PDAC. METHODS: To dissect the molecular heterogeneity of PDAC, we performed a retrospective meta-analysis of whole-transcriptome data from more than 1200 PDAC patients. Subtypes were identified using a non-negative matrix factorization (NMF) biclustering method. We used gene set enrichment analysis (GSEA) and survival analysis to characterize the identified subtypes molecularly and clinically, respectively. RESULTS: Six molecularly and clinically distinct subtypes of PDAC, L1-L6, were identified and grouped into tumor-specific (L1, L2 and L6) and stroma-specific (L3, L4 and L5) subtypes. Among the tumor-specific subtypes, L1 (~ 22%) is enriched for carbohydrate metabolism-related gene sets and has intermediate survival. L2 (~ 22%) has the worst clinical outcomes and is enriched for cell proliferation-related gene sets. About 23% of patients can be classified into L6, which has intermediate survival and is enriched for lipid and protein metabolism-related gene sets. Stroma-specific subtypes may contain high non-epithelial content such as collagen, immune and islet cells, respectively. For instance, L3 (~ 12%) has poor survival and is enriched for collagen-associated gene sets. L4 (~ 14%) is enriched for various immune-related gene sets and has relatively good survival. L5 (~ 7%) has good clinical outcomes and is enriched for neurotransmitter and insulin secretion-related gene sets. We also identified 160 subtype-specific markers and built a deep learning-based classifier for PDAC. Applying our classification system to validation datasets, we observed very similar molecular and clinical characteristics between subtypes. CONCLUSIONS: Our study covers the largest cohort of PDAC gene expression profiles investigated so far, which greatly increased the statistical power and provided more robust results. We identified six molecularly and clinically distinct subtypes that describe a more complete picture of PDAC heterogeneity. The 160 subtype-specific markers and the deep learning-based classification system may be used to better stratify PDAC patients for personalized treatment.
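
The NMF step named in METHODS can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes plain Lee-Seung multiplicative updates on a toy genes-by-samples matrix, and the function names (`nmf`, `assign_subtypes`), the rank `k`, the iteration count, and the data are all hypothetical choices made for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W*H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) as W (genes x k) times H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def assign_subtypes(H):
    """Label each sample (column of H) with the factor that has the largest coefficient."""
    return [max(range(len(H)), key=lambda r: H[r][j]) for j in range(len(H[0]))]
```

On a toy matrix with two blocks of co-expressed genes, samples from the same block would be expected to receive the same label. Real subtype discovery would also need a principled choice of `k` and some form of consensus over random restarts, neither of which is shown here.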


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Pancreatic Ductal/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Pancreatic Neoplasms/genetics , Aged , Carcinoma, Pancreatic Ductal/mortality , Carcinoma, Pancreatic Ductal/pathology , Carcinoma, Pancreatic Ductal/therapy , Cluster Analysis , Data Analysis , Datasets as Topic , Deep Learning , Female , Humans , Male , Microarray Analysis , Middle Aged , Pancreatic Neoplasms/mortality , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/therapy , Precision Medicine/methods , Prognosis , Retrospective Studies , Survival Analysis , Transcriptome/genetics
2.
PLoS One ; 11(9): e0162293, 2016.
Article in English | MEDLINE | ID: mdl-27598575

ABSTRACT

Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Development of robust co-clustering techniques is important for the detection and analysis of these patterns. In this paper, a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is proposed. Specifically, in this method, higher-order singular value decomposition (HOSVD) transforms a tensor into a core part and a singular vector matrix along each mode, whose row vectors can be clustered by a linear grouping algorithm (LGA). Hyperplanar patterns are then extracted to support the identification of multi-dimensional co-clusters. To validate HDSVS, a number of synthetic and biological tensors were used. The synthetic tensors demonstrated favorable performance of the algorithm on noisy or overlapping data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS on practical problems. Moreover, the detected co-clusters are consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor was carried out, verifying the robust and stable performance of our method.


Subject(s)
Algorithms , Data Mining/statistics & numerical data , Disease/genetics , Genes, Fungal , Genes, Helminth , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/growth & development , Cell Cycle/genetics , Cluster Analysis , Datasets as Topic , Gene Expression , Gene Ontology , Humans , Molecular Sequence Annotation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development
3.
Microbiol Res ; 170: 69-77, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25267486

ABSTRACT

Tachyplesin I is a 17 amino acid, cationic, antimicrobial peptide with a typical cyclic antiparallel β-sheet structure. Interactions of tachyplesin I with living bacteria are not well understood, although models have been used to elucidate how tachyplesin I permeabilizes membranes. There are several questions to be answered, such as (i) how does tachyplesin I kill bacteria after it penetrates the membrane and (ii) does bacterial death result from the inactivation of intracellular esterases as well as cell injury? In this study, the dynamic antibacterial processes of tachyplesin I and its interactions with Escherichia coli and Staphylococcus aureus were investigated using laser confocal scanning microscopy in combination with electron microscopy. The effects of tachyplesin I on E. coli cell membrane integrity, intracellular enzyme activity, and cell injury and death were investigated by flow cytometric analysis of cells following single- or double-staining with carboxyfluorescein diacetate or propidium iodide. The results of microscopy indicated that tachyplesin I kills bacteria by acting on the cell membrane and intracellular contents, with the cell membrane representing the primary target. Microscopy results also revealed that tachyplesin I uses different modes of action against E. coli and S. aureus. The results of flow cytometry showed that tachyplesin I caused E. coli cell death mainly by compromising cell membrane integrity and causing the inactivation of intracellular esterases. Flow cytometry also revealed dynamic changes in the different subpopulations of cells with increase in tachyplesin I concentrations. Bacteria exposed to 5 µg/mL of tachyplesin I did not die instantaneously; instead, they died gradually via a sublethal injury. However, upon exposure to 10-40 µg/mL of tachyplesin I, the bacteria died almost immediately. These results contribute to our understanding of the antibacterial mechanism employed by tachyplesin I.


Subject(s)
Anti-Infective Agents/pharmacology , Antimicrobial Cationic Peptides/pharmacology , Bacteria/drug effects , Bacteria/enzymology , Cell Wall/enzymology , DNA-Binding Proteins/pharmacology , Peptides, Cyclic/pharmacology , Amino Acid Sequence , Anti-Infective Agents/chemistry , Antimicrobial Cationic Peptides/chemistry , Bacteria/ultrastructure , DNA-Binding Proteins/chemistry , Enzyme Activation/drug effects , Escherichia coli/drug effects , Escherichia coli/metabolism , Esterases/metabolism , Flow Cytometry , Intracellular Space/metabolism , Microbial Sensitivity Tests , Microbial Viability/drug effects , Microscopy, Confocal , Peptides, Cyclic/chemistry
4.
Comput Math Methods Med ; 2014: 454310, 2014.
Article in English | MEDLINE | ID: mdl-24872839

ABSTRACT

Tachyplesin I (TP I) is an antimicrobial peptide isolated from the hemocytes of the horseshoe crab. With the development of DNA microarray technology, a genetic analysis of the toxic effect of TP I on embryos was first considered in our recent study. Based on our microarray data from zebrafish embryonic samples treated with different doses of TP I, we performed a series of statistical analyses to explore the toxic effect of TP I at the genomic level. In this paper, we first employed the hexaMplot to illustrate the continuous variation in gene expression of embryonic cells treated with different doses of TP I. The probabilistic model-based Hough transform was then used to classify the genes differentially coexpressed in zebrafish embryos in response to TP I. As a result, three line rays, supported by 174 corresponding genes, were detected in our analysis. Some biological processes associated with the featured genes, such as antigen processing, nuclear chromatin, and structural constituent of eye lens, were found to be significant, with small P values.


Subject(s)
Antimicrobial Cationic Peptides/toxicity , DNA-Binding Proteins/toxicity , Gene Expression Regulation, Developmental/drug effects , Peptides, Cyclic/toxicity , Zebrafish/embryology , Algorithms , Animals , Anti-Infective Agents/chemistry , Antigens/chemistry , Antimicrobial Cationic Peptides/chemistry , Chromatin/chemistry , Computational Biology/methods , Gene Expression Profiling , Genomics , Models, Statistical , Oligonucleotide Array Sequence Analysis , Probability
5.
Comput Math Methods Med ; 2013: 917502, 2013.
Article in English | MEDLINE | ID: mdl-24367394

ABSTRACT

Predicting disease progression is one of the most challenging problems in prostate cancer research. Adding gene expression data to prediction models that are based on clinical features has been proposed to improve accuracy. In the current study, we applied a logistic regression (LR) model combining clinical features and gene co-expression data to improve the accuracy of the prediction of prostate cancer progression. The top-scoring pair (TSP) method was used to select genes for the model. The proposed models not only preserved the basic properties of the TSP algorithm but also incorporated the clinical features into the prognostic models. Based on the statistical inference with the iterative cross validation, we demonstrated that prediction LR models that included genes selected by the TSP method provided better predictions of prostate cancer progression than those using clinical variables only and/or those that included genes selected by the one-gene-at-a-time approach. Thus, we conclude that TSP selection is a useful tool for feature (and/or gene) selection to use in prognostic models and our model also provides an alternative for predicting prostate cancer progression.
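
The top-scoring pair idea mentioned above can be sketched as follows. The sketch assumes the standard TSP score, the absolute difference between the two classes in the fraction of samples where gene i is expressed below gene j; the function names and the toy data are hypothetical, and neither the logistic regression step nor the cross-validation from the study is reproduced.

```python
from itertools import combinations

def tsp_score(expr, labels, i, j):
    """|P(gene_i < gene_j | class 0) - P(gene_i < gene_j | class 1)|.

    expr   : list of samples, each a list of expression values (one per gene)
    labels : list of 0/1 class labels, one per sample
    """
    def fraction(cls):
        rows = [s for s, y in zip(expr, labels) if y == cls]
        return sum(1 for s in rows if s[i] < s[j]) / len(rows)
    return abs(fraction(0) - fraction(1))

def top_scoring_pair(expr, labels):
    """Return the gene index pair (i, j) with the highest TSP score."""
    n_genes = len(expr[0])
    return max(combinations(range(n_genes), 2),
               key=lambda pair: tsp_score(expr, labels, *pair))
```

Because the score depends only on the relative ordering of two genes within each sample, it is insensitive to monotone per-sample normalization. The selected pair could then enter a prognostic model as a single binary feature alongside clinical variables.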


Subject(s)
Computational Biology , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Logistic Models , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/metabolism , Algorithms , Area Under Curve , DNA, Complementary/metabolism , Disease Progression , Genotype , Humans , Male , Oligonucleotide Array Sequence Analysis , Phenotype , Probability , Prognosis
6.
J Bioinform Comput Biol ; 10(2): 1241013, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22809348

ABSTRACT

Identifying genes associated with cancer development is typically accomplished by comparing mean expression values in normal and tumor tissues, which identifies differentially expressed (DE) genes. Interindividual variation (IV) in gene expression is indirectly included in DE gene identification because, given the same absolute differences in means, genes with lower variance tend to have lower p-values. We explored the direct use of IV in gene expression to identify candidate genes associated with cancer development. We focused on prostate (PCa) and lung (LC) cancers and compared IV in the expression level of genes shown to be cancer related with that of all other genes in the human genome. Compared with all those other genes, cancer-related genes tended to have greater IV in normal tissues and a greater increase in IV during the transition from normal to tumorous tissue. Genes without significantly different mean expression values between tumor and normal tissues but with greater IV in tumor than in normal tissue (note: the DE-based approach completely ignores those genes) had stronger associations with clinically important features, such as Gleason score in PCa or tumor histology in LC, than all other genes did. Our results suggest that analyzing IV in gene expression level is useful in identifying novel candidate genes associated with cancer development.
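
To make the contrast between a mean-based and a variance-based view concrete, here is a minimal sketch for a single gene. It only reports the difference in means and the tumor-to-normal ratio of sample variances; the function name and the example values are hypothetical, and no significance testing of the kind a real analysis would need is included.

```python
from statistics import mean, variance

def iv_summary(normal, tumor):
    """Summarize one gene: shift in mean expression and change in
    interindividual variation (ratio of sample variances)."""
    return {
        "mean_diff": mean(tumor) - mean(normal),
        "var_ratio": variance(tumor) / variance(normal),
    }
```

A gene with `mean_diff` near zero but a large `var_ratio` is the kind of candidate that a purely DE-based screen would not flag.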


Subject(s)
Gene Expression Regulation, Neoplastic , Genetic Variation , Neoplasms/genetics , Biomarkers, Tumor/genetics , Gene Expression Profiling , Humans , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Male , Neoplasms/metabolism , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism
7.
Article in English | MEDLINE | ID: mdl-21116039

ABSTRACT

The effects of a drug on the genomic scale can be assessed in a three-color cDNA microarray with the three color intensities represented through the so-called hexaMplot. In our recent study, we have shown that the Hough Transform (HT) applied to the hexaMplot can be used to detect groups of coexpressed genes in the normal-disease-drug samples. However, the standard HT is not well suited for the purpose because 1) the assayed genes need first to be hard-partitioned into equally and differentially expressed genes, with HT ignoring possible information in the former group; 2) the hexaMplot coordinates are negatively correlated and there is no direct way of expressing this in the standard HT; and 3) it is not clear how to quantify the association of coexpressed genes with the line along which they cluster. We address these deficiencies by formulating a dedicated probabilistic model-based HT. The approach is demonstrated by assessing effects of the drug Rg1 on homocysteine-treated human umbilical vein endothelial cells. Compared with our previous study, we robustly detect stronger natural groupings of coexpressed genes. Moreover, the gene groups show coherent biological functions with high significance, as detected by the Gene Ontology analysis.


Subject(s)
Cluster Analysis , Gene Expression Profiling , Image Processing, Computer-Assisted/methods , Models, Statistical , Oligonucleotide Array Sequence Analysis , Algorithms , Databases, Genetic , Endothelial Cells/drug effects , Endothelial Cells/metabolism , Gene Expression/drug effects , Ginsenosides/pharmacology , Homocysteine , Humans , Models, Biological , Regression Analysis
8.
BMC Cancer ; 10: 599, 2010 Nov 02.
Article in English | MEDLINE | ID: mdl-21044312

ABSTRACT

BACKGROUND: The genetic control of prostate cancer development is poorly understood. Large numbers of gene-expression datasets on different aspects of prostate tumorigenesis are available. We used these data to identify and prioritize candidate genes associated with the development of prostate cancer and bone metastases. Our working hypothesis was that combining meta-analyses on different but overlapping steps of prostate tumorigenesis will improve identification of genes associated with prostate cancer development. METHODS: A Z score-based meta-analysis of gene-expression data was used to identify candidate genes associated with prostate cancer development. To put together different datasets, we conducted a meta-analysis on 3 levels that follow the natural history of prostate cancer development. For experimental verification of candidates, we used in silico validation as well as in-house gene-expression data. RESULTS: Genes with experimental evidence of an association with prostate cancer development were overrepresented among our top candidates. The meta-analysis also identified a considerable number of novel candidate genes with no published evidence of a role in prostate cancer development. Functional annotation identified cytoskeleton, cell adhesion, extracellular matrix, and cell motility as the top functions associated with prostate cancer development. We identified 10 genes--CDC2, CCNA2, IGF1, EGR1, SRF, CTGF, CCL2, CAV1, SMAD4, and AURKA--that form hubs of the interaction network and therefore are likely to be primary drivers of prostate cancer development. CONCLUSIONS: By using this large 3-level meta-analysis of the gene-expression data to identify candidate genes associated with prostate cancer development, we have generated a list of candidate genes that may be a useful resource for researchers studying the molecular mechanisms underlying prostate cancer development.
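
A Z score-based combination across datasets can be sketched in a few lines. The abstract does not say which combination rule was used, so this sketch assumes Stouffer's method (optionally weighted); the function names and the example inputs are hypothetical.

```python
import math

def stouffer_z(z_scores, weights=None):
    """Combine per-dataset Z scores for one gene with Stouffer's method."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, four datasets that each give a gene Z = 2.0 combine to Z = 4.0, which shows how consistent moderate evidence across studies can add up. Weights, if used, would typically reflect sample size.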


Subject(s)
Gene Expression Regulation, Neoplastic , Prostatic Neoplasms/metabolism , Algorithms , Bone Neoplasms/secondary , Gene Expression Profiling , Humans , Male , Models, Statistical , Neoplasm Metastasis , Phenotype , Prostatic Neoplasms/pathology
9.
Phys Rev E Stat Nonlin Soft Matter Phys ; 80(4 Pt 1): 041917, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19905352

ABSTRACT

DNA rigidity is an important physical property originating from the DNA three-dimensional structure. Although the general DNA rigidity patterns in human promoters have been investigated, their distinct roles in transcription are largely unknown. In this paper, we discover four highly distinct human promoter groups based on similarity of their rigidity profiles. First, we find that all promoter groups conserve relatively rigid DNAs at the canonical TATA box [a consensus TATA(A/T)A(A/T) sequence] position, which are important physical signals in binding transcription factors. Second, we find that the genes activated by each group of promoters share significant biological functions based on their gene ontology annotations. Finally, we find that these human promoter groups correlate with the tissue-specific gene expression.


Subject(s)
DNA/genetics , Promoter Regions, Genetic , Base Sequence , Consensus Sequence , DNA/chemistry , DNA/metabolism , Humans , TATA Box , Transcriptional Activation
10.
BMC Bioinformatics ; 9 Suppl 1: S9, 2008.
Article in English | MEDLINE | ID: mdl-18315862

ABSTRACT

BACKGROUND: Identification of differentially expressed genes is a typical objective when analyzing gene expression data. Recently, Bayesian hierarchical models have become increasingly popular for solving this type of problem. These models show good performance in accommodating noise, variability and low replication of microarray data. However, the correlation between different fluorescent signals measured from a gene spot is ignored, which can adversely affect the data analysis step. In fact, the intensities of the two signals are significantly correlated across samples. The larger the log-transformed intensities are, the smaller the correlation is. RESULTS: Motivated by the complicated error relations in microarray data, we propose a multivariate hierarchical Bayesian framework for data analysis in replicated microarray experiments. Gene expression data are modelled by a multivariate normal distribution, parameterized by the corresponding mean vectors and covariance matrices with a conjugate prior distribution. Within the Bayesian framework, a generalized likelihood ratio test (GLRT) is also developed to infer the gene expression patterns. Simulation studies show that the proposed approach presents better operating characteristics and a lower false discovery rate (FDR) than existing methods, especially when the correlation coefficient is large. The approach is illustrated with two examples of microarray analysis. The proposed method successfully detects significant genes closely related to the experimental states, which are verified by the biological information. CONCLUSIONS: The multivariate Bayesian model, compatible with the dependence between mean and variance in the univariate Bayesian model, relaxes the constant coefficient of variation assumption between measurements by adding a covariance structure. This model significantly improves the identification of differentially expressed genes because the Bayesian model fits the microarray data well.


Subject(s)
Algorithms , Artificial Intelligence , Gene Expression Profiling/methods , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Multivariate Analysis , Reproducibility of Results , Sensitivity and Specificity
11.
J Theor Biol ; 251(2): 264-74, 2008 Mar 21.
Article in English | MEDLINE | ID: mdl-18199458

ABSTRACT

Biclustering is an important tool in microarray analysis when only a subset of genes is co-regulated under a subset of conditions. Unlike standard clustering analyses, biclustering performs simultaneous classification in both the gene and condition directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on the geometrical viewpoint of coherent gene expression profiles. In this method, we perform pattern identification based on the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters under increased noise levels and regulatory complexity. Furthermore, we also test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.
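
The geometric idea can be illustrated for a single pair of conditions: each gene becomes a point (expression in condition a, expression in condition b), and genes that follow a coherent pattern fall on a common line. The sketch below is a generic Hough line detector, not the published algorithm; the normal-form parametrization rho = x cos(theta) + y sin(theta), the one-degree angle grid, the `rho_step` bin width, and the function name are all assumptions.

```python
import math
from collections import Counter

def hough_peak(points, rho_step=0.5):
    """Return (theta_deg, rho, votes) of the most-voted line.

    Each point votes, for every angle on a one-degree grid, for the
    line rho = x*cos(theta) + y*sin(theta) passing through it.
    """
    votes = Counter()
    for x, y in points:
        for theta_deg in range(180):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(theta_deg, round(rho / rho_step))] += 1
    (theta_deg, rho_bin), count = votes.most_common(1)[0]
    return theta_deg, rho_bin * rho_step, count
```

In a toy example where four genes satisfy b = a + 5 and a fifth gene does not, the peak collects four votes at theta = 135 degrees, and the genes voting for that bin form the candidate bicluster for this column pair. The returned `rho` is a bin center, so its precision is limited by `rho_step`.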


Subject(s)
Algorithms , Models, Genetic , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Animals , Cluster Analysis , Computational Biology , Databases, Genetic , Gene Expression Profiling , Humans
12.
BMC Bioinformatics ; 8: 256, 2007 Jul 17.
Article in English | MEDLINE | ID: mdl-17634089

ABSTRACT

BACKGROUND: Three-color microarray experiments can be performed to assess drug effects on the genomic scale. The methodology may be useful in shortening the cycle, reducing the cost, and improving the efficiency in drug discovery and development compared with the commonly used dual-color technology. A visualization tool, the hexaMplot, is able to show the interrelations of gene expressions in normal-disease-drug samples in three-color microarray data. However, it is not enough to assess the complicated drug therapeutic effects based on the plot alone. It is important to explore more effective tools so that a deeper insight into gene expression patterns can be gained with three-color microarrays. RESULTS: Based on the celebrated Hough transform, a novel algorithm, HoughFeature, is proposed to extract line features in the hexaMplot corresponding to different drug effects. Drug therapy results can then be divided into a number of levels in relation to different groups of genes. We apply the framework to experimental microarray data to assess the complex effects of Rg1 (an extract of Chinese medicine) on Hcy-related HUVECs in detail. Differentially expressed genes are classified into 15 functional groups corresponding to different levels of drug effects. CONCLUSION: Our study shows that the HoughFeature algorithm can reveal natural cluster patterns in gene expression data of normal-disease-drug samples. It provides both qualitative and quantitative information about up- or down-regulated genes. The methodology can be employed to predict disease susceptibility in gene therapy and assess drug effects on the disease based on three-color microarray data.


Subject(s)
Computer Graphics , Gene Expression Profiling/methods , Gene Expression/drug effects , Microscopy, Fluorescence, Multiphoton/methods , Oligonucleotide Array Sequence Analysis/methods , Pharmaceutical Preparations/administration & dosage , User-Computer Interface , Algorithms , Software
13.
Ying Yong Sheng Tai Xue Bao ; 17(2): 197-200, 2006 Feb.
Article in Chinese | MEDLINE | ID: mdl-16706037

ABSTRACT

Based on pollen analysis, stratigraphy, and 14C dating of the Daqiao fen in Dunhua, Jilin Province, four pollen zones were distinguished: a Pinus-Picea-Abies assemblage (2195 +/- 70 to approximately 2045 +/- 70 B.P.), a Carex-Pinus-Betula-Corylus-Juglans assemblage (2045 +/- 70 to approximately 1745 +/- 70 B.P.), a Pinus-Corylus-Carpinus-Carex-Ranunculus assemblage (1745 +/- 70 to approximately 705 +/- 70 B.P.), and a Pinus-Picea-Abies-Betula-Carex assemblage (705 +/- 70 B.P. to approximately 1950 AD). The vegetation changed from coniferous forest (similar to the vegetation currently found over 1100 m a.s.l. in this area), through conifer-broad-leaved mixed forest (similar to the vegetation currently found between 400-600 m a.s.l.) and conifer-broad-leaved mixed forest (similar to the vegetation currently found between 600-800 m a.s.l.), to conifer-broad-leaved mixed forest (similar to the vegetation currently found between 800-1100 m a.s.l.). Accordingly, the Daqiao fen underwent periods of gestation, fast development, expansion, and dying out.


Subject(s)
Fossils , Pollen , Soil/analysis , Trees/classification , China , Ecosystem , Paleontology
14.
J Biopharm Stat ; 14(3): 629-46, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15468756

ABSTRACT

DNA microarray offers a powerful and effective technology to monitor changes in the expression levels of thousands of genes simultaneously. It is being widely applied to explore quantitative alterations in gene regulation in response to a variety of factors, including disease and exposure to toxicants. A common task in analyzing microarray data is to identify the genes differentially expressed under two different experimental conditions. Because of the large number of genes, the small number of arrays, and the higher signal-noise ratio in microarray data, many traditional approaches seem improper. In this paper, a multivariate mixture model is applied to model the expression level of replicated arrays, considering the differentially expressed genes as the outliers of the expression data. To detect the outliers of the multivariate mixture model, an effective and robust statistical method is applied to microarray analysis for the first time. This method identifies outliers by analyzing the kurtosis coefficient (KC) of projected multivariate data arising from a mixture model. We apply the multivariate KC algorithm to our microarray experiment with control and toxic treatment. After data processing, the differentially expressed genes are successfully identified from 1824 genes on the UCLA M07 microarray chip. We also use the RT-PCR method and two robust statistical methods, minimum covariance determinant (MCD) and minimum volume ellipsoid (MVE), to verify the expression level of outlier genes identified by the KC algorithm. We conclude that the robust multivariate tool is practical and effective for the detection of differentially expressed genes.
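
The quantity at the heart of the approach, the kurtosis coefficient of multivariate data projected onto one direction, can be sketched as follows. This shows only that one quantity for a direction supplied by the caller; the search over projection directions and the outlier decision rule of the KC algorithm are not reproduced, and the function names and example values are hypothetical.

```python
import math

def project(rows, direction):
    """Project each multivariate observation onto a unit-normalized direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [sum(x * d for x, d in zip(row, direction)) / norm for row in rows]

def kurtosis_coefficient(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / (m2 * m2)
```

Adding a single extreme observation along the chosen direction raises the coefficient noticeably, which is why directions with unusual kurtosis are informative about outliers.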


Subject(s)
Multivariate Analysis , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Animals , Cadmium Chloride/toxicity , Data Interpretation, Statistical , Male , Mice , Mice, Inbred ICR , Microcomputers , Models, Statistical , Mutagens/toxicity , Reverse Transcriptase Polymerase Chain Reaction