Results 1 - 20 of 26
1.
J Theor Biol ; 584: 111794, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38499267

ABSTRACT

Tree shape statistics based on peripheral structures have been used to study evolutionary mechanisms and inference methods. Partially motivated by a recent study by Pouryahya and Sankoff on modeling the accumulation of subgenomes in the evolution of polyploids, we present the distribution of subtree patterns with four or fewer leaves under the unrooted Proportional to Distinguishable Arrangements (PDA) model. We derive a recursive formula for computing the joint distributions, as well as a Strong Law of Large Numbers and a Central Limit Theorem for the joint distributions. This enables us to confirm several conjectures proposed by Pouryahya and Sankoff and to provide some theoretical insights into their observations. Using their empirical datasets, we demonstrate that a statistical test based on the joint distribution could be more sensitive than tests based on a single subtree pattern in detecting evolutionary forces such as whole genome duplication.


Subject(s)
Algorithms , Models, Genetic , Phylogeny
2.
Sci Rep ; 13(1): 5291, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37002274

ABSTRACT

Nature-inspired swarm-based algorithms are increasingly applied to tackle high-dimensional and complex optimization problems across disciplines. They are general-purpose optimization algorithms that are easy to implement and assumption-free. Common drawbacks of these algorithms are premature convergence and the possibility that the solution found is not a global optimum. We propose a general, simple and effective strategy, called heterogeneous Perturbation-Projection (HPP), to enhance an algorithm's exploration capability so that our sufficient convergence conditions are guaranteed to hold and the algorithm converges almost surely to a global optimum. In summary, HPP applies stochastic perturbation to half of the swarm agents and then projects all agents onto the set of feasible solutions. We illustrate this approach using three widely used nature-inspired swarm-based optimization algorithms: particle swarm optimization (PSO), bat algorithm (BAT) and Ant Colony Optimization for continuous domains (ACO). Extensive numerical experiments show that the three algorithms with the HPP strategy outperform the original versions 60-80% of the time, by significant margins.
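To make the perturbation-projection step concrete, here is a minimal sketch of one HPP-style step. It assumes the feasible set is an axis-aligned box and that the perturbation is isotropic Gaussian noise with scale `sigma`; the function name `hpp_step` and both of these choices are hypothetical illustrations, not the authors' implementation. Clipping each coordinate is the Euclidean projection onto a box.

```python
import random
from typing import List, Sequence

Vector = List[float]


def hpp_step(
    swarm: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    sigma: float,
    rng: random.Random,
) -> List[Vector]:
    """One hypothetical perturbation-projection step.

    Perturbs a randomly chosen half of the agents with Gaussian noise (an
    assumed perturbation), then projects every agent onto the box
    [lower, upper], which stands in for the feasible set.
    """
    n = len(swarm)
    perturbed = set(rng.sample(range(n), n // 2))  # heterogeneous: only half are perturbed
    projected: List[Vector] = []
    for i, agent in enumerate(swarm):
        x = list(agent)
        if i in perturbed:
            x = [xi + rng.gauss(0.0, sigma) for xi in x]
        # Componentwise clipping = Euclidean projection onto the box.
        projected.append([min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)])
    return projected


if __name__ == "__main__":
    rng = random.Random(1)
    swarm = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(6)]
    print(hpp_step(swarm, lower=[-5, -5], upper=[5, 5], sigma=1.0, rng=rng))
```

In a full optimizer, a step like this would be applied after each regular PSO, BAT or ACO position update.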

3.
Theor Popul Biol ; 149: 27-38, 2023 02.
Article in English | MEDLINE | ID: mdl-36566944

ABSTRACT

Distributional properties of tree shape statistics under random phylogenetic tree models play an important role in investigating the evolutionary forces underlying the observed phylogenies. In this paper, we study two subtree counting statistics, the number of cherries and that of pitchforks, for the Ford model, the alpha model introduced by Daniel Ford. It is a one-parameter family of random phylogenetic tree models that includes the proportional to distinguishable arrangements (PDA) and the Yule models, two tree models commonly used in phylogenetics. Based on a non-uniform version of the extended Pólya urn models in which negative entries are permitted for their replacement matrices, we obtain the strong law of large numbers and the central limit theorem for the joint distribution of these two statistics for the Ford model. Furthermore, we derive a recursive formula for computing the exact joint distribution of these two statistics. This leads to exact formulas for their means and higher order asymptotic expansions of their second moments, which allows us to identify a critical parameter value for the correlation between these two statistics. That is, when the number of tree leaves is sufficiently large, they are negatively correlated for 0≤α≤1/2 and positively correlated for 1/2<α<1.


Subject(s)
Biological Evolution , Models, Genetic , Phylogeny
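As a concrete illustration of the model and the two statistics, here is a minimal Monte Carlo sketch. It assumes the usual growth rule for Ford's alpha model: each new leaf subdivides a pendant edge with weight 1 - α or an internal edge (including the edge above the root) with weight α, so that α = 0 and α = 1/2 recover the Yule and PDA models. Cherries and pitchforks are counted as clades with exactly two and three leaves. This is a simulation, not the exact recursion or Pólya-urn analysis of the paper, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Tree = Tuple[int, Dict[int, List[int]]]  # (root id, children lists keyed by node id)


def ford_tree(n: int, alpha: float, rng: random.Random) -> Tree:
    """Grow a rooted binary tree with n leaves under Ford's alpha model (0 <= alpha < 1).

    Each new leaf subdivides the edge above an existing node, chosen with weight
    1 - alpha if that node is a leaf and alpha if it is internal (the edge above
    the root included). alpha = 0 gives the Yule (YHK) model, alpha = 1/2 the PDA model.
    """
    children: Dict[int, List[int]] = {0: []}
    parent: Dict[int, int] = {}
    root = 0
    for _ in range(n - 1):
        nodes = list(children)
        weights = [alpha if children[v] else 1.0 - alpha for v in nodes]
        v = rng.choices(nodes, weights=weights)[0]
        u, leaf = len(children), len(children) + 1  # new internal node and new leaf
        children[u] = [v, leaf]
        children[leaf] = []
        if v == root:
            root = u
        else:
            p = parent[v]
            children[p][children[p].index(v)] = u
            parent[u] = p
        parent[v] = parent[leaf] = u
    return root, children


def count_cherries_pitchforks(root: int, children: Dict[int, List[int]]) -> Tuple[int, int]:
    """Count clades with exactly two leaves (cherries) and three leaves (pitchforks)."""
    counts = [0, 0]

    def clade_size(v: int) -> int:
        if not children[v]:
            return 1
        size = sum(clade_size(c) for c in children[v])
        if size in (2, 3):
            counts[size - 2] += 1
        return size

    clade_size(root)
    return counts[0], counts[1]


if __name__ == "__main__":
    rng = random.Random(7)
    n, reps = 100, 100
    for alpha in (0.0, 0.5, 0.9):
        total = sum(count_cherries_pitchforks(*ford_tree(n, alpha, rng))[0] for _ in range(reps))
        print(f"alpha={alpha}: mean cherries per leaf = {total / (reps * n):.3f}")
```

With α = 0 and α = 1/2, the per-leaf averages should be close to 1/3 and 1/4, the known limits of the mean number of cherries per leaf under the YHK and PDA models.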
4.
J Math Biol ; 83(4): 40, 2021 09 23.
Article in English | MEDLINE | ID: mdl-34554333

ABSTRACT

Tree shape statistics provide valuable quantitative insights into evolutionary mechanisms underpinning phylogenetic trees, a commonly used graph representation of evolutionary relationships among taxonomic units ranging from viruses to species. We study two subtree counting statistics, the number of cherries and the number of pitchforks, for random phylogenetic trees generated by two widely used null tree models: the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. By developing limit theorems for a version of extended Pólya urn models in which negative entries are permitted for their replacement matrices, we deduce the strong laws of large numbers and the central limit theorems for the joint distributions of these two counting statistics for the PDA and the YHK models. Our results indicate that the limiting behaviour of these two statistics, when appropriately scaled using the number of leaves in the underlying trees, is independent of the initial tree used in the tree generating process.


Subject(s)
Biological Evolution , Plant Leaves , Models, Genetic , Phylogeny
5.
Metabolites ; 11(4)2021 Apr 08.
Article in English | MEDLINE | ID: mdl-33918080

ABSTRACT

We conducted untargeted metabolomics analysis of plasma samples from a cross-sectional case-control study with 30 healthy controls, 30 patients with diabetes mellitus and normal renal function (DM-N), and 30 early diabetic nephropathy (DKD) patients using liquid chromatography-mass spectrometry (LC-MS). We employed two different modes of MS acquisition on a high-resolution MS instrument for identification and semi-quantification, and analyzed the data using an advanced multivariate method for prioritizing differentially abundant metabolites. We obtained semi-quantification data for 1088 unique compounds (~55% lipids), excluding potentially exogenous compounds and medications. Supervised classification analysis over a confounding-free partial correlation network shows that prostaglandins, phospholipids, nucleotides, sugars, and glycans are elevated in the DM-N and DKD patients, whereas glutamine, phenylacetylglutamine, 3-indoxyl sulfate, acetylphenylalanine, xanthine, dimethyluric acid, and asymmetric dimethylarginine are increased in DKD compared to DM-N. The data recapitulate the well-established plasma metabolome changes associated with DM-N and suggest uremic solutes and oxidative stress markers as the compounds indicating early renal function decline in DM patients.

6.
BMC Med Genomics ; 13(Suppl 10): 150, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33087126

ABSTRACT

BACKGROUND: Understanding the mechanisms underlying the malignant progression of cancer cells is crucial for the early diagnosis and therapeutic treatment of cancer. The mutational heterogeneity of breast cancer suggests that about a dozen cancer genes are consistently mutated in patients, together with many other genes that are mutated only occasionally. METHODS: Using the whole-exome sequences and clinical information of 468 patients from the TCGA project data portal, we analyzed mutated protein domains and signaling pathway alterations in order to understand how infrequent mutations collectively contribute to tumor progression at different stages. RESULTS: Our findings suggest that while the spectrum of mutated domains was diverse, mutations were aggregated in the Pkinase, Pkinase Tyr, Y-Phosphatase and Src-homology 2 domains, highlighting the genetic heterogeneity in activating the protein tyrosine kinase signaling pathways in invasive ductal breast cancer. CONCLUSIONS: The study provides new clues to the functional role of infrequent mutations in protein domain regions at different stages of invasive ductal breast cancer, yielding biological insights into its metastasis.


Subject(s)
Carcinoma, Ductal, Breast/genetics , DNA Mutational Analysis , Mutation , Neoplasm Proteins/genetics , Biomarkers, Tumor/genetics , Carcinoma, Ductal, Breast/pathology , Disease Progression , Female , Humans , Neoplasm Staging , Exome Sequencing
7.
Theor Popul Biol ; 132: 92-104, 2020 04.
Article in English | MEDLINE | ID: mdl-32135170

ABSTRACT

Tree shape statistics are important for investigating evolutionary mechanisms mediating phylogenetic trees. As a step towards bridging shape statistics between rooted and unrooted trees, we present a comparison study on two subtree statistics known as numbers of cherries and pitchforks for the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. Based on recursive formulas on the joint distribution of the number of cherries and that of pitchforks, it is shown that cherry distributions are log-concave for both rooted and unrooted trees under these two models. Furthermore, the mean number of cherries and that of pitchforks for unrooted trees converge respectively to those for rooted trees under the YHK model while there exists a limiting gap of 1∕4 for the PDA model. Finally, the total variation distances between the cherry distributions of rooted and those of unrooted trees converge for both models. Our results indicate that caution is required for conducting statistical analysis for tree shapes involving both rooted and unrooted trees.


Subject(s)
Biological Evolution , Models, Genetic , Algorithms , Phylogeny
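As a worked check of the limiting gap of 1/4 for cherries under the PDA model, one can use the standard closed-form means of the cherry count C_n for trees with n ≥ 4 leaves (known results, not stated in the abstract above); subtracting them gives the gap directly:

```latex
\mathbb{E}\bigl[C^{\mathrm{rooted}}_n\bigr] = \frac{n(n-1)}{2(2n-3)}, \qquad
\mathbb{E}\bigl[C^{\mathrm{unrooted}}_n\bigr] = \frac{n(n-1)}{2(2n-5)},

\mathbb{E}\bigl[C^{\mathrm{unrooted}}_n\bigr] - \mathbb{E}\bigl[C^{\mathrm{rooted}}_n\bigr]
  = \frac{n(n-1)}{2}\left(\frac{1}{2n-5} - \frac{1}{2n-3}\right)
  = \frac{n(n-1)}{(2n-5)(2n-3)}
  \longrightarrow \frac{1}{4} \quad (n \to \infty).
```

For example, with n = 4 the rooted mean is 1.2, the unrooted mean is 2, and the gap 12/15 = 0.8 matches the last expression.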
8.
NPJ Syst Biol Appl ; 5: 22, 2019.
Article in English | MEDLINE | ID: mdl-31312515

ABSTRACT

Computational tools for multiomics data integration have usually been designed for unsupervised detection of multiomics features explaining large phenotypic variations. To achieve this, some approaches extract latent signals in heterogeneous data sets from a joint statistical error model, while others use biological networks to propagate differential expression signals and find consensus signatures. However, few approaches directly consider molecular interaction, the essential linker between different omics data sets, as a data feature. The increasing availability of genome-scale interactome data connecting different molecular levels motivates a new class of methods to extract interactive signals from multiomics data. Here we developed iOmicsPASS, a tool to search for predictive subnetworks consisting of molecular interactions within and between related omics data types in a supervised analysis setting. Based on user-provided network data and relevant omics data sets, iOmicsPASS computes a score for each molecular interaction, and applies a modified nearest shrunken centroid algorithm to the scores to select densely connected subnetworks that can accurately predict each phenotypic group. iOmicsPASS detects a sparse set of predictive molecular interactions without loss of prediction accuracy compared to alternative methods, and the selected network signature immediately provides a mechanistic interpretation of the multiomics profile representing each sample group. Extensive simulation studies demonstrate a clear benefit of interaction-level modeling. iOmicsPASS analysis of TCGA/CPTAC breast cancer data also highlights a new transcriptional regulatory network underlying the basal-like subtype as positive protein markers, a result not seen through analysis of individual omics data.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Algorithms , Breast Neoplasms/genetics , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genomics/methods , Humans , Models, Statistical , Models, Theoretical , Proteomics/methods , Software
9.
Theor Popul Biol ; 108: 13-23, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26607430

ABSTRACT

In population and evolutionary biology, hypotheses about micro-evolutionary and macro-evolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model. Based on two novel recursive formulae, we propose a dynamic approach to numerically compute the exact joint distribution (and hence the marginal distributions) for trees of any size. We also obtain insights into the statistical properties of trees generated under these two models, including a constant correlation between the cherry and the pitchfork distributions under the YHK model, and the log-concavity and unimodality of the cherry distributions under both models. In addition, we show that there exists a unique change point for the cherry distributions between these two models.


Subject(s)
Biological Evolution , Models, Biological , Phylogeny , Humans
10.
J Bioinform Comput Biol ; 13(4): 1550018, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26166210

ABSTRACT

Transcript-level quantification is often measured across two groups of patients to aid the discovery of biomarkers and detection of biological mechanisms involving these biomarkers. Statistical tests lack power and false discovery rate is high when sample size is small. Yet, many experiments have very few samples (≤ 5). This creates the impetus for a method to discover biomarkers and mechanisms under very small sample sizes. We present a powerful method, ESSNet, that is able to identify subnetworks consistently across independent datasets of the same disease phenotypes even under very small sample sizes. The key idea of ESSNet is to fragment large pathways into smaller subnetworks and compute a statistic that discriminates the subnetworks in two phenotypes. We do not greedily select genes to be included based on differential expression but rely on gene-expression-level ranking within a phenotype, which is shown to be stable even under extremely small sample sizes. We test our subnetworks on null distributions obtained by array rotation; this preserves the gene-gene correlation structure and is suitable for datasets with small sample size allowing us to consistently predict relevant subnetworks even when sample size is small. For most other methods, this consistency drops to less than 10% when we test them on datasets with only two samples from each phenotype, whereas ESSNet is able to achieve an average consistency of 58% (72% when we consider genes within the subnetworks) and continues to be superior when sample size is large. We further show that the subnetworks identified by ESSNet are highly correlated to many references in the biological literature. ESSNet and supplementary material are available at: http://compbio.ddns.comp.nus.edu.sg:8080/essnet .


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Humans , Phenotype , Reproducibility of Results , Sample Size
11.
Int J Nephrol ; 2015: 156484, 2015.
Article in English | MEDLINE | ID: mdl-25649135

ABSTRACT

Background. The use of spot urine protein to creatinine ratios in estimating 24 hr urine protein excretion rates for diagnosing and managing chronic kidney disease (CKD) predated the standardization of creatinine assays. The comparative predictive performance of spot urine ratios and 24 hr urine collections (of albumin or protein) for the clinical outcomes of CKD progression, end-stage renal disease (ESRD), and mortality in Asians is unclear. We compared 4 methods of assessing urine protein excretion in a multiethnic population of CKD patients. Methods. Patients with CKD (n = 232) provided 24 hr urine collections followed by spot urine samples the next morning. We created multiple linear regression models to assess the factors associated with GFR decline (median follow-up: 37 months, IQR 26-41) and constructed Cox proportional-hazards models for predicting the combined outcome of ESRD and death. Results. The linear regression models showed that 24 hr urine protein excretion was most predictive of GFR decline, but all other methods were similar. For the combined outcomes of ESRD and death, the proportional hazards models had similar predictive performance. Conclusions. We showed that all methods of assessment were comparable for clinical end-points, and any method can be used in clinical practice or research.

12.
Nucleic Acids Res ; 42(20): 12380-7, 2014 Nov 10.
Article in English | MEDLINE | ID: mdl-25300490

ABSTRACT

Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique. This provides a valuable resource for uncovering regulation principles in different human cells. In this paper, the architectures of the 41 regulatory networks and the distributions of housekeeping and specific regulatory interactions are investigated. The TF regulatory networks of different human cell types demonstrate similar global three-layer (top, core and bottom) hierarchical architectures, which differ greatly from that of the yeast TF regulatory network. However, they have distinguishable local organizations, as suggested by the fact that the wiring patterns of only a few TFs are enough to distinguish cell identities. The TF regulatory network of human embryonic stem cells (hESCs) is dense and enriched with interactions that are unseen in the networks of other cell types. The examination of specific regulatory interactions suggests that specific interactions play important roles in hESCs.


Subject(s)
Gene Regulatory Networks , Transcription Factors/metabolism , Algorithms , Embryonic Stem Cells/metabolism , Humans
13.
PLoS One ; 8(11): e78448, 2013.
Article in English | MEDLINE | ID: mdl-24260118

ABSTRACT

Complex networks abound in the physical, biological and social sciences. Quantifying a network's topological structure facilitates network exploration and analysis, as well as network comparison, clustering and classification. A number of Wiener-type indices have recently been incorporated as distance-based descriptors of complex networks in tools such as the R package QuACN. Wiener-type indices are known to depend on both the number of nodes and the topology of a network. To apply these indices to measure the similarity of networks with different numbers of nodes, the indices need to be normalized to correct for the effect of the number of nodes. This paper aims to fill this gap. Moreover, we introduce an f-Wiener index of a network G, denoted by Wf(G). This notion generalizes the Wiener index to a very wide class of Wiener-type indices, including all known Wiener-type indices. We identify the maximum and minimum of Wf(G) over a set of networks with n nodes. We then introduce a normalized version of the f-Wiener index. In a number of experiments, the normalized f-Wiener indices significantly improved hierarchical clustering compared with their non-normalized counterparts.


Subject(s)
Models, Theoretical
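A minimal sketch of a distance-based index of this kind: the f-Wiener sum below adds f(d(u, v)) over all unordered node pairs of a connected, unweighted graph, using breadth-first search for distances; the identity f gives the classic Wiener index. The normalization shown is an assumption for illustration: it min-max scales only the classic Wiener index using its well-known extremes over connected n-node graphs (complete graph for the minimum, path for the maximum), which need not match the bounds the paper derives for general f. Function names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def shortest_path_lengths(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def f_wiener(graph: Graph, f: Callable[[int], float] = lambda d: d) -> float:
    """Sum f(d(u, v)) over unordered node pairs of a connected graph.

    With the identity f (the default) this is the classic Wiener index.
    """
    nodes = list(graph)
    total = 0.0
    for i, u in enumerate(nodes):
        dist = shortest_path_lengths(graph, u)
        for v in nodes[i + 1:]:
            total += f(dist[v])
    return total


def normalized_wiener(graph: Graph) -> float:
    """Min-max normalize the classic Wiener index over connected n-node graphs (n >= 3).

    The complete graph attains the minimum n(n-1)/2 and the path attains the
    maximum (n-1)n(n+1)/6.
    """
    n = len(graph)
    w_min = n * (n - 1) / 2
    w_max = (n - 1) * n * (n + 1) / 6
    return (f_wiener(graph) - w_min) / (w_max - w_min)
```

For example, the 4-node path has Wiener index 10 and normalizes to 1.0, while the 4-node star has index 9 and normalizes to 0.75.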
14.
Nat Commun ; 4: 2241, 2013.
Article in English | MEDLINE | ID: mdl-23917172

ABSTRACT

Small over-represented motifs in biological networks often form essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Here we develop an accurate method to estimate the occurrences of a motif in the entire network from noisy and incomplete data, and apply it to eukaryotic interactomes and cell-specific transcription factor regulatory networks. The number of triangles in the human interactome is about 194 times that in the Saccharomyces cerevisiae interactome. A strong positive linear correlation exists between the numbers of occurrences of triad and quadriad motifs in human cell-specific transcription factor regulatory networks. Our findings show that the proposed method is general and powerful for counting motifs and can be applied to any network regardless of its topological structure.


Subject(s)
Protein Interaction Mapping , Protein Interaction Maps , Animals , Arabidopsis/metabolism , Caenorhabditis elegans/metabolism , Gene Regulatory Networks , Humans , Protein Binding , Saccharomyces cerevisiae , Transcription Factors/metabolism
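The abstract does not spell out how the estimator handles noisy and incomplete data. As a much simpler illustration of the idea of extrapolating a motif count from a partial network, the sketch below counts triangles exactly and scales the count up under an assumed sampling model in which every node of the full network is observed independently with probability p, and all edges among observed nodes are seen; a triangle then survives with probability p³. The sampling model and function names are hypothetical, this is not the authors' method, and it ignores false-positive edges.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def count_triangles(graph: Graph) -> int:
    """Count triangles in an undirected simple graph given as adjacency sets."""
    total = 0
    for u in graph:
        for v, w in combinations(graph[u], 2):
            if w in graph[v]:
                total += 1
    # Each triangle {u, v, w} is found once from each of its three vertices.
    return total // 3


def estimate_triangles(observed: Graph, p: float) -> float:
    """Unbiased triangle-count estimate under independent node sampling with probability p."""
    return count_triangles(observed) / p**3
```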
15.
Article in English | MEDLINE | ID: mdl-24407300

ABSTRACT

Evolutionary history of protein-protein interaction (PPI) networks provides valuable insight into molecular mechanisms of network growth. In this paper, we study how to infer the evolutionary history of a PPI network from its protein duplication relationship. We show that for a plausible evolutionary history of a PPI network, its relative quality, measured by the so-called loss number, is independent of the growth parameters of the network and can be computed efficiently. This finding leads us to propose two fast maximum likelihood algorithms to infer the evolutionary history of a PPI network given the duplication history of its proteins. Simulation studies demonstrated that our approach, which takes advantage of protein duplication information, outperforms NetArch, the first maximum likelihood algorithm for PPI network history reconstruction. Using the proposed method, we studied the topological change of the PPI networks of the yeast, fruitfly, and worm.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping , Proteins/chemistry , Algorithms , Animals , Caenorhabditis elegans , Cluster Analysis , Computer Simulation , Drosophila melanogaster , Evolution, Molecular , Likelihood Functions , Mutation , Protein Interaction Domains and Motifs , Protein Multimerization , Saccharomyces cerevisiae , Software
16.
Biol Direct ; 6: 27, 2011 May 20.
Article in English | MEDLINE | ID: mdl-21595983

ABSTRACT

BACKGROUND: False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and on the validity of the underlying assumptions about their distribution. However, in many practical testing problems, such as in genomics, the p-values could be under-estimated or over-estimated for many known or unknown reasons. Consequently, FDR estimation would be affected and become unreliable. RESULTS: We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) to recalibrate the empirical p-values by modeling their distribution to improve the FDR estimates. Our ConReg-R method is based on the observation that accurately estimated p-values from true null hypotheses follow a uniform distribution, and that the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and FDR are estimated after the recalibration. CONCLUSIONS: ConReg-R provides an efficient way to improve the FDR estimates. It only requires the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments.


Subject(s)
Data Interpretation, Statistical , Regression Analysis , Algorithms , Gene Expression Profiling , Gene Expression Regulation, Fungal , Humans , Models, Statistical , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/physiology , Sample Size , Sequence Analysis, RNA
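ConReg-R's constrained regression is not specified in the abstract. The property it builds on, that null p-values are uniform so the observed p-values form a mixture, can be illustrated with a different, well-known technique: Storey's λ-based estimate of π0 and the resulting FDR estimate for a fixed rejection threshold t. The sketch below implements that standard estimator, not ConReg-R; the default λ = 0.5 (`lam`) is an assumed choice.

```python
from typing import Sequence


def estimate_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-style estimate of the proportion of true null hypotheses.

    Null p-values are uniform, so roughly pi0 * m * (1 - lam) of them exceed lam,
    while p-values from true alternatives concentrate near zero.
    """
    m = len(pvalues)
    above = sum(p > lam for p in pvalues)
    return min(1.0, above / (m * (1.0 - lam)))


def estimate_fdr(pvalues: Sequence[float], t: float, lam: float = 0.5) -> float:
    """Estimated FDR when rejecting every hypothesis with p <= t."""
    m = len(pvalues)
    rejected = max(1, sum(p <= t for p in pvalues))
    return min(1.0, estimate_pi0(pvalues, lam) * m * t / rejected)
```

If the p-values are systematically mis-estimated, the uniformity assumption behind `estimate_pi0` fails, which is the situation the recalibration described above is meant to correct.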
17.
J Bioinform Comput Biol ; 8(1): 99-115, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20183876

ABSTRACT

BACKGROUND: Current miRNA target prediction tools share a common problem: their false positive rate is high. This renders identification of co-regulating groups of miRNAs and target genes unreliable. In this study, we describe a procedure to identify highly probable co-regulating miRNAs and the corresponding co-regulated gene groups. Our procedure involves a sequence of statistical tests: (1) identify genes that are highly probable miRNA targets; (2) determine, for each such gene, the minimum number of miRNAs that co-regulate it with high probability; (3) find, for each such gene, the combination of miRNAs of that minimum size that co-regulates it with the lowest p-value; and (4) discover, for each such combination of miRNAs, the group of genes co-regulated by these miRNAs with the lowest p-value, computed from GO term annotations of the genes. RESULTS: Our method identifies 4-, 3- and 2-term miRNA groups that co-regulate gene groups of size at least 3 in human. Our results suggest some interesting hypotheses on the functional roles of several miRNAs through "guilt by association" reasoning. For example, miR-130, miR-19 and miR-101 are known neurodegenerative disease-associated miRNAs. Our 3-term miRNA table shows that miR-130/19/101 form a co-regulating group of rank 22 (p-value = 1.16 × 10^-2). Since miR-144 co-regulates with miR-130, miR-19 and miR-101 at rank 4 (p-value = 1.16 × 10^-2) in our 4-term miRNA table, hsa-miR-144 may be a neurodegenerative disease-related miRNA. CONCLUSIONS: This work identifies highly probable co-regulating miRNAs, refined from the predictions of computational tools using (1) the signal-to-noise ratio to obtain highly accurate regulating miRNAs for every gene, and (2) Gene Ontology to obtain functionally related co-regulating miRNA groups. Our results have been partly supported by biological experiments. Based on predictions by TargetScanS, the highly probable target gene groups we found are listed in the Supplementary Information. These results might help biologists find a small set of miRNAs for genes of interest rather than a huge set of miRNAs. SUPPLEMENTARY INFORMATION: http://www.deakin.edu.au/~phoebe/JBCBAnChen/JBCB.htm.


Subject(s)
Databases, Nucleic Acid/statistics & numerical data , MicroRNAs/genetics , Algorithms , Animals , Computational Biology , Gene Expression Regulation , Humans , Multigene Family , Software Design
18.
J Comput Biol ; 15(5): 469-87, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18549302

ABSTRACT

A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg.


Subject(s)
Algorithms , Pattern Recognition, Automated , Repetitive Sequences, Nucleic Acid , Genome, Human , Humans , Probability , Sequence Analysis, DNA/methods
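SAGRI's FindHit scan is more involved than the abstract describes. As a hedged illustration of the two ingredients it names, the sketch below computes a sequence's k-mer spectrum with start positions, flags k-mers that occur more than once as crude repeat seeds, and lists each k-mer as an edge of the (k-1)-mer graph, from its (k-1)-prefix to its (k-1)-suffix. This is not SAGRI's algorithm, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_positions(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer of seq (its spectrum, with multiplicities) to its start positions."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return dict(positions)


def repeated_kmers(seq: str, k: int) -> Dict[str, List[int]]:
    """k-mers occurring more than once: a crude seed set for candidate repeats."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items() if len(pos) > 1}


def kmer_graph_edges(seq: str, k: int) -> List[Tuple[str, str]]:
    """Each k-mer, in genome order, as an edge (k-1)-prefix -> (k-1)-suffix.

    Following these edges in order traces the genome's path through the
    (k-1)-mer graph; a repeat of length >= k makes the path reuse edges.
    """
    return [(seq[i:i + k - 1], seq[i + 1:i + k]) for i in range(len(seq) - k + 1)]
```

For example, in `"ACGTACGT"` with k = 4, only `ACGT` repeats, at positions 0 and 4.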
19.
BMC Bioinformatics ; 8: 163, 2007 May 21.
Article in English | MEDLINE | ID: mdl-17517140

ABSTRACT

BACKGROUND: Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for one kind of genome might not work well for another. In this paper, we propose the AT excursion method, which is a score-based approach, to quantify local AT abundance in genomic sequences and use the identified high scoring segments for predicting replication origins. This method has the advantages of requiring no preset window size and having rigorous criteria to evaluate the statistical significance of high scoring segments. RESULTS: We have evaluated the AT excursion method by checking its predictions against known replication origins in herpesviruses and comparing its performance with an existing base weighted score method (BWS1). Out of 43 known origins, 39 are predicted by either one or the other method and 26 origins are predicted by both. The excursion method identifies six origins not predicted by BWS1, showing that the AT excursion method is a valuable complement to BWS1. We have also applied the AT excursion method to two other families of double-stranded DNA viruses, the poxviruses and iridoviruses, for which very few replication origins are documented in the public domain. The prediction results are made available as supplementary materials at 1. Preliminary investigation shows that the proposed method works well on some larger genomes too. CONCLUSION: The AT excursion method will be a useful computational tool for identifying replication origins in a variety of genomic sequences.


Subject(s)
AT Rich Sequence/genetics , Algorithms , Chromosome Mapping/methods , Genome, Viral/genetics , Herpesviridae/genetics , Replication Origin/genetics , Sequence Analysis, DNA/methods , Base Sequence , Molecular Sequence Data
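The abstract does not give the exact scoring scheme. A minimal sketch of a score-based AT excursion, assuming A/T score +1 and every other symbol scores -2 so that the expected score is negative and AT-rich stretches stand out, and using an arbitrary cutoff `min_score` in place of the method's statistical significance criterion. The excursion W_i = max(0, W_{i-1} + s_i) needs no window size: each excursion away from 0 yields one candidate segment, from where it leaves 0 to where it peaks. The function name and all three parameter values are hypothetical.

```python
from typing import List, Tuple


def at_excursion_segments(
    seq: str, at_score: int = 1, gc_score: int = -2, min_score: int = 10
) -> List[Tuple[int, int, int]]:
    """Return (start, end, score) of high-scoring AT-rich segments (end exclusive).

    A/T score at_score; every other symbol (G, C, N, ...) scores gc_score.
    """
    segments: List[Tuple[int, int, int]] = []
    w, start, peak, peak_end = 0, 0, 0, -1
    for i, base in enumerate(seq.upper()):
        if w == 0:
            start = i  # a new excursion can only begin at this position
        w = max(0, w + (at_score if base in "AT" else gc_score))
        if w > peak:
            peak, peak_end = w, i
        if w == 0:  # excursion over: keep it if its peak is high enough
            if peak >= min_score:
                segments.append((start, peak_end + 1, peak))
            peak, peak_end = 0, -1
    if peak >= min_score:  # excursion still open at the end of the sequence
        segments.append((start, peak_end + 1, peak))
    return segments
```

For example, a run of 15 A's flanked by G's is reported as a single segment covering exactly that run, while short AT stretches stay below the cutoff.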
20.
Genomics ; 89(3): 378-84, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17208408

ABSTRACT

We identified a set of transcriptional elements that are conserved and overrepresented within the promoters of human, mouse, and rat GRIAs by comparing these promoters against a collection of 10,741 gene promoters. Cells regulate functional groups of genes by coordinating the transcriptional and/or posttranscriptional mRNA levels of interacting genes. As such, it is expected that functional groups of genes share the same transcriptional features within their promoters. We found 47 genes whose promoters contain the same combination of transcriptional elements that are overrepresented within the promoters of the GRIA gene family. Coexpressed genes may be transcriptionally coregulated, which in turn suggests that these genes may play complementary roles within a particular functional context. Using microarray expression data, we found 24 (of the 47) genes that share not only a similar promoter profile with GRIAs but also a well-correlated gene expression profile and, thus, we believe these to be coregulated with GRIAs.


Subject(s)
Gene Expression Regulation , Promoter Regions, Genetic , Receptors, AMPA/genetics , Animals , Binding Sites , Brain/metabolism , Computational Biology , Humans , Mice , Oligonucleotide Array Sequence Analysis , Rats , Regulatory Elements, Transcriptional , Transcription, Genetic