ABSTRACT
Overlapping gene groups (OGGs) arise when exons of one gene are contained within the introns of another. Typically, the two overlapping genes are encoded on opposite DNA strands. OGGs are often associated with specific disease phenotypes. In this report, we identify genes with OGG architecture and genes encoding multiple long amino acid runs, and examine their relationship to disease. OGGs appear to be susceptible to genomic rearrangements, as commonly happens at the DiGeorge syndrome loci on human chromosome 22. We also examine the degree of conservation of OGGs between human and mouse. Our analyses suggest that (i) a high proportion of genes in OGG regions are disease-associated, (ii) genomic rearrangements are likely to occur within OGGs, possibly as a consequence of anomalous sequence features prevalent in these regions, and (iii) multiple amino acid runs are also frequently associated with pathologies.
Subject(s)
Amino Acids/genetics; Chromosomes, Human, Pair 22; DiGeorge Syndrome/genetics; Genes, Overlapping; Animals; Exons; Gene Rearrangement; Humans; Introns; Mice

ABSTRACT
Human chromosomes 21 and 22 (mainly the q-arms) were the first complete parts of the human genome released. Our analysis of genes, pseudogenes (Psig), and Alu repeats across these chromosomes includes the following findings: the proportion of gene structures containing untranslated exons exceeds 25%; the terminal exon tends to be the largest among exons, whereas the initial intron tends to be the largest among introns; single-exon gene length is approximately the mean gene exon number times the mean internal exon length; processed Psig lengths are on average approximately the same as single-exon gene length; and the G+C content and length of genes are uncorrelated. The counts and distribution of genes, Psig, and Alu sequences and G+C variation are evaluated with respect to clusters and overdispersions. Other assessments concern comparisons of intergenic lengths, properties of Psig sequences, and correlations between Alu and Psig sequences.