Results 1 - 19 of 19
1.
Breast Cancer Res Treat ; 191(1): 63-75, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34698969

ABSTRACT

PURPOSE: Invasion of carcinoma cells into surrounding tissue affects breast cancer staging, influences choice of treatment, and impacts on patient outcome. KIF21A is a member of the kinesin superfamily that has been well-studied in congenital extraocular muscle fibrosis. However, its biological relevance in breast cancer is unknown. This study investigated the functional roles of KIF21A in this malignancy and examined its expression pattern in breast cancer tissue. METHODS: The function of KIF21A in breast carcinoma was studied in vitro by silencing its expression in breast cancer cells and examining the changes in cellular activities. Immunohistochemical staining of breast cancer tissue microarrays was performed to determine the expression patterns of KIF21A. RESULTS: Knocking down the expression of KIF21A using siRNA in MDA-MB-231 and MCF7 human breast cancer cells resulted in significant decreases in tumor cell migration and invasiveness. This was associated with reduced Patched 1 expression and F-actin microfilaments. Additionally, the number of focal adhesion kinase- and paxillin-associated focal adhesions was increased. Immunohistochemical staining of breast cancer tissue microarrays showed that KIF21A was expressed in both the cytoplasmic and nuclear compartments of carcinoma cells. Predominance of cytoplasmic KIF21A was significantly associated with larger tumors and high-grade cancer, and was prognostic of cause-specific overall patient survival and breast cancer recurrence. CONCLUSION: The data demonstrate that KIF21A is an important regulator of breast cancer aggressiveness and may be useful in refining prognostication of this malignant disease.


Subject(s)
Breast Neoplasms , Kinesins , Breast Neoplasms/genetics , Cytoplasm , Female , Humans , Kinesins/genetics , Neoplasm Recurrence, Local/genetics , Prognosis
2.
Psychoneuroendocrinology ; 78: 185-192, 2017 04.
Article in English | MEDLINE | ID: mdl-28212520

ABSTRACT

Why some individuals seek social engagement while others shy away has profound implications for normal and pathological human behavior. Evidence suggests that oxytocin (OT), the paramount human social hormone, and CD38, which governs OT release, contribute to individual differences in social skills, from intense social involvement to the extreme avoidance that characterizes autism. To explore the neurochemical underpinnings of sociality, CD38 expression of peripheral blood leukocytes (PBL) was measured in Han Chinese undergraduates. First, CD38 mRNA levels were correlated with lower Autism Quotient (AQ), indicating enhanced social skills. AQ assesses the extent of autistic-like traits, including the propensity and dexterity needed for successful social engagement in the general population. Second, three CD157 eQTL SNPs in the CD38/CD157 gene region were associated with CD38 expression. CD157 is a paralogue of CD38 and is contiguous with it on chromosome 4p15. Third, association was also observed between the CD157 eQTL SNPs, CD38 expression and AQ. In the full model, CD38 expression and CD157 eQTL SNPs altogether account for a substantial 14% of the variance in sociality. Fourth, functionality of CD157 eQTL SNPs was suggested by a significant association with plasma oxytocin immunoreactivity products. Fifth, the ecological validity of these findings was demonstrated by subjects with higher PBL CD38 expression having more friends, especially among males. Furthermore, CD157 sequence variation predicts scores on the Friendship questionnaire. To summarize, by uniquely leveraging various measures, this study reveals salient elements contributing to nonkin sociality and friendship and points to a likely pathway underpinning the transition from normality to psychopathology.


Subject(s)
ADP-ribosyl Cyclase 1/genetics , ADP-ribosyl Cyclase/genetics , Antigens, CD/genetics , Friends , Membrane Glycoproteins/genetics , Polymorphism, Single Nucleotide , Social Skills , ADP-ribosyl Cyclase/metabolism , ADP-ribosyl Cyclase 1/metabolism , Antigens, CD/metabolism , Autistic Disorder/genetics , Female , GPI-Linked Proteins/genetics , GPI-Linked Proteins/metabolism , Genetic Association Studies , Humans , Leukocytes, Mononuclear/metabolism , Male , Membrane Glycoproteins/metabolism , Oxytocin/blood , Quantitative Trait Loci , Young Adult
3.
Genome Biol Evol ; 9(1): 134-149, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28175284

ABSTRACT

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of nonsynonymous to synonymous rates of substitution tends to be underestimated over three data sets of mammals, vertebrates, and insects. Our basis for comparison is a nonstationary codon substitution model that allows sequence composition to change. Goodness-of-fit results demonstrate that our new model tends to fit the data better. Direct measurement of nonstationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.


Subject(s)
Codon , Models, Genetic , Proteins/genetics , Selection, Genetic , Animals , Humans , Markov Chains
4.
Genetics ; 205(2): 843-856, 2017 02.
Article in English | MEDLINE | ID: mdl-27974498

ABSTRACT

Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within [Formula: see text] of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif.


Subject(s)
Nucleotide Motifs , Point Mutation , Sequence Analysis, DNA/methods , Software , Animals , CpG Islands , Data Interpretation, Statistical , Humans
5.
Syst Biol ; 64(2): 281-93, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25503772

ABSTRACT

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.
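As a rough illustration of the distance described in this abstract, the sketch below numerically integrates the expected substitution rate along an edge while the sequence composition is allowed to drift. The function name `expected_substitutions`, the Euler/trapezoid integration scheme, and the example rate matrices are assumptions made for illustration; they are not taken from the paper. When the starting composition equals the stationary distribution, the result reduces to the familiar stationary formula, minus t times the sum of pi_i * q_ii.

```python
import math


def expected_substitutions(pi0, Q, t, steps=10000):
    """Approximate the expected number of substitutions per site on [0, t].

    pi0 : starting state frequencies (sums to 1)
    Q   : rate matrix, rows sum to zero (time-homogeneous along the edge)
    The composition evolves as pi'(s) = pi(s) Q; the substitution flux at
    time s is -sum_i pi_i(s) * Q[i][i].
    """
    n = len(pi0)
    h = t / steps
    pi = list(pi0)
    total = 0.0
    for _ in range(steps):
        flux = -sum(pi[i] * Q[i][i] for i in range(n))
        # Explicit Euler step for the composition (illustrative, not optimal).
        new_pi = [pi[j] + h * sum(pi[i] * Q[i][j] for i in range(n))
                  for j in range(n)]
        new_flux = -sum(new_pi[i] * Q[i][i] for i in range(n))
        total += h * (flux + new_flux) / 2  # trapezoid rule on the flux
        pi = new_pi
    return total


# Hypothetical F81-style rate matrix with stationary frequencies pi_star.
pi_star = [0.1, 0.2, 0.3, 0.4]
Q = [[pi_star[j] if i != j else -(1.0 - pi_star[i]) for j in range(4)]
     for i in range(4)]

stationary_distance = expected_substitutions(pi_star, Q, 1.0)
nonstationary_distance = expected_substitutions([0.4, 0.3, 0.2, 0.1], Q, 1.0)
```

In this toy case the nonstationary start gives a larger distance than the stationary one over the same edge length, because the starting composition is enriched for the faster-leaving states.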


Subject(s)
Evolution, Molecular , Models, Genetic , Animals , Humans , Mammals/classification , Mammals/genetics , Markov Chains , Phylogeny
6.
Singapore Med J ; 55(9): 468-72, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25273930

ABSTRACT

INTRODUCTION: While overexpression of syndecan-1 has been associated with aggressive breast cancer in the Caucasian population, the expression pattern of syndecan-1 in Asian women remains unclear. Triple-positive breast carcinoma, in particular, is a unique subtype that has not been extensively studied. We aimed to evaluate the role of syndecan-1 as a potential biomarker and prognostic factor for triple-positive breast carcinoma in Asian women. METHODS: Using immunohistochemistry, staining scores of 61 triple-positive breast carcinoma specimens were correlated with patients' clinicopathological variables such as age, ethnicity, tumour size, histological grade, lymph node status, lymphovascular invasion, associated ductal carcinoma in situ grade, recurrence and overall survival. RESULTS: Syndecan-1 had intense staining scores in triple-positive invasive ductal breast carcinomas when compared to normal breast tissue. On multivariate analysis, syndecan-1 epithelial total percentage and immunoreactivity score showed statistical correlation with survival (p = 0.02). CONCLUSION: The intense staining scores of syndecan-1 and their correlation with overall survival in patients with triple-positive breast carcinoma suggest that syndecan-1 may have a role as a biological and prognostic marker in patients with this specific subtype of breast cancer.


Subject(s)
Biomarkers, Tumor/blood , Breast Neoplasms/blood , Breast Neoplasms/mortality , Syndecan-1/blood , Adult , Aged , Aged, 80 and over , Asian People , Breast Neoplasms/classification , Estrogen Receptor alpha/metabolism , Female , Humans , Immunohistochemistry , Kaplan-Meier Estimate , Middle Aged , Multivariate Analysis , Prognosis , Receptor, ErbB-2/metabolism , Receptors, Progesterone/metabolism , Tissue Array Analysis , Treatment Outcome
7.
PLoS One ; 8(7): e69187, 2013.
Article in English | MEDLINE | ID: mdl-23935949

ABSTRACT

Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make the process of sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored. Violations of this assumption can be found by identifying non-embeddability. A process is non-embeddable if it cannot be embedded in a continuous time-homogeneous Markov process. In this study, non-embeddability was demonstrated to exist when modelling sequence evolution with Markov models. Evidence of non-embeddability was found primarily at the third codon position, possibly resulting from changes in mutation rate over time. Outgroup edges and those with a deeper time depth were found to have an increased probability of the underlying process being non-embeddable. Overall, low levels of non-embeddability were detected when examining individual edges of triads across a diverse set of alignments. Subsequent phylogenetic reconstruction analyses demonstrated that non-embeddability could impact on the correct prediction of phylogenies, but at extremely low levels. Despite the existence of non-embeddability, there is minimal evidence of violations of the local time homogeneity assumption and consequently the impact is likely to be minor.


Subject(s)
Evolution, Molecular , Markov Chains , Models, Genetic , Mutation , Algorithms , Animals , Humans , Introns , Mice , Nucleotides/genetics , Open Reading Frames/genetics , Phylogeny , Rats
8.
Infect Genet Evol ; 18: 362-6, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23499773

ABSTRACT

The most general context-dependent Markov substitution process, where each substitution event involves only one site and substitution rates depend on the whole sequence, is presented for the first time. The focus is on circular DNA sequences, where the problem of specifying the behaviour of the first and last sites in a linear sequence does not arise. Important special cases include (1) the established models where each site behaves independently, (2) models which are increasingly applied to non-coding DNA, where each site depends on only the immediate neighbouring sites, and (3) models where each site depends on two closest neighbours on both sides, such as the codon models. These special cases are classified and illustrated by published models. It is shown that the existing codon substitution models mix up the mutation and selection processes, rendering the substitution rates challenging to interpret. The classification suggests the study of a more interpretable codon model, where the mutation and selection processes are clearly delineated. Furthermore, this model allows a natural accommodation of possibly different selection pressures in overlapping reading frames, which may contribute to furthering the understanding of viral diseases. Also included are brief discussions on the stationary distribution of a context-dependent substitution process and a simple recipe for simulating it on a computer.


Subject(s)
Codon , DNA, Circular/genetics , Evolution, Molecular , Models, Genetic , Computer Simulation , Markov Chains
9.
Math Biosci ; 242(2): 111-6, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23313463

ABSTRACT

For a reversible finite-state continuous-time Markov chain containing similar states, the computation of the transition matrix can be expressed quite elegantly in terms of the transition matrix of an associated lumped Markov chain. This result is immensely useful for obtaining explicit transition matrices for many DNA substitution models, without diagonalizing a matrix or solving a differential equation. Furthermore, the technique works for the analogous problem in the discrete-time DNA substitution models.


Subject(s)
Computational Biology/methods , DNA/genetics , Models, Genetic , Markov Chains
10.
Stat Appl Genet Mol Biol ; 9: Article 7, 2010.
Article in English | MEDLINE | ID: mdl-20196757

ABSTRACT

We wish to suggest the categorical analysis of variance as a means of quantifying the proportion of total genetic variation attributed to different sources of variation. This method potentially challenges researchers to rethink conclusions derived from a well-known method known as the analysis of molecular variance (AMOVA). The CATANOVA framework allows explicit definition, and estimation, of two measures of genetic differentiation. These parameters form the subject of interest in many research programmes, but are often confused with the correlation measures defined in AMOVA, which cannot be interpreted as relative contributions of particular sources of variation. Through a simulation approach, we show that under certain conditions, researchers who use AMOVA to estimate these measures of genetic differentiation may attribute more than justified amounts of total variation to population labels. Moreover, the two measures can also lead to incongruent conclusions regarding the genetic structure of the populations of interest. Fortunately, one of the two measures seems robust to variations in relative sample sizes used. Its merits are illustrated in this paper using mitochondrial haplotype and amplified fragment length polymorphism (AFLP) data.


Subject(s)
Analysis of Variance , Genetic Variation , Genetics, Population/statistics & numerical data , Algorithms , Amplified Fragment Length Polymorphism Analysis/statistics & numerical data , Animals , Asteraceae/genetics , Biostatistics , Calophyllum/genetics , DNA, Mitochondrial/genetics , Genomics/statistics & numerical data , Haplotypes , Humans , Models, Genetic , Models, Statistical , Pinctada/genetics , Racial Groups/genetics
11.
Mol Biol Evol ; 27(3): 726-34, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19815689

ABSTRACT

Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.


Subject(s)
Codon , Evolution, Molecular , Models, Genetic , Models, Statistical , Selection, Genetic , Base Composition , Chi-Square Distribution , Computer Simulation , Genes, Protozoan , Mutation , Plasmodium/genetics , Sequence Alignment
12.
BMC Bioinformatics ; 10: 415, 2009 Dec 14.
Article in English | MEDLINE | ID: mdl-20003414

ABSTRACT

BACKGROUND: Quantitative trait loci analysis assumes that the trait is normally distributed. In reality, this is often not observed and one strategy is to transform the trait. However, it is not clear how much normality is required and which transformation works best in association studies. RESULTS: We performed simulations on four types of common quantitative traits to evaluate the effects of normalization using the logarithm, Box-Cox, and rank-based transformations. The impact of sample size and genetic effects on normalization is also investigated. Our results show that rank-based transformation gives generally the best and consistent performance in identifying the causal polymorphism and ranking it highly in association tests, with a slight increase in false positive rate. CONCLUSION: For small sample size or genetic effects, the improvement in sensitivity for rank transformation outweighs the slight increase in false positive rate. However, for large sample size and genetic effects, normalization may not be necessary since the increase in sensitivity is relatively modest.
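The abstract does not say which rank-based transformation was used. The sketch below shows one common choice, the rank-based inverse normal transform, as an assumption for illustration only. The function name `rank_inverse_normal` and the 0.5 rank offset are hypothetical choices, not details from the paper.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map each value to a standard-normal quantile based on its rank.

    Ties receive the average of the ranks they span. The offset keeps the
    quantile arguments strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# A skewed toy trait: the outlier 12.0 is pulled in toward the other values.
trait = [0.2, 1.5, 0.9, 12.0, 3.3]
transformed = rank_inverse_normal(trait)
```

Because only ranks are used, the transformed trait is symmetric around zero regardless of how skewed the input is, which is the property that makes this kind of transform attractive before a test that assumes normality.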


Subject(s)
Computational Biology/methods , Quantitative Trait Loci , Genetic Variation , Polymorphism, Single Nucleotide , Sample Size
13.
BMC Bioinformatics ; 9: 511, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-19046431

ABSTRACT

BACKGROUND: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. RESULTS: Different methods differ in performance by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. CONCLUSION: Based on the conditions tested, we recommend the use of the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genomewide scale across the tree of life.


Subject(s)
Computational Biology/methods , DNA Mutational Analysis/methods , Evolution, Molecular , Nucleotides/genetics , Algorithms , Computer Simulation , DNA/genetics , Data Interpretation, Statistical , Logistic Models , Markov Chains , Models, Genetic , Phylogeny , Reproducibility of Results , Sensitivity and Specificity
14.
Biol Direct ; 3: 52, 2008 Dec 16.
Article in English | MEDLINE | ID: mdl-19087239

ABSTRACT

BACKGROUND: Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates. RESULTS: We prove that the NF form always nests the independent nucleotide process and that this is not true for the TF form. As a consequence, using TF to study context effects can be misleading, which is shown by both theoretical calculations and simulations. We describe a simple example where a context parameter estimated under TF is confounded with composition terms unless all sequence states are equi-frequent. We illustrate this for the dinucleotide case by simulation under a nucleotide model, showing that the TF form identifies a CpG effect when none exists. Our analysis of primate introns revealed that the effect of nucleotide neighbors is over-estimated under TF compared with NF. Parameter estimates for a number of contexts are also strikingly discordant between the two model forms. CONCLUSION: Our results establish that the NF form should be used for analysis of independent-tuple context dependent processes. Although neighboring effects in general are still important, prominent influences such as the elevated CpG transversion rate previously identified using the TF form are an artifact. Our results further suggest as few as 5 parameters may account for approximately 85% of neighboring nucleotide influence.


Subject(s)
Amino Acid Substitution/genetics , Models, Genetic , Animals , CpG Islands/genetics , Introns/genetics , Likelihood Functions , Nucleotides/genetics , Primates/genetics , Sequence Alignment
15.
BMC Bioinformatics ; 9: 550, 2008 Dec 19.
Article in English | MEDLINE | ID: mdl-19099591

ABSTRACT

BACKGROUND: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature, and consider the suitability of different algorithms to their computation. RESULTS: We used concatenated protein coding gene alignments from microbial genomes, primate genomes and independent intron alignments from primate genomes. The Taylor series expansion and eigendecomposition matrix exponentiation algorithms were compared to the less widely employed, but more robust, Padé with scaling and squaring algorithm for nucleotide, dinucleotide, codon and trinucleotide rate matrices. Pathological dinucleotide and trinucleotide matrices were evident in the microbial data set, affecting the eigendecomposition and Taylor algorithms respectively. Even using a conservative estimate of matrix error (occurrence of an invalid probability), both Taylor and eigendecomposition algorithms exhibited substantial error rates: ~100% of all exonic trinucleotide matrices were pathological to the Taylor algorithm while ~10% of codon positions 1 and 2 dinucleotide matrices and intronic trinucleotide matrices, and ~30% of codon matrices were pathological to eigendecomposition. The majority of Taylor algorithm errors derived from occurrence of multiple unobserved states. A small number of negative probabilities were detected from the Padé algorithm on trinucleotide matrices that were attributable to machine precision. Although the Padé algorithm does not facilitate caching of intermediate results, it was up to 3x faster than eigendecomposition on the same matrices. CONCLUSION: Development of robust software for computing non-reversible dinucleotide, codon and higher evolutionary models requires implementation of the Padé with scaling and squaring algorithm.
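The error criterion mentioned in this abstract (occurrence of an invalid probability) and the benefit of scaling and squaring can be sketched in a few lines. This is a dependency-free illustration under stated assumptions, not the paper's implementation: a truncated Taylor series stands in for the Padé approximant, the Jukes-Cantor rate matrix is a toy input, and all function names are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def taylor_expm(M, terms):
    """Truncated Taylor series: sum of M^k / k! for k = 0..terms."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def scaled_expm(M, terms=20):
    """Scaling and squaring: exponentiate M / 2^s, then square s times."""
    norm = max(sum(abs(x) for x in row) for row in M)
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    P = taylor_expm([[x / 2 ** s for x in row] for row in M], terms)
    for _ in range(s):
        P = matmul(P, P)
    return P


def is_valid_probability_matrix(P, tol=1e-9):
    """Rows must sum to 1 and every entry must lie in [0, 1]."""
    return all(abs(sum(row) - 1.0) <= tol
               and all(-tol <= x <= 1.0 + tol for x in row)
               for row in P)


# Jukes-Cantor rate matrix multiplied by a long branch length t = 10.
t = 10.0
Qt = [[-t if i == j else t / 3 for j in range(4)] for i in range(4)]

naive = taylor_expm(Qt, 10)   # too few terms for a matrix this large
robust = scaled_expm(Qt)      # same series, applied after scaling
```

Here the unscaled ten-term series yields entries far outside [0, 1], while the scaled version returns a valid probability matrix that agrees with the Jukes-Cantor closed form.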


Subject(s)
Computational Biology/methods , Evolution, Molecular , Algorithms , Animals , Codon , Humans , Markov Chains , Primates/genetics , Software
16.
BMC Evol Biol ; 5: 2, 2005 Jan 04.
Article in English | MEDLINE | ID: mdl-15629063

ABSTRACT

BACKGROUND: We compared two methods of rooting a phylogenetic tree: the stationary and the nonstationary substitution processes. These methods do not require an outgroup. METHODS: Given a multiple alignment and an unrooted tree, the maximum likelihood estimates of branch lengths and substitution parameters for each associated rooted tree are found; rooted trees are compared using their likelihood values. Site variation in substitution rates is handled by assigning sites into several classes before the analysis. RESULTS: In three test datasets where the trees are small and the roots are assumed known, the nonstationary process gets the correct estimate significantly more often, and fits data much better, than the stationary process. Both processes give biologically plausible root placements in a set of nine primate mitochondrial DNA sequences. CONCLUSIONS: The nonstationary process is simple to use and is much better than the stationary process at inferring the root. It could be useful for situations where an outgroup is unavailable.


Subject(s)
Classification/methods , Phylogeny , Animals , DNA, Mitochondrial/genetics , Humans , Likelihood Functions , Models, Biological , Primates/genetics , Reproducibility of Results
17.
Proc Natl Acad Sci U S A ; 101(36): 13268-72, 2004 Sep 07.
Article in English | MEDLINE | ID: mdl-15326311

ABSTRACT

Major histocompatibility complex class I molecules present peptides of 8-10 residues to CD8+ T cells. We used 19 predicted proteomes to determine the influence of CD8+ T cell immune surveillance on protein evolution in humans and microbial pathogens by predicting immunopeptidomes, i.e., sets of class I binding peptides present in proteomes. We find that class I peptide binding specificities (i) have had little, if any, influence on the evolution of immunopeptidomes and (ii) do not take advantage of biases in amino acid distribution in proteins other than the concentration of hydrophobic residues in NH2-terminal leader sequences.


Subject(s)
CD8-Positive T-Lymphocytes/immunology , Histocompatibility Antigens Class I/metabolism , Peptide Fragments/immunology , Proteome , Animals , Biological Evolution , Humans , Ligands , Mice
18.
Genome Res ; 14(4): 574-9, 2004 Apr.
Article in English | MEDLINE | ID: mdl-15059998

ABSTRACT

We describe a whole-genome comparative analysis of the human, mouse, and rat genomes to describe the average substitution patterns of four genomic regions: ancient repeats, rodent-specific DNA, exons, and conserved (coding and noncoding) regions, and to identify rodent evolutionary hotspots. In all types of regions, except the rodent-specific DNA, the rat branch is slightly longer than the mouse branch. Moreover, the mouse-rat distance is longer in the rodent-specific DNA than in the ancient repeats. Analysis of individual conserved regions with different substitution models yielded the conclusion that the Jukes-Cantor model is inadequate, and the Hasegawa-Kishino-Yano model is almost as good as the REV model. Using human as an outgroup, we identified 5055 evolutionary hotspots, which are highly conserved subalignment blocks (each consisting of at least 100 aligned sites and a small fraction of gaps) with a large and statistically significant difference in the branch lengths of the rodent species. The cutoffs used to identify the hotspots are partially based on estimates of the average rates of substitution. The fractions of hotspots overlapping with the rodent RefSeq genes, RefSeq exons, and ESTs are all higher than expected. Still, more than half of the hotspots lie in noncoding regions of the mouse genome. We believe that the hotspots represent biologically interesting regions in the rodent genomes.


Subject(s)
Evolution, Molecular , Genome , Animals , Conserved Sequence/genetics , DNA/genetics , Genome, Human , Humans , Mice , Models, Genetic , Mutation/genetics , Rats , Repetitive Sequences, Nucleic Acid/genetics , Sensitivity and Specificity , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Species Specificity
19.
J Mol Evol ; 58(1): 12-8, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14743311

ABSTRACT

We studied the substitution patterns in 7661 well-conserved human-mouse alignments corresponding to the intergenic regions of human chromosome 22. Alignments with a high average GC content tend to have a higher human GC content than mouse GC content, indicating a lack of stationarity. Segmenting the alignments into four groups of GC content and fitting the general reversible substitution model (REV) separately gave significantly better fits than the overall fit and the levels of fit are close to that expected under an REV model. In addition, most of the fitted rate matrices are not of the HKY type but are remarkably strand-symmetric, and we constructed a number of substitution matrices that should be useful for genomic DNA sequence alignment. We did not find obvious signs of temporal inhomogeneity in the substitution rates and concluded that the conserved intergenic regions in human chromosome 22 and mouse appear to have evolved from their common ancestors via a process that is approximately reversible and strand-symmetric, assuming site homogeneity and independence.


Subject(s)
Chromosomes, Human, Pair 22/genetics , Evolution, Molecular , Mice/genetics , Models, Genetic , Point Mutation/genetics , Animals , Base Composition , Base Sequence , Humans , Markov Chains , Sequence Alignment