Results 1 - 14 of 14
1.
BMC Bioinformatics ; 25(1): 42, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38273275

ABSTRACT

BACKGROUND: The clustering of immune repertoire data is challenging due to the computational cost associated with a very large number of pairwise sequence comparisons. To overcome this limitation, we developed Anchor Clustering, an unsupervised clustering method designed to identify similar sequences from millions of antigen receptor gene sequences. First, a Point Packing algorithm is used to identify a set of maximally spaced anchor sequences. Then, the genetic distance of the remaining sequences to all anchor sequences is calculated and transformed into distance vectors. Finally, distance vectors are clustered using unsupervised clustering. This process is repeated iteratively until the resulting clusters are small enough so that pairwise distance comparisons can be performed. RESULTS: Our results demonstrate that Anchor Clustering is faster than existing pairwise comparison clustering methods while providing similar clustering quality. With its flexible, memory-saving strategy, Anchor Clustering is capable of clustering millions of antigen receptor gene sequences in just a few minutes. CONCLUSIONS: This method enables the meta-analysis of immune-repertoire data from different studies and could contribute to a more comprehensive understanding of the immune repertoire data space.
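
The three steps described in this abstract (anchor selection, distance vectors, clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Hamming distance on equal-length strings stands in for genetic distance, greedy farthest-point selection stands in for the Point Packing algorithm, and grouping by nearest anchor stands in for the unsupervised clustering step. All function names and the toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pick_anchors(seqs, k):
    """Greedy farthest-point selection (assumed stand-in for Point Packing)."""
    anchors = [seqs[0]]
    while len(anchors) < min(k, len(seqs)):
        # Choose the sequence whose nearest anchor is as far away as possible.
        best = max(seqs, key=lambda s: min(hamming(s, a) for a in anchors))
        if min(hamming(best, a) for a in anchors) == 0:
            break  # every remaining sequence duplicates an anchor
        anchors.append(best)
    return anchors


def distance_vectors(seqs, anchors):
    """Represent each sequence by its distances to all anchors."""
    return [tuple(hamming(s, a) for a in anchors) for s in seqs]


def group_by_nearest_anchor(seqs, anchors):
    """Simplified clustering step: assign each sequence to its closest anchor."""
    groups = {a: [] for a in anchors}
    for s, vec in zip(seqs, distance_vectors(seqs, anchors)):
        groups[anchors[vec.index(min(vec))]].append(s)
    return groups


# Toy data (hypothetical); real repertoires contain millions of sequences.
seqs = ["AAAA", "AAAT", "TTTT", "TTTA", "AATA"]
anchors = pick_anchors(seqs, 2)
groups = group_by_nearest_anchor(seqs, anchors)
```

Each sequence is compared only with the k anchors rather than with every other sequence, which is where the savings over all-against-all comparison come from. In the method as described, the process is then repeated within each group until the groups are small enough for pairwise comparison.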


Subject(s)
Algorithms , Receptors, Antigen , Cluster Analysis
2.
PLoS One ; 18(7): e0288388, 2023.
Article in English | MEDLINE | ID: mdl-37440576

ABSTRACT

Intrinsically disordered proteins (IDPs) are proteins that lack a stable 3D structure but maintain a biological function. It has been frequently suggested that IDPs are difficult to align because they tend to have fewer conserved residues compared to ordered proteins, but to our knowledge this has never been directly tested. To compare the alignments of ordered proteins to IDPs, their multiple sequence alignments (MSAs) were assessed using two different methods. The first compared the similarity between MSAs produced using the same sequences but created with Clustal Omega, MAFFT, and MUSCLE. The second assessed MSAs based on how well they recapitulated the species tree. These two methods measure the "correctness" of an MSA with two different approaches; the first method measures consistency while the second measures the underlying phylogenetic signal. Proteins that contained both regions of disorder and order were analyzed along with proteins that were fully disordered and fully ordered, using nucleotide, codon and peptide sequence alignments. We observed that IDPs had less similar MSAs than ordered proteins, which is most likely linked to the lower sequence conservation in IDPs. However, comparisons of tree distances found that trees from the ordered sequence MSAs were not significantly closer to the species tree than those inferred from disordered sequence MSAs. Our results show that it is correct to say that IDPs are difficult to align on the basis of MSA consistency, but that this does not equate with alignments being of poor quality when assessed by their ability to correctly infer a species tree.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/genetics , Intrinsically Disordered Proteins/chemistry , Phylogeny , Sequence Alignment
3.
PLoS One ; 14(2): e0211813, 2019.
Article in English | MEDLINE | ID: mdl-30726271

ABSTRACT

Dehydrins, plant proteins that are upregulated during dehydration stress conditions, have modular sequences that can contain three conserved motifs (the Y-, S-, and K-segments). The presence and order of these motifs are used to classify dehydrins into one of five architectures: Kn, SKn, KnS, YnKn, and YnSKn, where the subscript n describes the number of copies of that motif. In this study, an architectural and phylogenetic analysis was performed on 426 dehydrin sequences that were identified in 53 angiosperm and 3 gymnosperm genomes. It was found that angiosperms contained all five architectures, while gymnosperms only contained Kn and SKn dehydrins. This suggests that the ancestral dehydrin in spermatophytes was either Kn or SKn, and the Y-segment containing dehydrins first arose in angiosperms. A high-level split between the YnSKn dehydrins from either the Kn or SKn dehydrins could not be confidently identified; however, two lower-level architectural divisions appear to have occurred after different duplication events. The first likely occurred after a whole genome duplication, resulting in the duplication of a Y3SK2 dehydrin; the duplicate subsequently lost an S- and K-segment to become a Y3K1 dehydrin. The second split occurred after a tandem duplication of a Y1SK2 dehydrin, where the duplicate lost both the Y- and S-segment and gained four K-segments, resulting in a K6 dehydrin. We suggest that the newly arisen Y3K1 dehydrin is possibly on its way to pseudogenization, while the newly arisen K6 dehydrin developed a novel function in cold protection.
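
The naming convention used in this abstract can be illustrated with a short sketch. It assumes the motifs have already been located and are supplied as an ordered string of the letters Y, S, and K. It also assumes, based only on the names quoted above (Y3SK2, Y3K1, Y1SK2, K6), that the S-segment is written without a count. The function name is hypothetical.

```python
from itertools import groupby

# The five architectures named in the abstract.
KNOWN_ARCHITECTURES = {"Kn", "SKn", "KnS", "YnKn", "YnSKn"}


def architecture(segments):
    """Return (specific name, general architecture) for an ordered Y/S/K string."""
    # Collapse consecutive identical segments into (letter, copy number) runs.
    runs = [(letter, len(list(group))) for letter, group in groupby(segments)]
    # Assumption: Y and K carry a copy number, S is written without one.
    name = "".join(l if l == "S" else f"{l}{n}" for l, n in runs)
    general = "".join(l if l == "S" else f"{l}n" for l, _ in runs)
    if general not in KNOWN_ARCHITECTURES:
        general = "unclassified"
    return name, general
```

For example, `architecture("YYYSKK")` gives the name Y3SK2 in the YnSKn class, and `architecture("KKKKKK")` gives K6 in the Kn class.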


Subject(s)
Cycadopsida/genetics , Evolution, Molecular , Gene Duplication , Genome, Plant , Magnoliopsida/genetics , Phylogeny , Plant Proteins/genetics , Databases, Protein
4.
Biosystems ; 114(3): 178-85, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24051263

ABSTRACT

This paper examines the use of evolutionary algorithms in the development of antibiotic regimens given to production animals. A model is constructed that combines the lifespan of the animal and the bacteria living in the animal's gastro-intestinal tract from the early finishing stage until the animal reaches market weight. This model is used as the fitness evaluation for a set of graph based evolutionary algorithms to assess the impact of diversity control on the evolving antibiotic regimens. The graph based evolutionary algorithms have two objectives: to find an antibiotic treatment regimen that maintains the weight gain and health benefits of antibiotic use and to reduce the risk of spreading antibiotic resistant bacteria. This study examines different regimens of tylosin phosphate use on bacteria populations divided into Gram positive and Gram negative types, with a focus on Campylobacter spp. Treatment regimens were found that provided decreased antibiotic resistance relative to conventional methods while providing nearly the same benefits as conventional antibiotic regimes. By using a graph to control the information flow in the evolutionary algorithm, a variety of solutions along the Pareto front can be found automatically for this and other multi-objective problems.
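
The notion of a Pareto front for the two objectives named here (retain the benefit of antibiotic use, reduce resistance risk) can be sketched as a non-dominated filter. This is only an illustration of the concept; it is not the graph based evolutionary algorithm from the paper, and the regimen names and scores are invented for the example.

```python
def pareto_front(regimens):
    """Keep regimens that no other regimen dominates.

    Each regimen is (name, benefit, resistance_risk): higher benefit is
    better, lower resistance risk is better.
    """
    def dominates(a, b):
        # a is at least as good on both objectives and strictly better on one.
        return (a[1] >= b[1] and a[2] <= b[2]
                and (a[1] > b[1] or a[2] < b[2]))

    return [r for r in regimens
            if not any(dominates(other, r) for other in regimens)]


# Hypothetical scores, for illustration only.
candidates = [
    ("A", 0.95, 0.80),
    ("B", 0.93, 0.40),
    ("C", 0.70, 0.35),
    ("D", 0.90, 0.60),  # worse than B on both objectives
]
front = pareto_front(candidates)
```

Regimen D drops out because B offers more benefit at lower risk; A, B, and C remain as different trade-offs between the two objectives.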


Subject(s)
Algorithms , Animal Diseases/prevention & control , Animal Husbandry/methods , Anti-Bacterial Agents/therapeutic use , Bacterial Infections/veterinary , Livestock/growth & development , Models, Theoretical , Animal Diseases/microbiology , Animals , Bacterial Infections/prevention & control , Computational Biology/methods , Livestock/microbiology , Tylosin
5.
J Genet Genomics ; 35(10): 603-16, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18937917

ABSTRACT

The maize (Zea mays) spikelet consists of two florets, each of which contains three developmentally synchronized anthers. Morphologically, the anthers in the upper and lower florets proceed through apparently similar developmental programs. To test for global differences in gene expression and to identify genes that are coordinately regulated during maize anther development, RNA samples isolated from upper and lower floret anthers at six developmental stages were hybridized to cDNA microarrays. Approximately 9% of the tested genes exhibited statistically significant differences in expression between anthers in the upper and lower florets. This finding indicates that several basic biological processes are differentially regulated between upper and lower floret anthers, including metabolism, protein synthesis and signal transduction. Genes that are coordinately regulated across anther development were identified via cluster analysis. Analysis of these results identified stage-specific, early in development, late in development and bi-phasic expression profiles. Quantitative RT-PCR analysis revealed that four genes whose homologs in other plant species are involved in programmed cell death are up-regulated just prior to the time the tapetum begins to visibly degenerate (i.e., the mid-microspore stage). This finding supports the hypothesis that developmentally normal tapetal degeneration occurs via programmed cell death.


Subject(s)
Apoptosis , Flowers/cytology , Flowers/genetics , Gene Expression Regulation, Plant , Zea mays/cytology , Zea mays/genetics , Cluster Analysis , Flowers/growth & development , Flowers/metabolism , Gene Expression Profiling , Genes, Plant/genetics , Oligonucleotide Array Sequence Analysis , Plant Proteins/biosynthesis , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Time Factors , Up-Regulation , Zea mays/growth & development , Zea mays/metabolism
6.
Genetics ; 175(1): 429-39, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17110490

ABSTRACT

As an ancient segmental tetraploid, the maize (Zea mays L.) genome contains large numbers of paralogs that are expected to have diverged by a minimum of 10% over time. Nearly identical paralogs (NIPs) are defined as paralogous genes that exhibit ≥98% identity. Sequence analyses of the "gene space" of the maize inbred line B73 genome, coupled with wet lab validation, have revealed that, conservatively, at least approximately 1% of maize genes have a NIP, a rate substantially higher than that in Arabidopsis. In most instances, both members of maize NIP pairs are expressed and are therefore at least potentially functional. Of evolutionary significance, members of many NIP families also exhibit differential expression. The finding that some families of maize NIPs are closely linked genetically while others are genetically unlinked is consistent with multiple modes of origin. NIPs provide a mechanism for the maize genome to circumvent the inherent limitation that diploid genomes can carry at most two "alleles" per "locus." As such, NIPs may have played important roles during the evolution and domestication of maize and may contribute to the success of long-term selection experiments in this important crop species.
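
The 98% identity threshold that defines a NIP can be made concrete with a small sketch. It assumes the two paralogs are already aligned without gaps and have equal length; the abstract does not say how identity was computed, so this is one simple reading, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical positions in an ungapped, equal-length alignment."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_nip(a, b, threshold=98.0):
    """True if the pair meets the NIP threshold of at least 98% identity."""
    return percent_identity(a, b) >= threshold


# Toy 100-base sequences (hypothetical).
gene = "ACGT" * 25
two_mismatches = "TT" + gene[2:]     # differs at positions 0 and 1
three_mismatches = "TTT" + gene[3:]  # differs at positions 0, 1 and 2
```

With these toy sequences, two mismatches in 100 bases sit exactly on the threshold and qualify, while three mismatches do not.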


Subject(s)
Evolution, Molecular , Genome, Plant , Plant Proteins/genetics , Zea mays/genetics , Arabidopsis/genetics , Base Sequence , DNA, Plant/chemistry , DNA, Plant/genetics , Molecular Sequence Data , Selection, Genetic , Sequence Homology, Nucleic Acid
7.
Genetics ; 174(3): 1671-83, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16951074

ABSTRACT

A new genetic map of maize, ISU-IBM Map4, that integrates 2029 existing markers with 1329 new indel polymorphism (IDP) markers has been developed using intermated recombinant inbred lines (IRILs) from the intermated B73xMo17 (IBM) population. The website http://magi.plantgenomics.iastate.edu provides access to IDP primer sequences, sequences from which IDP primers were designed, optimized marker-specific PCR conditions, and polymorphism data for all IDP markers. This new gene-based genetic map will facilitate a wide variety of genetic and genomic research projects, including map-based genome sequencing and gene cloning. The mosaic structures of the genomes of 91 IRILs, an important resource for identifying and mapping QTL and eQTL, were defined. Analyses of segregation data associated with markers genotyped in three B73/Mo17-derived mapping populations (F2, Syn5, and IBM) demonstrate that allele frequencies were significantly altered during the development of the IBM IRILs. The observations that two segregation distortion regions overlap with maize flowering-time QTL suggest that the altered allele frequencies were a consequence of inadvertent selection. Detection of two-locus gamete disequilibrium provides another means to extract functional genomic data from well-characterized plant RILs.


Subject(s)
Chromosome Mapping , Crosses, Genetic , Genes, Plant , Recombination, Genetic , Zea mays/genetics , Alleles , Base Sequence , Chromosomes, Plant , Expressed Sequence Tags , Gene Frequency , Genetic Markers , Molecular Sequence Data , Polymorphism, Genetic , Quantitative Trait Loci
8.
Proc Natl Acad Sci U S A ; 102(34): 12282-7, 2005 Aug 23.
Article in English | MEDLINE | ID: mdl-16103354

ABSTRACT

Recent sequencing efforts have targeted the gene-rich regions of the maize (Zea mays L.) genome. We report the release of an improved assembly of maize assembled genomic islands (MAGIs). The 114,173 resulting contigs have been subjected to computational and physical quality assessments. Comparisons to the sequences of maize bacterial artificial chromosomes suggest that at least 97% (160 of 165) of MAGIs are correctly assembled. Because the rates at which junction-testing PCR primers for genomic survey sequences (90-92%) amplify genomic DNA are not significantly different from those of control primers (approximately 91%), we conclude that a very high percentage of genic MAGIs accurately reflect the structure of the maize genome. EST alignments, ab initio gene prediction, and sequence similarity searches of the MAGIs are available at the Iowa State University MAGI web site. This assembly contains 46,688 ab initio predicted genes. The expression of almost half (628 of 1,369) of a sample of the predicted genes that lack expression evidence was validated by RT-PCR. Our analyses suggest that the maize genome contains between approximately 33,000 and approximately 54,000 expressed genes. Approximately 5% (32 of 628) of the maize transcripts discovered do not have detectable paralogs among maize ESTs or detectable homologs from other species in the GenBank NR nucleotide/protein database. Analyses therefore suggest that this assembly of the maize genome contains approximately 350 previously uncharacterized expressed genes. We hypothesize that these "orphans" evolved quickly during maize evolution and/or domestication.


Subject(s)
Contig Mapping/methods , Genes, Plant/genetics , Genome, Plant , Genomic Islands/genetics , Genomics/methods , Zea mays/genetics , Chromosomes, Artificial, Bacterial , Computational Biology , DNA Primers , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA
9.
Plant Mol Biol ; 57(3): 445-60, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15830133

ABSTRACT

Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. Two of these programs, GeneMark.hmm and GENSCAN, had been trained for maize; FGENESH had been trained for monocots (including maize), and the others had been trained for rice or Arabidopsis. Initial evaluations were conducted using eight maize genes (gl8a, pdc2, pdc3, rf2c, rf2d, rf2e1, rth1, and rth3) whose sequences were not released to the public prior to conducting this evaluation. The significant advantage of this data set for this evaluation is that these genes could not have been included in the training sets of the prediction programs. FGENESH yielded the most accurate and GeneMark.hmm the second most accurate predictions. The five programs were used in conjunction with RT-PCR to identify and establish the structures of two new genes in the a1-sh2 interval of the maize genome. FGENESH, GeneMark.hmm and GENSCAN were tested on a larger data set consisting of maize assembled genomic islands (MAGIs) that had been aligned to ESTs. FGENESH, GeneMark.hmm and GENSCAN correctly predicted gene models in 773, 625, and 371 MAGIs, respectively, out of the 1353 MAGIs that comprise data set 2.


Subject(s)
Genes, Plant/genetics , Software , Zea mays/genetics , Alternative Splicing , DNA, Plant/genetics , Exons/genetics , Gene Expression Regulation, Plant , Reproducibility of Results
10.
Plant Physiol ; 134(3): 960-8, 2004 Mar.
Article in English | MEDLINE | ID: mdl-15020760

ABSTRACT

In recent years, access to complete genomic sequences, coupled with rapidly accumulating data related to RNA and protein expression patterns, has made it possible to determine comprehensively how genes contribute to complex phenotypes. However, for major crop plants, publicly available, standard platforms for parallel expression analysis have been limited. We report the conception and design of the new publicly available, 22K Barley1 GeneChip probe array, a model for plants without a fully sequenced genome. Array content was derived from worldwide contribution of 350,000 high-quality ESTs from 84 cDNA libraries, in addition to 1,145 barley (Hordeum vulgare) gene sequences from the National Center for Biotechnology Information nonredundant database. Conserved sequences expressed in seedlings of wheat (Triticum aestivum), oat (Avena strigosa), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays) were identified that will be valuable in the design of arrays across grasses. To enhance the usability of the data, BarleyBase, a MIAME-compliant, MySQL relational database, serves as a public repository for raw and normalized expression data from the Barley1 GeneChip probe array. Interconnecting links with PlantGDB and Gramene allow BarleyBase users to perform gene predictions using the 21,439 non-redundant Barley1 exemplar sequences or cross-species comparison at the genome level, respectively. We expect that this first generation array will accelerate hypothesis generation and gene discovery in disease defense pathways, responses to abiotic stresses, development, and evolutionary diversity in monocot plants.


Subject(s)
Genomics/methods , Hordeum/genetics , Oligonucleotide Array Sequence Analysis/methods , Edible Grain/genetics , Genome, Plant , Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , RNA, Plant/genetics , Software , Software Design
11.
Bioinformatics ; 20(2): 140-7, 2004 Jan 22.
Article in English | MEDLINE | ID: mdl-14734303

ABSTRACT

UNLABELLED: Because the bulk of the maize (Zea mays L.) genome consists of repetitive sequences, sequencing efforts are being targeted to its 'gene-rich' fraction. Traditional assembly programs are inadequate for this approach because they are optimized for a uniform sampling of the genome and inherently lack the ability to differentiate highly similar paralogs. RESULTS: We report the development of bioinformatics tools for the accurate assembly of the maize genome. This software, which is based on innovative parallel algorithms to ensure scalability, assembled 730,974 genomic survey sequence fragments in 4 h using 64 Pentium III 1.26 GHz processors of a commodity cluster. Algorithmic innovations are used to reduce the number of pairwise alignments significantly without sacrificing quality. Clone pair information was used to estimate the error rate for improved differentiation of polymorphisms versus sequencing errors. The assembly was also used to evaluate the effectiveness of various filtering strategies and thereby provide information that can be used to focus subsequent sequencing efforts.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Genome, Plant , Repetitive Sequences, Nucleic Acid/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Zea mays/genetics , Computing Methodologies , Database Management Systems , Databases, Nucleic Acid , Software , Software Design
12.
Plant Physiol ; 133(2): 475-81, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14555776

ABSTRACT

To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs. Although this approach allows the origins of ESTs to be determined, it requires the production of multiple libraries. A hybrid approach is reported here. A cDNA library was prepared using 21 different pools of maize (Zea mays) mRNAs. DNA sequence "bar codes" were added during first-strand cDNA synthesis to uniquely identify the mRNA source pool from which individual cDNAs were derived. Using a decoding algorithm that included error correction, it was possible to identify the source mRNA pool of more than 97% of the ESTs. The frequency at which a bar code is represented in an EST contig should be proportional to the abundance of the corresponding mRNA in the source pool. Consistent with this, all ESTs derived from several genes (zein and adh1) that are known to be exclusively expressed in kernels or preferentially expressed under anaerobic conditions, respectively, were exclusively tagged with bar codes associated with mRNA pools prepared from kernel and anaerobically treated seedlings, respectively. Hence, by allowing for the retention of expression data, the bar coding of cDNA libraries can enhance the value of EST projects.
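
A decoding step with error correction, as mentioned in this abstract, might look like the following sketch. The abstract does not describe the actual algorithm, so this assumes a simple nearest-barcode rule: a tag is assigned to a pool only if exactly one barcode lies within a small Hamming distance. The barcodes, pool names, and the `max_errors` parameter are all hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def decode(tag, barcodes, max_errors=1):
    """Return the source pool for a sequenced tag, or None if undecodable.

    A tag is assigned only when its closest barcode is within max_errors
    mismatches and no other barcode is equally close.
    """
    scored = sorted((hamming(tag, bc), pool) for pool, bc in barcodes.items())
    best_distance, best_pool = scored[0]
    if best_distance > max_errors:
        return None  # too many errors to correct
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # ambiguous between two pools
    return best_pool


# Hypothetical 5-base barcodes that differ from one another at every position.
barcodes = {
    "kernel": "AACCG",
    "anaerobic_seedling": "GGTTA",
    "root": "CTGAC",
}
```

Because the example barcodes are far apart, a single sequencing error still decodes to the right pool: `decode("AACCT", barcodes)` returns `"kernel"`, while a tag that is far from every barcode, such as `"TTTTT"`, is rejected.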


Subject(s)
DNA, Plant/genetics , Expressed Sequence Tags , RNA, Messenger/genetics , Zea mays/genetics , Base Sequence , DNA, Complementary/genetics , DNA, Plant/chemistry , Electronic Data Processing/methods , Gene Library , RNA, Messenger/chemistry
13.
Genetics ; 163(2): 685-98, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12618406

ABSTRACT

Even in the absence of excisional loss of the associated Mu transposons, some Mu-induced mutant alleles of maize can lose their capacity to condition a mutant phenotype. Three of five Mu-derived rf2a alleles are susceptible to such Mu suppression. The suppressible rf2a-m9437 allele has a novel Mu transposon insertion (Mu10) in its 5' untranslated region (UTR). The suppressible rf2a-m9390 allele has a Mu1 insertion in its 5' UTR. During suppression, alternative transcription initiation sites flanking the Mu1 transposon yield functional transcripts. The suppressible rf2a-m8110 allele has an rcy/Mu7 insertion in its 3' UTR. Suppression of this allele occurs via a previously unreported mechanism; sequences in the terminal inverted repeats of rcy/Mu7 function as alternative polyadenylation sites such that the suppressed rf2a-m8110 allele yields functional rf2a transcripts. No significant differences were observed in the nucleotide compositions of these alternative polyadenylation sites as compared with 94 other polyadenylation sites from maize genes.


Subject(s)
DNA Transposable Elements , Plant Proteins , Trans-Activators/genetics , Transcription Initiation Site , Zea mays/genetics , Base Sequence , Basic-Leucine Zipper Transcription Factors , DNA Methylation , Molecular Sequence Data , Sequence Analysis, DNA , Suppression, Genetic
14.
Genetics ; 160(2): 697-716, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11861572

ABSTRACT

The widespread use of the maize Mutator (Mu) system to generate mutants exploits the preference of Mu transposons to insert into genic regions. However, little is known about the specificity of Mu insertions within genes. Analysis of 79 independently isolated Mu-induced alleles at the gl8 locus established that at least 75 contain Mu insertions. Analysis of the terminal inverted repeats (TIRs) of the inserted transposons defined three new Mu transposons: Mu10, Mu11, and Mu12. A large percentage (>80%) of the insertions are located in the 5' untranslated region (UTR) of the gl8 gene. Ten positions within the 5' UTR experienced multiple independent Mu insertions. Analyses of the nucleotide composition of the 9-bp TSD and the sequences directly flanking the TSD reveal that the nucleotide composition of Mu insertion sites differs dramatically from that of random DNA. In particular, the frequencies at which C's and G's are observed at positions -2 and +2 (relative to the TSD) are substantially higher than expected. Insertion sites of 315 RescueMu insertions displayed the same nonrandom nucleotide composition observed for the gl8-Mu alleles. Hence, this study provides strong evidence for the involvement of sequences flanking the TSD in Mu insertion-site selection.
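
The kind of per-position composition analysis described here can be sketched as follows. The sketch assumes the insertion-site sequences have been extracted at equal length and aligned so that each column corresponds to a fixed position relative to the TSD. The toy sequences and function names are hypothetical, and no statistical test against random DNA is included.

```python
from collections import Counter


def position_frequencies(sites):
    """Per-column base frequencies for equal-length, aligned site sequences."""
    length = len(sites[0])
    freqs = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        freqs.append({base: counts[base] / len(sites) for base in "ACGT"})
    return freqs


def gc_fraction(freqs, column):
    """Combined frequency of C and G at one aligned column."""
    return freqs[column]["C"] + freqs[column]["G"]


# Toy aligned sites (hypothetical); column 0 is all C or G, column 1 is mixed.
sites = ["GCAT", "CGTA", "GATC", "CTAG"]
freqs = position_frequencies(sites)
```

Comparing `gc_fraction` at each column with the background C+G frequency of the genome is one way to spot positions, like -2 and +2 in this study, where C's and G's are over-represented.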


Subject(s)
5' Untranslated Regions/genetics , Alcohol Oxidoreductases/genetics , DNA Transposable Elements/genetics , Gene Duplication , Genome, Plant , Plant Proteins , Zea mays/genetics , Alleles , Base Sequence , Chromosome Mapping , Crosses, Genetic , Molecular Sequence Data , Mutation/genetics , Sequence Alignment