1.
Methods Mol Biol ; 2016: 171-180, 2019.
Article in English | MEDLINE | ID: mdl-31197719

ABSTRACT

Insertion sequences are small mobile regions of DNA (transposable elements) found primarily in prokaryotes. The identification of insertion sequences in bacteria is a growing field of study because of their applications in evolution, genetics, and medicine. One of the first steps in characterizing the insertion sequences found in an organism is to perform a genome-wide survey to identify all insertion sequences using in silico methods. This includes a thorough scan of the genome to locate all copies of different families of insertion sequences and the identification of the key characteristics of each element. The results provide an extensive catalog of the insertion sequences which can be used to further other analyses or manipulation of the genome.


Subject(s)
Bacteria/genetics , DNA Transposable Elements , Genome, Bacterial , Genomics/methods , Bacterial Infections/microbiology , Humans , Open Reading Frames
2.
BMC Bioinformatics ; 11 Suppl 6: S20, 2010 Oct 07.
Article in English | MEDLINE | ID: mdl-20946604

ABSTRACT

BACKGROUND: RNA transcripts from genomic sequences showing dyad symmetry typically adopt hairpin-like, cloverleaf, or similar structures that act as recognition sites for proteins. Such structures often are the precursors of non-coding RNA (ncRNA) sequences like microRNA (miRNA) and small-interfering RNA (siRNA) that have recently garnered more functional significance than in the past. Genomic DNA contains hundreds of thousands of such inverted repeats (IRs) with varying degrees of symmetry. However, by collecting statistically significant information from a known set of ncRNA, we can sort these IRs into those that are likely to be functional. RESULTS: A novel method was developed to scan genomic DNA for partially symmetric inverted repeats, and the resulting set was further refined to match miRNA precursors (pre-miRNA) with respect to their density of symmetry, statistical probability of the symmetry, length of stems in the predicted hairpin secondary structure, and the GC content of the stems. This method was applied to the Arabidopsis thaliana genome and validated against the set of 190 known Arabidopsis pre-miRNA in the miRBase database. A preliminary scan for IRs identified 186 of the known pre-miRNA but produced 714700 pre-miRNA candidates. This large number of IRs was further refined to 483908 candidates with 183 pre-miRNA identified and further still to 165371 candidates with 171 pre-miRNA identified (i.e. with 90% of the known pre-miRNA retained). CONCLUSIONS: 165371 candidates for potentially functional miRNA is still too large a set to warrant wet lab analyses, such as northern blotting, on all of them. Hence additional filters are needed to further reduce the number of candidates while still retaining most of the known miRNA. These include detection of promoters and terminators, homology analyses, location of candidates relative to coding regions, and better secondary structure prediction algorithms. The software is designed to accommodate such additional filters easily, requiring only minimal experience in Perl.
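
The scan for partially symmetric inverted repeats described in this abstract can be illustrated with a minimal sketch. This is not the authors' Perl software: the function names, the fixed arm length, the loop-length range, and the 0.85 symmetry cutoff are all assumptions chosen for illustration, and the abstract's other filters (statistical probability of the symmetry, predicted stem length) are not modeled. The sketch compares a left arm against the reverse complement of a right arm and reports the fraction of matching positions along with the GC content of the two arms.

```python
# Hypothetical sketch of an inverted-repeat scan; parameters are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G or C bases in seq (0.0 for an empty string)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def find_inverted_repeats(genome, arm_len=8, min_loop=3, max_loop=10,
                          min_symmetry=0.85):
    """Return a list of (start, loop_len, symmetry, stem_gc) tuples.

    symmetry is the fraction of left-arm positions that pair with the
    reverse complement of the right arm; values below 1.0 correspond to
    partially symmetric repeats.
    """
    genome = genome.upper()
    hits = []
    for start in range(len(genome) - 2 * arm_len - min_loop + 1):
        left = genome[start:start + arm_len]
        for loop_len in range(min_loop, max_loop + 1):
            right_start = start + arm_len + loop_len
            right = genome[right_start:right_start + arm_len]
            if len(right) < arm_len:
                break  # right arm would run past the end of the sequence
            target = reverse_complement(right)
            matches = sum(a == b for a, b in zip(left, target))
            symmetry = matches / arm_len
            if symmetry >= min_symmetry:
                hits.append((start, loop_len, symmetry,
                             gc_content(left + right)))
    return hits
```

Calling `find_inverted_repeats` on a short test sequence such as `"GATCCAGTCCCCACTGGATC"` reports a single perfect repeat with a four-base loop. A genome-scale scan would need a faster search strategy than this quadratic loop.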


Subject(s)
Genome , Genomics/methods , Inverted Repeat Sequences/genetics , Arabidopsis/genetics , Base Composition , Base Sequence , MicroRNAs/chemistry , RNA, Small Interfering
3.
BMC Bioinformatics ; 9 Suppl 9: S2, 2008 Aug 12.
Article in English | MEDLINE | ID: mdl-18793465

ABSTRACT

BACKGROUND: Gene family identification from ESTs can be a valuable resource for analysis of genome evolution but presents unique challenges in organisms for which the entire genome is not yet sequenced. We have developed a novel gene family identification method based on negative selection patterns (NSP) between family members to screen EST-generated contigs. This strategy was tested on five known gene families in Arabidopsis to see if individual paralogs could be identified accurately from EST data alone when compared to the actual gene sequences in this fully sequenced genome. RESULTS: The NSP method uniquely identified family members in all the gene families tested. Two members of the FtsH gene family, three members each of the PAL, RF1, and ribosomal L6 gene families, and four members of the CAD gene family were correctly identified. Additionally, all ESTs from the representative contigs, when checked against MapViewer data, successfully identified the predicted gene locus. CONCLUSION: We demonstrate the effectiveness of the NSP strategy in identifying specific gene family members in Arabidopsis using only EST data, and we describe how this strategy can be used to identify many gene families in agronomically important crop species where they are as yet undiscovered.


Subject(s)
Arabidopsis Proteins/genetics , Chromosome Mapping/methods , Expressed Sequence Tags , Genome, Plant/genetics , Multigene Family/genetics , Proteome/genetics , Selection, Genetic , Arabidopsis/genetics
4.
BMC Bioinformatics ; 7 Suppl 2: S19, 2006 Sep 06.
Article in English | MEDLINE | ID: mdl-17118140

ABSTRACT

BACKGROUND: Gene duplication events have played a significant role in genome evolution, particularly in plants. Exhaustive searches for all members of a known gene family as well as the identification of new gene families have become increasingly important. Subfunctionalization via changes in regulatory sequences following duplication (adaptive selection) appears to be a common mechanism of evolution in plants and can be accompanied by purifying selection on the coding region. Such negative selection can be detected by a bias toward synonymous over nonsynonymous substitutions. However, the process of identifying this bias requires many steps, usually employing several different software programs. We have simplified the process and significantly shortened the time required by condensing many steps into a few scripts or programs to rapidly identify putative gene family members beginning with a single query sequence. RESULTS: In this report we 1) describe the software tools (SimESTs, PCAT, and SCAT) developed to automate the gene family identification, 2) demonstrate the validity of the method by correctly identifying 3 of 4 PAL gene family members from Arabidopsis using EST data alone, 3) identify 2 to 6 CAD gene family members from Glycine max (previously unidentified), and 4) identify 2 members of a putative Glycine max gene family previously unidentified in any plant species. CONCLUSION: Gene families in plants, particularly that subset where purifying selection has occurred in the coding region, can be identified quickly and easily by integrating our software tools and commonly available contig assembly and ORF identification programs.
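
The bias toward synonymous over nonsynonymous substitutions mentioned in this abstract can be illustrated with a rough sketch. It is not the SimESTs, PCAT, or SCAT software, whose internals the abstract does not describe. It assumes two coding sequences that are already aligned in frame and of equal length, uses the standard genetic code, and simply counts differing codons by whether the encoded amino acid changes. It does not normalize by the number of synonymous and nonsynonymous sites, so it is a crude indicator and not a dN/dS estimate. The function name `count_substitutions` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions(cds_a, cds_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes cds_a and cds_b are in-frame, pre-aligned coding sequences of
    equal length. Codons containing gaps or ambiguous bases are skipped.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not classified
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

A pair of sequences in which synonymous differences clearly outnumber nonsynonymous ones is consistent with purifying selection on the coding region, although a real analysis would correct for site counts and multiple substitutions.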


Subject(s)
Arabidopsis/genetics , Automation, Laboratory/methods , Computational Biology/methods , Glycine max/genetics , Software Design , Time Factors
5.
BMC Bioinformatics ; 6 Suppl 2: S7, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16026604

ABSTRACT

BACKGROUND: Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but currently it is not known how closely the overall results represent that reality. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for Glycine max we applied our algorithm to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) the submitting authors' conclusions to determine whether members of a single cluster represented the same gene or not. RESULTS: Of the 12 clusters that were examined closely, most contained examples of sequences that did not belong in the same cluster. However, neither the two stringencies of PECT nor UniGene had a significantly greater record of accuracy in placing paralogs into separate clusters. CONCLUSION: Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences.
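
The idea of clustering the same sequences at two matching stringencies and comparing the results can be sketched as follows. The abstract does not describe how PECT or UniGene score matches, so everything here is an assumption: similarity is the fraction of identical positions between equal-length sequences, clusters are formed by single linkage with a union-find structure, and the names `identity` and `cluster` are hypothetical.

```python
from itertools import combinations


def identity(seq_a, seq_b):
    """Fraction of identical positions; assumes equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def cluster(seqs, min_identity):
    """Single-linkage clustering of a {name: sequence} dict.

    Two sequences join the same cluster when their identity is at least
    min_identity. Returns a set of frozensets of sequence names.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= min_identity:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return {frozenset(members) for members in groups.values()}
```

Running `cluster` on the same input at a strict and a relaxed threshold and taking the set difference of the two results highlights the clusters whose membership depends on stringency. Those are the clusters that would merit closer inspection by alignment, as the abstract describes for the clusters that differed among methods.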


Subject(s)
Glycine max/genetics , RNA, Messenger/genetics , Cluster Analysis