Results 1 - 20 of 23
1.
PeerJ Comput Sci ; 10: e1921, 2024.
Article in English | MEDLINE | ID: mdl-38660211

ABSTRACT

Density-based clustering is considered a robust approach to unsupervised clustering because it can identify outliers, form clusters of irregular shapes, and automatically determine the number of clusters. These properties helped its pioneering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), become applicable to datasets in which varying numbers of clusters of different shapes and sizes can be detected without much intervention from the user. However, the original algorithm has limitations, especially its sensitivity to the user input parameters minPts and ɛ. Additionally, the algorithm assigns inconsistent cluster labels to data objects found in overlapping density regions of separate clusters, which lowers its accuracy. To alleviate these specific problems and increase clustering accuracy, we propose two methods that use statistics from a given dataset's k-nearest neighbor density distribution to determine the optimal ɛ values. Our approach removes this burden from users and automatically detects the clusters of a given dataset. Furthermore, a method to accurately identify the border objects of separate clusters is proposed and implemented to resolve the unpredictability of the original algorithm. Finally, our experiments show that our efficient re-implementation of the original algorithm, which automatically clusters datasets and improves the clustering quality of adjoining cluster members, provides higher clustering accuracy and faster running times than earlier approaches.
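
The abstract does not spell out how the two proposed methods derive ɛ, so the following is only a minimal sketch of the general idea of reading ɛ off the k-nearest-neighbor distance distribution. The function names `k_distances` and `estimate_eps`, and the rule "mean plus one standard deviation", are illustrative assumptions, not the authors' method.

```python
import math
from statistics import mean, stdev


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(points, k=4):
    """Assumed rule for illustration: eps = mean + one std. dev. of the k-distances."""
    kd = k_distances(points, k)
    return mean(kd) + stdev(kd)


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(k_distances(square, 1))
    print(estimate_eps(square, k=1))
```

A value obtained this way could then be passed as ɛ to any DBSCAN implementation, with minPts typically set to the same k.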

2.
Interdiscip Sci ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38568406

ABSTRACT

With the rapid development of NGS technology, the number of protein sequences has increased exponentially. Computational methods have been introduced in protein functional studies because the analysis of large numbers of proteins through biological experiments is costly and time-consuming. In recent years, new approaches based on deep learning have been proposed to overcome the limitations of conventional methods. Although deep learning-based methods effectively utilize features of protein function, they are limited to fixed-length sequences and consider only information from adjacent amino acids. Therefore, new protein analysis tools are required that extract functional features from proteins of flexible length and train models on them. We introduce DeepPI, a deep learning-based tool for analyzing proteins in large-scale databases. The proposed model, which utilizes Global Average Pooling, can be applied to proteins of flexible length and reduces information loss compared with existing algorithms that use fixed sizes. The image generator converts a one-dimensional sequence into a distinct two-dimensional structure, from which common parts of various shapes can be extracted. Finally, filtering techniques automatically detect representative data from the entire database and ensure coverage of large protein databases. We demonstrate that DeepPI has been successfully applied to large databases such as the Pfam-A database. Comparative experiments on four types of image generators illustrated the impact of structure on feature extraction. The filtering performance was verified by varying the parameter values and proved to be applicable to large databases. Compared with existing methods, DeepPI achieves higher family classification accuracy for protein function inference.
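
To illustrate why Global Average Pooling lets a model accept inputs of flexible length, here is a minimal sketch in plain Python. It is not DeepPI's code; the function name `global_average_pool` and the list-of-lists layout of the feature map are assumptions made for the example.

```python
def global_average_pool(feature_map):
    """Average each channel over all positions.

    feature_map: list of positions, each a list of channel values.
    The output has one value per channel, however many positions there are.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one position")
    n_positions = len(feature_map)
    n_channels = len(feature_map[0])
    return [
        sum(position[c] for position in feature_map) / n_positions
        for c in range(n_channels)
    ]


if __name__ == "__main__":
    short_protein = [[1.0, 2.0], [3.0, 4.0]]   # 2 positions, 2 channels
    long_protein = [[1.0, 0.0]] * 5            # 5 positions, 2 channels
    print(global_average_pool(short_protein))
    print(global_average_pool(long_protein))
```

Both inputs yield a vector of the same size, so a downstream classifier layer does not need padding or truncation to a fixed sequence length.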

3.
Heliyon ; 9(10): e20931, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37916084

ABSTRACT

Smart policing based on the analysis of big data ensures the development and sustainability of police policy. However, it is difficult to find instances in which the results of data analysis have been applied to actual policy in the field of crime prevention. The South Korean police force recognizes the need for smart policing and is engaged in various research and field support activities. Some examples that are especially relevant for crime investigation include analyzing the connections between cases and predicting the location of the next crime in a series of crimes and the location of suspects. However, it is difficult to find examples of police policy that use big data. Therefore, this study aims to suggest a model that uses big data to respond to emergency calls efficiently. First, we extract hotspots that are predicted to be locations of criminal activity based on an analysis of the association between community environment data and crime data. Second, we create a route having the shortest travel time to the crime location by developing a route optimization algorithm. Lastly, we assess the performance of the patrol routes in reflecting real-time traffic information. If the data application model suggested in this study could be adjusted and applied to the current police patrol system, the model could be used by each police department effectively.
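
The abstract does not describe the route optimization algorithm itself. As a stand-in, the sketch below uses Dijkstra's algorithm, a standard shortest-path technique, on a small road graph whose edge weights are travel times in minutes. The graph, the node names, and the function `fastest_route` are hypothetical.

```python
import heapq


def fastest_route(graph, start, goal):
    """Return (total_minutes, path) for the quickest route, or None if unreachable.

    graph: {node: [(neighbor, travel_minutes), ...]}
    """
    queue = [(0.0, start, [start])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this node at least as quickly
        best[node] = cost
        for neighbor, minutes in graph.get(node, []):
            heapq.heappush(queue, (cost + minutes, neighbor, path + [neighbor]))
    return None


if __name__ == "__main__":
    roads = {
        "station": [("A", 4), ("B", 2)],
        "B": [("A", 1), ("hotspot", 8)],
        "A": [("hotspot", 5)],
    }
    print(fastest_route(roads, "station", "hotspot"))
```

Real-time traffic information could be reflected by updating the `travel_minutes` weights before each query.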

4.
PeerJ ; 10: e14186, 2022.
Article in English | MEDLINE | ID: mdl-36262414

ABSTRACT

Numerous published genomes contain gaps or unknown sequences. Gap filling is a critical final step in de novo genome assembly, particularly for large genomes. While certain computational approaches partially address the problem, others have shortcomings regarding the draft genome's dependability and correctness (high rates of mis-assembly at gap-closing sites and high error rates). While it is well established that genomic repeats result in gaps, many sequence reads originating from repeat-related gaps are typically missed by existing approaches. A fast and reliable statistical algorithm for closing gaps in a draft genome is presented in this paper. It utilizes the alignment statistics between scaffolds, contigs, and paired-end reads to generate a Markov chain that appropriately assigns contigs or long reads to scaffold gap regions (only corrects candidate regions), resulting in accurate and efficient gap closure. To reconstruct the missing component between the two ends of the same insert, the RFfiller meticulously searches for valid overlaps (in repeat regions) and generates transition tables for similar reads, allowing it to make a statistical guess at the missing sequence. Finally, in our experiments, we show that the RFfiller's gap-closing accuracy is better than that of other publicly available tools when sequence data from various organisms are used. Assembly benchmarks were used to validate RFfiller. Our findings show that RFfiller efficiently fills gaps and that it is especially effective when the gap length is longer. We also show that the RFfiller outperforms other gap closing tools currently on the market.
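
As a rough illustration of how transition tables built from reads can support a statistical guess at a missing sequence, the sketch below counts which base follows each short context and then extends a seed greedily. It is not RFfiller's implementation; `build_transition_table`, `extend`, and the fixed context length `order` are assumptions for the example.

```python
from collections import Counter, defaultdict


def build_transition_table(reads, order=2):
    """Count which base follows each context of `order` bases across the reads."""
    table = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - order):
            table[read[i:i + order]][read[i + order]] += 1
    return table


def extend(seed, table, n_bases, order=2):
    """Append the most frequent next base up to n_bases times.

    Stops early when the current context was never seen in the reads.
    """
    seq = seed
    for _ in range(n_bases):
        counts = table.get(seq[-order:])
        if not counts:
            break
        seq += counts.most_common(1)[0][0]
    return seq


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "GTACGT"]
    table = build_transition_table(reads)
    print(extend("AC", table, 4))
```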


Subjects
Genome; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Algorithms; Genomics/methods
5.
Int J Mol Sci ; 23(19)2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36232783

ABSTRACT

Advances in next-generation sequencing technology have led to a dramatic decrease in read-generation cost and an increase in read output. Reconstruction of short DNA sequence reads generated by next-generation sequencing requires a read alignment method that reconstructs a reference genome. In addition, it is essential to analyze the results of read alignments for a biologically meaningful inference. However, read alignment from vast amounts of genomic data from various organisms is challenging in that it involves repeated automatic and manual analysis steps. Here, we devised the cPlot software for read alignment of nucleotide sequences, with automated read alignment and position analysis, which allows the user to visually assess the analysis results. cPlot compares the sequence similarity of reads by performing multiple read alignments, with FASTA format files as the input. The application provides a web-based interface that is easy to use and does not require a dedicated computing environment. cPlot identifies the location and order of the sequencing reads by comparing the sequence to a genetically close reference sequence in a way that is effective for visualizing the assembly of short reads generated by NGS and for rapid gene map construction.


Subjects
High-Throughput Nucleotide Sequencing; Software; Algorithms; Base Sequence; High-Throughput Nucleotide Sequencing/methods; Sequence Alignment; Sequence Analysis, DNA/methods
6.
PeerJ Comput Sci ; 7: e636, 2021.
Article in English | MEDLINE | ID: mdl-34307867

ABSTRACT

Technologies for next-generation sequencing (NGS) have stimulated an exponential rise in high-throughput sequencing projects and resulted in the development of new read-assembly algorithms. A drastic reduction in the costs of generating short reads on the genomes of new organisms is attributable to recent advances in NGS technologies such as Ion Torrent, Illumina, and PacBio. Genome research has led to the creation of high-quality reference genomes for several organisms, and de novo assembly is a key initiative that has facilitated gene discovery and other studies. More powerful analytical algorithms are needed to work on the increasing amount of sequence data. We make a thorough comparison of the de novo assembly algorithms to allow new users to clearly understand them: overlap-layout-consensus, de Bruijn graph, string-graph-based assembly, and hybrid approaches. We also address the computational efficacy of each algorithm's performance, the challenges faced by the assembly tools used, and the impact of repeats. Our results compare the relative performance of the different assemblers and other related assembly differences with and without the reference genome. We hope that this analysis will help further the application of de novo sequences and the future growth of assembly algorithms.
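
For readers new to the de Bruijn graph approach named above, the following minimal sketch shows how such a graph is built from reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name `de_bruijn_graph` is an assumption for the example, and real assemblers add error correction, graph simplification, and path finding on top of this.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    print(de_bruijn_graph(["ACGT"], 3))
```

Repeated k-mers produce parallel edges, which is one way repeats in the genome show up as ambiguity in the graph.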

7.
PeerJ ; 9: e12707, 2021.
Article in English | MEDLINE | ID: mdl-35036172

ABSTRACT

The massively parallel nature of next-generation sequencing technologies has contributed to the generation of massive sequence data in the last two decades. Deciphering the meaning of each generated sequence requires multiple analysis tools, at all stages of analysis, from the reads stage all the way up to the whole-genome level. Homology-based approaches based on related reference sequences are usually the preferred option for gene and transcript prediction in newly sequenced genomes, resulting in the popularity of a variety of BLAST and BLAST-based tools. For organelle genomes, a single-reference-based gene finding tool that uses grouping parameters for BLAST results has been implemented in the Genome Search Plotter (GSP). However, this tool does not accept multiple and user-customized reference sequences required for a broad homology search. Here, we present multiple Reference-based Gene Search and Plot (ReGSP), a simple and convenient web tool that accepts multiple reference sequences for homology-based gene search. The tool incorporates cPlot, a novel dot plot tool, for illustrating nucleotide sequence similarity between the query and the reference sequences. ReGSP has an easy-to-use web interface and is freely accessible at https://ds.mju.ac.kr/regsp.
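
A dot plot marks every pair of positions at which two sequences share a short match. The sketch below is a generic k-mer dot plot, not the cPlot implementation; the function name `dot_plot` and the default word size `k=3` are assumptions for the example.

```python
def dot_plot(query, reference, k=3):
    """Return (i, j) pairs where query[i:i+k] equals reference[j:j+k]."""
    # Index every k-mer of the reference by its start positions.
    index = {}
    for j in range(len(reference) - k + 1):
        index.setdefault(reference[j:j + k], []).append(j)
    # Look up each k-mer of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(dot_plot("ACGTT", "TTACGT"))
```

Plotting the returned pairs as points gives diagonal runs wherever the query and the reference are similar.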

8.
Bioinformatics ; 35(24): 5303-5305, 2019 Dec 15.
Article in English | MEDLINE | ID: mdl-31350879

ABSTRACT

SUMMARY: In comparative and evolutionary genomics, a detailed comparison of common features between organisms is essential to evaluate genetic distance. However, identifying differences in matched and mismatched genes among multiple genomes is difficult with current comparative genomic approaches because of complicated methodologies or the meager information generated from the results. This study describes a visual software tool, geneCo (gene Comparison), for comparing genome structure and gene arrangements between various organisms. User data are aligned, gene information is recognized, and genome structures are compared based on user-defined GenBank files. geneCo provides information on inversion, gain, loss, duplication and gene rearrangement among the multiple organisms being compared, and uses a web-based interface that users can easily access without any need to consider the computational environment. AVAILABILITY AND IMPLEMENTATION: Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/geneCo. The main module of geneCo is implemented in Python, and the web-based user interface is built with PHP, HTML and CSS to support all browsers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome; Genomics; Databases, Nucleic Acid; Internet; Software
9.
Bioinformatics ; 34(15): 2661-2663, 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-29617954

ABSTRACT

Summary: Next-generation sequencing (NGS) technologies have led to the accumulation of high-throughput sequence data from various organisms in biology. To annotate the genes of organellar genomes from various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools focus mainly on the chloroplast genome of land plants or the mitochondrial genome of animals. We have developed a web application, AGORA, for fast, user-friendly and improved annotation of organellar genomes. Annotator for Genes of Organelle from the Reference sequence Analysis (AGORA) annotates genes based on a basic local alignment search tool (BLAST)-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. Gene annotation of a genome with an exon-intron structure within a gene or an inverted repeat region is also available. It provides the start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of the gene map by OGDRAW. Availability and implementation: Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/. The main module of the tool is implemented in Python and PHP, and the web page is built with HTML and CSS to support all browsers. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Chloroplast; Genome, Mitochondrial; High-Throughput Nucleotide Sequencing/methods; Molecular Sequence Annotation/methods; Software; Animals; Eukaryota/genetics; Sequence Analysis, DNA/methods
10.
BMC Genomics ; 19(1): 275, 2018 Apr 20.
Article in English | MEDLINE | ID: mdl-29678149

ABSTRACT

BACKGROUND: Cryptophytes are an ecologically important group of algae comprised of phototrophic, heterotrophic and osmotrophic species. This lineage is of great interest to evolutionary biologists because their plastids are of red algal secondary endosymbiotic origin. Cryptophytes have a clear phylogenetic affinity to heterotrophic eukaryotes and possess four genomes: host-derived nuclear and mitochondrial genomes, and plastid and nucleomorph genomes of endosymbiotic origin. RESULTS: To gain insight into cryptophyte mitochondrial genome evolution, we sequenced the mitochondrial DNAs of five species and performed a comparative analysis of seven genomes from the following cryptophyte genera: Chroomonas, Cryptomonas, Hemiselmis, Proteomonas, Rhodomonas, Storeatula and Teleaulax. The mitochondrial genomes were similar in terms of their general architecture, gene content and presence of a large repeat region. However, gene order was poorly conserved. Characteristic features of cryptophyte mtDNAs included large syntenic clusters resembling α-proteobacterial operons that encode bacteria-like rRNAs, tRNAs, and ribosomal protein genes. The cryptophyte mitochondrial genomes retain almost all genes found in many other eukaryotes including the nad, sdh, cox, cob, and atp genes, with the exception of sdh2 and atp3. In addition, gene cluster analysis showed that cryptophytes possess a gene order closely resembling the jakobid flagellates Jakoba and Reclinomonas. Interestingly, the cox1 gene of R. salina, T. amphioxeia, and Storeatula species was found to contain group II introns encoding a reverse transcriptase protein, as did the cob gene of Storeatula species CCMP1868. CONCLUSIONS: These newly sequenced genomes increase the breadth of data available from algae and will aid in the identification of general trends in mitochondrial genome evolution. While most of the genomes were highly conserved, extensive gene arrangements have shuffled gene order, perhaps due to genome rearrangements associated with hairpin-containing mobile genetic elements, tRNAs with palindromic sequences, and tandem repeat sequences. The cox1 and cob gene sequences suggest that introns have recently been acquired during cryptophyte evolution. Comparison of phylogenetic trees based on plastid and mitochondrial genome data sets underscores the different evolutionary histories of the host and endosymbiont components of present-day cryptophytes.


Subjects
Cryptophyta/genetics; Genome, Mitochondrial/genetics; Genomics; Interspersed Repetitive Sequences/genetics; Gene Rearrangement; Phylogeny
11.
Genome Biol Evol ; 9(7): 1859-1872, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28854597

ABSTRACT

Cryptophytes are an ecologically important group of largely photosynthetic unicellular eukaryotes. This lineage is of great interest to evolutionary biologists because their plastids are of red algal secondary endosymbiotic origin and the host cell retains four different genomes (host nuclear, mitochondrial, plastid, and red algal nucleomorph). Here, we report a comparative analysis of plastid genomes from six representative cryptophyte genera. Four newly sequenced cryptophyte plastid genomes of Chroomonas mesostigmatica, Ch. placoidea, Cryptomonas curvata, and Storeatula sp. CCMP1868 share a number of features, including synteny and gene content, with the previously sequenced genomes of Cryptomonas paramecium, Rhodomonas salina, Teleaulax amphioxeia, and Guillardia theta. Our analysis of these plastid genomes reveals examples of gene loss and intron insertion. In particular, the chlB/chlL/chlN genes, which encode light-independent (dark active) protochlorophyllide oxidoreductase (LIPOR) proteins, have undergone recent gene loss and pseudogenization in cryptophytes. Comparison of phylogenetic trees based on plastid and nuclear genome data sets shows the introduction, via secondary endosymbiosis, of a red algal derived plastid in a lineage of chlorophyll-c containing algae. This event was followed by additional rounds of eukaryotic endosymbioses that spread the red lineage plastid to diverse groups such as haptophytes and stramenopiles.


Subjects
Cryptophyta/genetics; Evolution, Molecular; Genome, Plastid; Plastids/genetics; Symbiosis; Cryptophyta/physiology; Phylogeny; Sequence Analysis, DNA/methods
12.
Genome Biol Evol ; 7(8): 2394-406, 2015 Aug 04.
Article in English | MEDLINE | ID: mdl-26245677

ABSTRACT

Two red algal classes, the Florideophyceae (approximately 7,100 spp.) and Bangiophyceae (approximately 193 spp.), comprise 98% of red algal diversity in marine and freshwater habitats. These two classes form well-supported monophyletic groups in most phylogenetic analyses. Nonetheless, the interordinal relationships remain largely unresolved, in particular in the largest subclass Rhodymeniophycidae that includes 70% of all species. To elucidate red algal phylogenetic relationships and study organelle evolution, we determined the sequence of 11 mitochondrial genomes (mtDNA) from 5 florideophycean subclasses. These mtDNAs were combined with existing data, resulting in a database of 25 florideophytes and 12 bangiophytes (including cyanidiophycean species). A concatenated alignment of mt proteins was used to resolve ordinal relationships in the Rhodymeniophycidae. Red algal mtDNA genome comparisons showed 47 instances of gene rearrangement including 12 that distinguish Bangiophyceae from Hildenbrandiophycidae, and 5 that distinguish Hildenbrandiophycidae from Nemaliophycidae. These organelle data support a rapid radiation and surprisingly high conservation of mtDNA gene synteny among the morphologically divergent multicellular lineages of Rhodymeniophycidae. In contrast, we find extensive mitochondrial gene rearrangements when comparing Bangiophyceae and Florideophyceae and multiple examples of gene loss among the different red algal lineages.


Subjects
Evolution, Molecular; Genome, Mitochondrial; Rhodophyta/genetics; Conserved Sequence; Molecular Sequence Data; Phylogeny; Plant Proteins/genetics; Rhodophyta/classification; Synteny
13.
PLoS One ; 10(6): e0129284, 2015.
Article in English | MEDLINE | ID: mdl-26047475

ABSTRACT

Teleaulax amphioxeia is a photosynthetic unicellular cryptophyte alga that is distributed throughout marine habitats worldwide. This alga is an important plastid donor to the dinoflagellate Dinophysis caudata through the ciliate Mesodinium rubrum in the marine food web. To better understand the genomic characteristics of T. amphioxeia, we have sequenced and analyzed its plastid genome. The plastid genome sequence of T. amphioxeia is similar to that of Rhodomonas salina, and they share significant synteny. This sequence exhibits less similarity to that of Guillardia theta, the representative plastid genome of photosynthetic cryptophytes. The gene content and order of the three photosynthetic cryptomonad plastid genomes studied is highly conserved. The plastid genome of T. amphioxeia is composed of 129,772 bp and includes 143 protein-coding genes, 2 rRNA operons and 30 tRNA sequences. The DNA polymerase III gene (dnaX) was most likely acquired via lateral gene transfer (LGT) from a firmicute bacterium, identical to what occurred in R. salina. On the other hand, the psbN gene was independently encoded by the plastid genome without a reverse transcriptase gene as an intron. To clarify the phylogenetic relationships of the algae with red-algal derived plastids, phylogenetic analyses of 32 taxa were performed, including three previously sequenced cryptophyte plastid genomes containing 93 protein-coding genes. The stramenopiles were found to have branched out from the Chromista taxa (cryptophytes, haptophytes, and stramenopiles), while the cryptophytes and haptophytes were consistently grouped into sister relationships with high resolution.


Subjects
Cryptophyta/genetics; Genes, Chloroplast/genetics; Genome, Plastid/genetics; Plastids/genetics; Chloroplast Proteins/genetics; DNA, Chloroplast/chemistry; DNA, Chloroplast/genetics; DNA, Circular/chemistry; DNA, Circular/genetics; Gene Order; Gene Transfer, Horizontal; Photosynthesis/genetics; Photosystem I Protein Complex/genetics; Photosystem II Protein Complex/genetics; Phylogeny; Plastids/classification; Sequence Analysis, DNA
14.
ScientificWorldJournal ; 2014: 473132, 2014.
Article in English | MEDLINE | ID: mdl-25254245

ABSTRACT

Even though existing low-power listening (LPL) protocols have enabled ultra-low-power operation in wireless sensor networks (WSN), they do not address the trade-off between energy and delay, since they focus only on the energy aspect. In recent years, however, growing interest in various WSN applications has created a need for new design factors, such as minimum delay and higher reliability, in addition to energy efficiency. Therefore, in this paper we propose a novel sensor multiple access control (MAC) protocol, the transmission rate based adaptive low-power listening MAC protocol (TRA-MAC), which is a kind of preamble-based LPL but is capable of adapting the preamble sensing cycle to transmission rates. Experiments demonstrate that TRA-MAC enables the LPL cycle (LC) and preamble transmission length to adapt dynamically to varying transmission rates, balancing the trade-off between energy and response time.


Subjects
Algorithms; Computer Communication Networks/standards; Models, Theoretical; Wireless Technology/standards; Computer Communication Networks/instrumentation; Reproducibility of Results; Signal Processing, Computer-Assisted/instrumentation; Time Factors; Wireless Technology/instrumentation
15.
ScientificWorldJournal ; 2014: 156083, 2014.
Article in English | MEDLINE | ID: mdl-25197692

ABSTRACT

Resource management of the main memory and process handler is critical to enhancing the system performance of a web server. Owing to the transaction delay time that affects incoming requests from web clients, web server systems utilize several web processes to anticipate future requests. This procedure is able to decrease the web generation time because there are enough processes to handle the incoming requests from web browsers. However, inefficient process management results in low service quality for the web server system. Proper pregenerated process mechanisms are required for dealing with the clients' requests. Unfortunately, it is difficult to predict how many requests a web server system is going to receive. If a web server system builds too many web processes, it wastes a considerable amount of memory space, and thus performance is reduced. We propose an adaptive web process manager scheme based on the analysis of web log mining. In the proposed scheme, the number of web processes is controlled through prediction of incoming requests, and accordingly, the web process management scheme consumes the least possible web transaction resources. In experiments, real web trace data were used to prove the improved performance of the proposed scheme.


Subjects
Algorithms; Computer Storage Devices; Electronic Data Processing/methods; Internet; Data Mining; Time Factors
16.
ScientificWorldJournal ; 2014: 109435, 2014.
Article in English | MEDLINE | ID: mdl-25147832

ABSTRACT

In recent years, traditional development techniques for e-learning systems have been changing to become more convenient and efficient. One new technology in the development of application systems includes both cloud and ubiquitous computing. Cloud computing can support learning system processes by using services, while ubiquitous computing can provide system operation and management via a high-performance technical process and network. In the cloud computing environment, a learning service application can provide a business module or process to the user via the internet. This research focuses on providing the learning material and processes of courses by learning units using the services in a ubiquitous computing environment. We also investigate functions that tailor materials to users according to their learning style. That is, we analyzed users' data and characteristics in accordance with their user experience. We subsequently adapted the learning process to fit their learning performance and preferences. Finally, we demonstrate that the proposed system yields better learning outcomes for learners than existing techniques.


Subjects
Internet; Learning; Software; Humans
17.
ScientificWorldJournal ; 2014: 542824, 2014.
Article in English | MEDLINE | ID: mdl-25133242

ABSTRACT

Automated protein function prediction is the assignment of functions to proteins of unknown function by using computational methods. This technique is useful for automatically assigning gene functional annotations to undefined sequences in next generation genome analysis (NGS). NGS is a popular research method since high-throughput technologies such as DNA sequencing and microarrays have created large sets of genes. These large sequence sets have greatly increased the need for analysis. Previous research has been based on sequence similarity, as this is strongly related to functional homology. In contrast, this study aimed to designate protein functions by automatically predicting the function of the genome by utilizing InterPro (IPR), which can represent the properties of the protein family and groups of the protein function. Moreover, we used gene ontology (GO), which is the controlled vocabulary used to comprehensively describe protein function. To define the relationship between IPR and GO terms, three pattern recognition techniques have been employed under different conditions, such as feature selection and weighted values instead of binary ones.


Subjects
Computational Biology/methods; Gene Ontology; Molecular Sequence Annotation/methods; Sequence Analysis, Protein/methods
18.
Mitochondrial DNA ; 25(4): 273-4, 2014 Aug.
Article in English | MEDLINE | ID: mdl-23789771

ABSTRACT

We sequenced and characterized the first complete mitochondrial genome of the sublittoral red alga Rhodymenia pseudopalmata (Rhodymeniales, Rhodophyta). The mitogenome is 26,166 bp in length with 29.5% GC content. The circular mitogenome contains 47 genes, including 24 protein-coding, 2 rRNA and 21 tRNA genes, with two copies each of trnG, trnL, trnM and trnS. There are two cases of gene overlap, found between sdhD and nad4, and between secY and rps12. The R. pseudopalmata mitochondrial genome differs from that of Gracilariopsis lemaneiformis by three missing genes (orf60, rpl20 and trnH).
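
GC content, as reported above, is the percentage of bases in a sequence that are G or C. A minimal sketch of that arithmetic follows; the function name `gc_content` is an assumption for the example, and the short test strings are toy inputs, not the R. pseudopalmata mitogenome.

```python
def gc_content(sequence):
    """Percentage of bases in `sequence` that are G or C (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGC"))
```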


Subjects
Genome, Mitochondrial; Rhodophyta/genetics; Molecular Sequence Data; Plant Proteins/genetics; RNA, Ribosomal/genetics; RNA, Transfer/genetics
19.
J Comput Biol ; 19(8): 957-67, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22876787

ABSTRACT

The goal of protein family classification is to group proteins into families so that proteins within the same family have a common function or are related by ancestry. While supervised classification algorithms are available for this purpose, most of these approaches focus on assigning unclassified proteins to known families but do not allow for progressive construction of new families from proteins that cannot be assigned. Although unsupervised clustering algorithms are also available, they do not make use of information from known families. By computing similarities between proteins based on pairwise sequence comparisons, we develop supervised classification algorithms that achieve improved accuracy over previous approaches while allowing for construction of new families. We show that our algorithm has a higher accuracy rate and a lower misclassification rate than algorithms that are based on multiple sequence alignments and hidden Markov models, and that it performs well even on families with very few proteins and on families with low sequence similarity. A software program implementing the algorithm (SClassify) is available online (http://faculty.cse.tamu.edu/shsze/sclassify).


Subjects
Algorithms; Models, Genetic; Proteins/classification; Sequence Analysis, Protein/methods; Amino Acid Motifs; Animals; Cluster Analysis; Computer Simulation; Databases, Protein; Humans; Markov Chains; Proteins/genetics
20.
Bioinformation ; 7(5): 251-6, 2011.
Article in English | MEDLINE | ID: mdl-22125394

ABSTRACT

Identifying genomic regions that descended from a common ancestor helps us study gene function and genome evolution. In distantly related genomes, clusters of homologous gene pairs are used in function prediction, operon detection, and similar tasks. Many computational methods have been proposed that define gene clusters in order to identify gene families and operons. However, most of those algorithms are only applicable to small data sets. We developed an efficient gene clustering algorithm that can be applied to hundreds of genomes at the same time. This approach allows for large-scale study of the evolutionary relationships of gene clusters and of operon formation and destruction. An analysis of the proposed algorithm shows that more biological insight can be obtained by analyzing gene clusters across hundreds of genomes, which can help us understand operon occurrences, gene orientations and gene rearrangements.
