1.
Front Plant Sci ; 14: 1039211, 2023.
Article in English | MEDLINE | ID: mdl-36993855

ABSTRACT

Pomegranate has a unique evolutionary history given that different cultivars have eight or nine bivalent chromosomes with possible crossability between the two classes. Therefore, it is important to study chromosome evolution in pomegranate to understand the dynamics of its population. Here, we de novo assembled the Azerbaijani cultivar "Azerbaijan guloyshasi" (AG2017; 2n = 16) and re-sequenced six cultivars to track the evolution of pomegranate and to compare it with previously published de novo assembled and re-sequenced cultivars. High synteny was observed between AG2017, Bhagawa (2n = 16), Tunisia (2n = 16), and Dabenzi (2n = 18), but these four cultivars diverged from the cultivar Taishanhong (2n = 18) with several rearrangements indicating the presence of two major chromosome evolution events. Major presence/absence variations were not observed as >99% of the five genomes aligned across the cultivars, while >99% of the pan-genic content was represented by Tunisia and Taishanhong only. We also revisited the divergence between soft- and hard-seeded cultivars with less structured population genomic data, compared to previous studies, to refine the selected genomic regions and detect global migration routes for pomegranate. We reported a unique admixture between soft- and hard-seeded cultivars that can be exploited to improve the diversity, quality, and adaptability of local pomegranate varieties around the world. Our study adds to the body of knowledge on the evolution of the pomegranate genome and its implications for the population structure of global pomegranate diversity, as well as for planning breeding programs aiming to develop improved cultivars.

2.
Opt Express ; 27(22): 32578-32586, 2019 Oct 28.
Article in English | MEDLINE | ID: mdl-31684467

ABSTRACT

Exceptionally strong enhancement of the Raman signal exceeding eight orders of magnitude for near-infrared (1064 nm) excitation is demonstrated for an array of dielectric submicron pillars covered by a relatively thick metal layer. The microstructure is designed to support 'spoof' plasmon-polariton excitations with resonant frequencies significantly below the fundamental surface plasmon resonance. Experiments reveal a relatively narrow range of spatial parameters for the optimal resonant scattering enhancement. They include a period close to the excitation wavelength, a specific ratio of the pillar planar size to the period, and optimal heights of both the pillars and the covering silver metal layer. The realized microstructures can be produced by fab-compatible photolithography techniques, and their outstanding sensing possibilities open an avenue for biomedical applications.

3.
BMC Genomics ; 20(1): 399, 2019 May 22.
Article in English | MEDLINE | ID: mdl-31117933

ABSTRACT

BACKGROUND: The three epidemiologically important Opisthorchiidae liver flukes Opisthorchis felineus, O. viverrini, and Clonorchis sinensis are believed to have a similar potential to provoke hepatobiliary diseases in their definitive hosts, although their populations differ substantially in ecogeographical aspects including habitat, preferred hosts, and population structure. Lack of O. felineus genomic data is an obstacle to the development of comparative molecular biological approaches necessary to obtain new knowledge about the biology of Opisthorchiidae trematodes, to identify essential pathways linked to parasite-host interaction, to predict genes that contribute to liver fluke pathogenesis and for the effective prevention and control of the disease. RESULTS: Here we present the first draft genome assembly of O. felineus and its gene repertoire accompanied by a comparative analysis with that of O. viverrini and Clonorchis sinensis. We observed both noticeably high heterozygosity of the sequenced individual and substantial genetic diversity in a pooled sample. This indicates that the potential of the O. felineus population for a rapid adaptive response to control and preventive measures of opisthorchiasis is higher than in O. viverrini and C. sinensis. We also found that all three species are characterized by more intensive involvement of trans-splicing in RNA processing compared to other trematodes. CONCLUSION: All revealed peculiarities of structural organization of genomes are of extreme importance for a proper description of genes and their products in these parasitic species. This should be taken into account both in academic and applied research of epidemiologically important liver flukes. Further comparative genomics studies of liver flukes and non-carcinogenic flatworms will allow the generation of well-grounded hypotheses on the mechanisms underlying development of cholangiocarcinoma associated with opisthorchiasis and clonorchiasis as well as species-specific mechanisms of these diseases.


Subjects
Cricetinae/parasitology; Cyprinidae/parasitology; Genome, Helminth; Genomics/methods; Helminth Proteins/genetics; Opisthorchiasis/epidemiology; Opisthorchis/genetics; Amino Acid Sequence; Animals; Clonorchiasis/epidemiology; Clonorchiasis/genetics; Clonorchiasis/parasitology; Clonorchis sinensis/genetics; Opisthorchiasis/genetics; Opisthorchiasis/parasitology; Sequence Homology
4.
FASEB J ; 33(7): 8161-8173, 2019 07.
Article in English | MEDLINE | ID: mdl-30970224

ABSTRACT

Human prefrontal cortex (PFC) is associated with broad individual variabilities in functions linked to personality, social behaviors, and cognitive functions. The phenotype variabilities associated with brain functions can be caused by genetic or epigenetic factors. The interactions between these factors in human subjects are, as of yet, poorly understood. The heterogeneity of cerebral tissue, consisting of neuronal and nonneuronal cells, complicates the comparative analysis of gene activities in brain specimens. To approach the underlying neurogenomic determinants, we performed a deep analysis of open chromatin-associated histone methylation in PFC neurons sorted from multiple human individuals in conjunction with whole-genome and transcriptome sequencing. Integrative analyses produced novel unannotated neuronal genes and revealed individual-specific chromatin "blueprints" of neurons that, in part, relate to genetic background. Surprisingly, we observed gender-dependent epigenetic signals, implying that gender may contribute to the chromatin variabilities in neurons. Finally, we found epigenetic, allele-specific activation of the testis-specific gene nucleoporin 210 like (NUP210L) in brain in some individuals, which we link to a genetic variant occurring in <3% of the human population. Recently, the NUP210L locus has been associated with intelligence and mathematics ability. Our findings highlight the significance of epigenetic-genetic footprinting for exploring neurologic function in a subject-specific manner.

Gusev, F. E., Reshetov, D. A., Mitchell, A. C., Andreeva, T. V., Dincer, A., Grigorenko, A. P., Fedonin, G., Halene, T., Aliseychik, M., Goltsov, A. Y., Solovyev, V., Brizgalov, L., Filippova, E., Weng, Z., Akbarian, S., Rogaev, E. I. Epigenetic-genetic chromatin footprinting identifies novel and subject-specific genes active in prefrontal cortex neurons.


Subjects
Chromatin/metabolism; Cognition/physiology; Epigenesis, Genetic/physiology; Neurons/metabolism; Prefrontal Cortex/metabolism; Adolescent; Adult; Aged; Aged, 80 and over; Child; Child, Preschool; Female; Genetic Loci/physiology; Histones/metabolism; Humans; Infant; Infant, Newborn; Male; Methylation; Middle Aged; Neurons/cytology; Nuclear Pore Complex Proteins/biosynthesis; Prefrontal Cortex/cytology; Pregnancy
5.
Bioinformatics ; 35(16): 2730-2737, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30601980

ABSTRACT

MOTIVATION: Computational identification of promoters is notoriously difficult as human genes often have unique promoter sequences that provide regulation of transcription and interaction with transcription initiation complex. While there are many attempts to develop computational promoter identification methods, we have no reliable tool to analyze long genomic sequences. RESULTS: In this work, we further develop our deep learning approach that was relatively successful to discriminate short promoter and non-promoter sequences. Instead of focusing on the classification accuracy, in this work we predict the exact positions of the transcription start site inside the genomic sequences testing every possible location. We studied human promoters to find effective regions for discrimination and built corresponding deep learning models. These models use adaptively constructed negative set, which iteratively improves the model's discriminative ability. Our method significantly outperforms the previously developed promoter prediction programs by considerably reducing the number of false-positive predictions. We have achieved error-per-1000-bp rate of 0.02 and have 0.31 errors per correct prediction, which is significantly better than the results of other human promoter predictors. AVAILABILITY AND IMPLEMENTATION: The developed method is available as a web server at http://www.cbrc.kaust.edu.sa/PromID/.


Subjects
Deep Learning; Promoter Regions, Genetic; Genome, Human; Genomics; Humans; Transcription Initiation Site
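
The idea in record 5 of testing every possible location for a transcription start site can be illustrated with a minimal sketch. This is not the authors' deep learning pipeline: the scoring function, the window sizes, the threshold and the minimum spacing between predictions are all hypothetical placeholders, and a trained model would take the place of `toy_score`.

```python
from typing import Callable, List, Tuple


def scan_for_tss(
    sequence: str,
    score_window: Callable[[str], float],
    upstream: int = 200,      # hypothetical window size before the candidate TSS
    downstream: int = 50,     # hypothetical window size after the candidate TSS
    threshold: float = 0.5,   # hypothetical score cut-off
    min_gap: int = 500,       # hypothetical minimum spacing between predictions
) -> List[Tuple[int, float]]:
    """Score every position as a candidate TSS and keep well-separated peaks."""
    candidates = []
    for pos in range(upstream, len(sequence) - downstream + 1):
        window = sequence[pos - upstream : pos + downstream]
        score = score_window(window)
        if score >= threshold:
            candidates.append((pos, score))

    # Greedy selection: take the best-scoring candidates first and drop any
    # candidate closer than min_gap to one that was already selected.
    selected: List[Tuple[int, float]] = []
    for pos, score in sorted(candidates, key=lambda c: -c[1]):
        if all(abs(pos - kept) >= min_gap for kept, _ in selected):
            selected.append((pos, score))
    return sorted(selected)


def toy_score(window: str) -> float:
    """Placeholder scorer: 1.0 if 'TATA' sits 30 bp upstream of the candidate."""
    return 1.0 if window[170:174] == "TATA" else 0.0
```

Separating the scan from the scorer keeps the sketch independent of any particular model; only `score_window` would change if a neural network were plugged in.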
6.
Opt Express ; 26(17): 22519-22527, 2018 Aug 20.
Article in English | MEDLINE | ID: mdl-30130943

ABSTRACT

Apart from the main plasmon-polariton resonance of the surface-enhanced Raman scattering (SERS) occurring at 480 - 530 nm, an additional resonance was observed for substrates with two silver layers separated by a dielectric layer which support extra plasmon modes with decreased group velocities. The novel SERS resonance is shifted towards lower energies and has comparable amplitude, its exact energy position being determined by the thickness of the dielectric interlayer. The experimental findings provide a ground for the engineering of SERS-substrates with the spectral position of the additional resonance matched with the photon energy of the pump laser over a fairly wide range of laser wavelengths.

7.
PLoS One ; 12(11): e0187243, 2017.
Article in English | MEDLINE | ID: mdl-29141011

ABSTRACT

Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that probabilistic integrative algorithms-driven models allow accurate classification of DNA sequence into "promoters" and "non-promoters" even in absence of the full-length cDNA sequences. These models may be built upon the maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 "promoter-specific" transcription factors), those that bind preferentially to the [0,500] region (282 "5' UTR-specific" TFs), and 207 of the "promiscuous" transcription factors with little or no location preference with respect to TSS. For the most informative motifs, their positional preferences are conserved between dicots and monocots.


Subjects
Eukaryota/genetics; Nucleotides/metabolism; Promoter Regions, Genetic; Algorithms; Binding Sites; DNA Methylation; Evolution, Molecular; Oryza/genetics; Transcription Factors/metabolism
8.
Biol Direct ; 12(1): 21, 2017 09 08.
Article in English | MEDLINE | ID: mdl-28886750

ABSTRACT

BACKGROUND: Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. RESULTS: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. 
CONCLUSIONS: We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database ( http://palmxplore.mpob.gov.my ), will provide important resources for studies on the genomes of oil palm and related crops. REVIEWERS: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.


Subjects
Arecaceae/genetics; Genome, Plant; Models, Genetic; Molecular Sequence Annotation; Computational Biology/methods; Genes, Plant; Software
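
GC3, defined in record 8 as the fraction of cytosine and guanine in the third position of a codon, can be computed from a coding sequence with a short sketch. The function names are illustrative, and two details are assumptions rather than statements from the paper: the sequence is taken to be in frame from its first base, and every complete codon (including a stop codon, if present) is counted.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third_bases = cds[2:usable:3]      # bases at codon position 3
    if not third_bases:
        raise ValueError("sequence must contain at least one complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


def is_gc3_rich(cds: str, cutoff: float = 0.75286) -> bool:
    """Apply the GC3-rich cut-off quoted in the abstract (GC3 >= 0.75286)."""
    return gc3(cds) >= cutoff
```

For instance, `gc3("ATGGCCAAGTAA")` looks at the third bases G, C, G and A of the four codons.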
9.
Methods Mol Biol ; 1613: 311-331, 2017.
Article in English | MEDLINE | ID: mdl-28849566

ABSTRACT

It is becoming more evident that computational methods are needed for the identification and the mapping of pathways in new genomes. We introduce an automatic annotation system (ARBA4Path: Association Rule-Based Annotator for Pathways) that utilizes rule mining techniques to predict metabolic pathways across a wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine the pathways in which proteins are involved and thus provide information that lets us very accurately assign pathway membership (with precision of 0.999 and recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate proteins with unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add absent pathway information to many proteins in the UniProtKB/TrEMBL database.


Subjects
Bacteria/metabolism; Bacterial Proteins/metabolism; Data Mining/methods; Metabolic Networks and Pathways; Bacterial Proteins/chemistry; Databases, Protein; Machine Learning; Molecular Sequence Annotation; Protein Domains; Proteomics/methods
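
How human-readable rules of the kind described in record 9 might be applied can be sketched as follows. The rule format, the domain identifiers and the pathway names are invented for illustration and do not come from ARBA4Path; the sketch only assumes that a rule fires when every domain in its antecedent is present in the protein.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A rule pairs a required combination of protein domains with a pathway.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical rules; real rules would be mined from annotated proteins.
EXAMPLE_RULES: List[Rule] = [
    (frozenset({"DOMAIN_A", "DOMAIN_B"}), "pathway X"),
    (frozenset({"DOMAIN_C"}), "pathway Y"),
]


def assign_pathways(protein_domains: Iterable[str], rules: List[Rule]) -> List[str]:
    """Return the pathways of all rules whose domains are all present."""
    domains = set(protein_domains)
    return sorted({pathway for required, pathway in rules if required <= domains})
```

Because each rule is an explicit domain combination, a curator can read off why a given pathway was assigned, which is the property the abstract emphasizes.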
10.
PLoS One ; 12(2): e0171410, 2017.
Article in English | MEDLINE | ID: mdl-28158264

ABSTRACT

Accurate computational identification of promoters remains a challenge as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The Bacillus subtilis promoter identification CNN model achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs for identification of two well-known promoter classes (TATA and non-TATA promoters). CNN models nicely recognize these complex functional regions. For human promoters the Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0.90 for TATA and 0.90/0.98/0.89 for non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 for TATA and 0.94/0.94/0.86 for non-TATA promoters. Thus, the developed CNN models, implemented in the CNNProm program, demonstrated the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy compared to the previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in sequences of many other and especially newly sequenced genomes. The CNNProm program can be run on the web server at http://www.softberry.com.


Subjects
Eukaryotic Cells/metabolism; Neural Networks, Computer; Prokaryotic Cells/metabolism; Promoter Regions, Genetic/genetics; Animals; Computational Biology/methods; Humans; Sequence Analysis, DNA
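
The Sn, Sp and CC figures quoted in record 10 are classification metrics derived from a confusion matrix. The sketch below assumes the common definitions (sensitivity as the true positive rate, specificity as the true negative rate, CC as the Matthews correlation coefficient); the abstract does not spell these out, so the paper's exact definitions may differ.

```python
import math
from typing import Tuple


def sn_sp_cc(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity and Matthews correlation from raw counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # fraction of promoters found
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # fraction of non-promoters rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc
```

CC depends on the ratio of positive to negative test sequences as well as on Sn and Sp, so the same Sn/Sp pair can give different CC values on differently balanced test sets.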
11.
Nucleic Acids Res ; 45(8): e65, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28082394

ABSTRACT

Our current knowledge of eukaryotic promoters indicates their complex architecture that is often composed of numerous functional motifs. Most of known promoters include multiple and in some cases mutually exclusive transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue, development stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences of a wide spectrum of plant genomes. The tool was developed by using large promoter collections from ppdb and PlantProm DB. It utilizes eighteen significant compositional and signal features of plant promoter sequences selected in this study, that feed the artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy compared to the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80, F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.


Subjects
Genome, Plant; Neural Networks, Computer; Plant Proteins/genetics; Promoter Regions, Genetic; RNA Polymerase II/genetics; Transcription Initiation Site; Arabidopsis/genetics; Arabidopsis/metabolism; Gene Expression; Oryza/genetics; Oryza/metabolism; Plant Proteins/metabolism; RNA Polymerase II/metabolism; Sequence Analysis, DNA; Software
12.
Nucleic Acids Res ; 45(D1): D1075-D1081, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899667

ABSTRACT

We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org.


Subjects
Databases, Nucleic Acid; Genome, Plant; INDEL Mutation; Oryza/genetics; Polymorphism, Single Nucleotide; Search Engine; Software; Alleles; Computational Biology/methods; Gene Frequency; Genetic Loci; Genomics/methods; Genotype; User-Computer Interface; Web Browser
13.
PLoS One ; 11(7): e0158896, 2016.
Article in English | MEDLINE | ID: mdl-27390860

ABSTRACT

The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We carried out an evaluation study of our system performance using cross-validation technique. We found that it achieved very promising results in pathway identification with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, where 436,510 of them lacked any previous pathway annotations.


Subjects
Data Mining/methods; Databases, Protein; Molecular Sequence Annotation/methods; Prokaryotic Cells/metabolism; Proteome/genetics; Proteome/metabolism
14.
Genome Announc ; 3(5)2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26472828

ABSTRACT

The emergence and spread of multidrug-resistant (MDR) bacteria have been regarded as major challenges among health care-associated infections worldwide. Here, we report the draft genome sequence of an MDR Stenotrophomonas maltophilia strain isolated in 2014 from King Abdulla Medical City, Makkah, Saudi Arabia.

15.
Bioinformatics ; 31(21): 3544-5, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26142184

ABSTRACT

Gene transcription is mostly conducted through interactions of various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs perform searches for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively. AVAILABILITY AND IMPLEMENTATION: Pre-compiled executables built under commonly used operating systems are available for download by visiting http://www.molquest.kaust.edu.sa and http://www.softberry.com. CONTACT: solovictor@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Promoter Regions, Genetic; Software; Animals; Binding Sites; Genomics; Humans; Nucleotide Motifs; Plants/genetics; Regulatory Sequences, Nucleic Acid; Sequence Analysis, DNA; Transcription Factors/metabolism
16.
Bioinformatics ; 31(21): 3421-8, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26177965

ABSTRACT

MOTIVATION: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on the high-coverage information, typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. RESULTS: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-bases errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction. AVAILABILITY AND IMPLEMENTATION: Karect is available at: http://aminallam.github.io/karect. CONTACT: amin.allam@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing/methods; INDEL Mutation/genetics; Mutagenesis, Insertional/genetics; Sequence Analysis, DNA/methods; Sequence Deletion; Chromosome Mapping; Computational Biology/methods; Genome, Human; Humans
17.
Genome Res ; 24(12): 2077-89, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.


Subjects
Genome; Genomics/methods; Sequence Alignment/methods; Software; Animals; Computational Biology/methods; Computer Simulation; Datasets as Topic; Genome-Wide Association Study; Humans; Mammals/genetics; Phylogeny; Reproducibility of Results
18.
Nature ; 510(7503): 109-14, 2014 Jun 05.
Article in English | MEDLINE | ID: mdl-24847885

ABSTRACT

The origins of neural systems remain unresolved. In contrast to other basal metazoans, ctenophores (comb jellies) have both complex nervous and mesoderm-derived muscular systems. These holoplanktonic predators also have sophisticated ciliated locomotion, behaviour and distinct development. Here we present the draft genome of Pleurobrachia bachei, Pacific sea gooseberry, together with ten other ctenophore transcriptomes, and show that they are remarkably distinct from other animal genomes in their content of neurogenic, immune and developmental genes. Our integrative analyses place Ctenophora as the earliest lineage within Metazoa. This hypothesis is supported by comparative analysis of multiple gene families, including the apparent absence of HOX genes, canonical microRNA machinery, and reduced immune complement in ctenophores. Although two distinct nervous systems are well recognized in ctenophores, many bilaterian neuron-specific genes and genes of 'classical' neurotransmitter pathways either are absent or, if present, are not expressed in neurons. Our metabolomic and physiological data are consistent with the hypothesis that ctenophore neural systems, and possibly muscle specification, evolved independently from those in other animals.


Subjects
Ctenophora/genetics; Evolution, Molecular; Genome/genetics; Nervous System; Animals; Ctenophora/classification; Ctenophora/immunology; Ctenophora/physiology; Genes, Developmental; Genes, Homeobox; Mesoderm/metabolism; Metabolomics; MicroRNAs; Molecular Sequence Data; Muscles/physiology; Nervous System/metabolism; Neurons/metabolism; Neurotransmitter Agents; Phylogeny; Transcriptome/genetics
19.
BMC Genomics ; 15: 86, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24479613

ABSTRACT

BACKGROUND: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.


Subjects
Bees/genetics; Genes, Insect; Animals; Base Composition; Databases, Genetic; Interspersed Repetitive Sequences/genetics; Molecular Sequence Annotation; Open Reading Frames/genetics; Peptides/analysis; Sequence Analysis, RNA; Sequence Homology, Amino Acid
20.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.


Subjects
Genome/physiology; Genomics/methods; Sequence Analysis, DNA/methods