Results 1 - 5 of 5
1.
BMC Res Notes ; 11(1): 584, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30103816

ABSTRACT

OBJECTIVE: The state-of-the-art genome annotation tools output GFF3 format files, while this format is not accepted as submission format by the International Nucleotide Sequence Database Collaboration (INSDC) databases. Converting the GFF3 format to a format accepted by one of the three INSDC databases is a key step in the achievement of genome annotation projects. However, the flexibility existing in the GFF3 format makes this conversion task difficult to perform. Until now, no converter is able to handle any GFF3 flavour regardless of source. RESULTS: Here we present EMBLmyGFF3, a robust universal converter from GFF3 format to EMBL format compatible with genome annotation submission to the European Nucleotide Archive. The tool uses json parameter files, which can be easily tuned by the user, allowing the mapping of corresponding vocabulary between the GFF3 format and the EMBL format. We demonstrate the conversion of GFF3 annotation files from four different commonly used annotation tools: Maker, Prokka, Augustus and Eugene. EMBLmyGFF3 is freely available at https://github.com/NBISweden/EMBLmyGFF3 .


Subject(s)
Databases, Nucleic Acid; Molecular Sequence Annotation; Nucleotides; Europe; Genome; Internet; Software
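The vocabulary mapping described in the EMBLmyGFF3 abstract above can be pictured with a small sketch. This is not EMBLmyGFF3's actual parameter-file format or API: the JSON layout, the `FEATURE_MAP_JSON` name, and the `map_feature_type` helper are all hypothetical, chosen only to illustrate how a user-editable JSON table might translate GFF3 feature types into EMBL feature keys.

```python
import json

# Hypothetical mapping table (an assumption for illustration only);
# the real EMBLmyGFF3 parameter files may be structured differently.
FEATURE_MAP_JSON = """
{
  "gene": {"target": "gene"},
  "mRNA": {"target": "mRNA"},
  "CDS": {"target": "CDS"},
  "exon": {"target": "exon"},
  "five_prime_UTR": {"target": "5'UTR"},
  "three_prime_UTR": {"target": "3'UTR"}
}
"""


def map_feature_type(gff3_line, feature_map):
    """Return the EMBL feature key for one GFF3 feature line, or None if unmapped."""
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    # Column 3 (index 2) of a GFF3 line holds the feature type.
    entry = feature_map.get(columns[2])
    return entry["target"] if entry else None


feature_map = json.loads(FEATURE_MAP_JSON)
line = "chr1\tmaker\tfive_prime_UTR\t100\t150\t.\t+\t.\tID=utr1;Parent=mrna1"
print(map_feature_type(line, feature_map))
```

Keeping the table in JSON rather than in code means a user could, in principle, adapt the converter to a new GFF3 flavour by editing a data file, which is the kind of tuning the abstract describes.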
2.
Eur J Hum Genet ; 25(11): 1253-1260, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28832569

ABSTRACT

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.


Subject(s)
Genome, Human; Polymorphism, Single Nucleotide; Registries; Datasets as Topic; Genome-Wide Association Study; Humans; Sweden; Twins/genetics
3.
Proc Natl Acad Sci U S A ; 103(19): 7482-7, 2006 May 09.
Article in English | MEDLINE | ID: mdl-16641098

ABSTRACT

The quaking viable mouse mutation (qk(v)) is a deletion including the 5' regulatory region of the quaking gene (Qki), which causes body tremor and severe dysmyelination in mouse. The function of the human quaking gene, called quaking homolog KH domain RNA-binding (mouse) (QKI), is not well known. We have previously shown that QKI is a new candidate gene for schizophrenia. Here we show that human QKI mRNA levels can account for a high proportion (47%) of normal interindividual mRNA expression variation (and covariation) of six oligodendrocyte-related genes (PLP1, MAG, MBP, TF, SOX10, and CDKN1B) in 55 human brain autopsy samples from individuals without psychiatric diagnoses. In addition, the tightly coexpressed myelin-related genes (PLP1, MAG, and TF) have decreased mRNA levels in 55 schizophrenic patients, as compared with 55 control individuals, and most of this difference (68-96%) can be explained by variation in the relative mRNA levels of QKI-7kb, the same QKI splice variant previously shown to be down-regulated in patients with schizophrenia. Taken together, our results suggest that QKI levels may regulate oligodendrocyte differentiation and maturation in human brain, in a similar way as in mouse. Moreover, we hypothesize that previously observed decreased activity of myelin-related genes in schizophrenia might be caused by disturbed QKI splicing.


Subject(s)
Oligodendroglia/metabolism; RNA-Binding Proteins/genetics; Schizophrenia/genetics; Animals; Autopsy; Base Sequence; Binding Sites; Conserved Sequence; Gene Expression Regulation; Genetic Variation/genetics; Humans; Molecular Sequence Data; Myelin Proteins/genetics; Myelin Sheath/genetics; Open Reading Frames/genetics; RNA, Messenger/genetics; RNA-Binding Proteins/chemistry; Sequence Alignment; Transcription, Genetic/genetics
4.
Genomics ; 85(1): 24-35, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15607419

ABSTRACT

Polymorphic minisatellites, also known as variable number of tandem repeats (VNTRs), are tandem repeat regions that show variation in the number of repeat units among chromosomes in a population. Currently, there are no general methods for predicting which minisatellites have a high probability of being polymorphic, given their sequence characteristics. An earlier approach has focused on potentially highly polymorphic and hypervariable minisatellites, which make up only a small fraction of all minisatellites in the human genome. We have developed a model, based on available minisatellite and VNTR sequence data, that predicts the probability that a minisatellite (unit size ≥ 6 bp) identified by the computer program Tandem Repeats Finder is polymorphic (VNTR). According to the model, minisatellites with high copy number and high degree of sequence similarity are most likely to be VNTRs. This approach was used to scan the draft sequence of the human genome for VNTRs. A total of 157,549 minisatellite repeats were found, of which 29,224 are predicted to be VNTRs. Contrary to previous results, VNTRs appear to be widespread and abundant throughout the human genome, with an estimated density of 9.1 VNTRs/Mb.


Subject(s)
Genome, Human; Microsatellite Repeats/genetics; Minisatellite Repeats/genetics; Models, Genetic; Polymorphism, Genetic/genetics; Humans
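The counts reported in the minisatellite abstract above can be related with a quick back-of-the-envelope check. The sketch below uses only the three figures quoted in the abstract; the variable names are illustrative, and the implied sequence length assumes the density was computed as predicted VNTRs divided by megabases scanned, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
minisatellites = 157_549      # minisatellite repeats found
predicted_vntrs = 29_224      # of which predicted to be polymorphic (VNTRs)
density_per_mb = 9.1          # reported density, VNTRs per Mb

# Share of detected minisatellites that the model predicts to be VNTRs.
vntr_fraction = predicted_vntrs / minisatellites

# Sequence length implied by the reported density (assumption: density = count / length).
implied_mb = predicted_vntrs / density_per_mb

print(f"Predicted VNTR share: {vntr_fraction:.1%}")
print(f"Implied sequence scanned: {implied_mb:.0f} Mb")
```

Under that assumption, roughly one in five or six detected minisatellites is predicted to be polymorphic, and the implied scanned length is on the order of 3,200 Mb, which is consistent with a scan of the whole draft human genome.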
5.
J Biol ; 2(2): 13, 2003.
Article in English | MEDLINE | ID: mdl-12760745

ABSTRACT

BACKGROUND: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments. RESULTS: We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at http://www.phylofoot.org/. CONCLUSIONS: Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.


Subject(s)
Conserved Sequence/genetics; Genes, Regulator/genetics; Genome, Human; Algorithms; Binding Sites/genetics; Computational Biology/methods; Computer Graphics; DNA Footprinting/methods; Humans; Models, Genetic; Phylogeny; Predictive Value of Tests; Sensitivity and Specificity; Transcription Factors/genetics