Results 1 - 11 of 11
1.
Plant J ; 117(3): 944-955, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37947292

ABSTRACT

Scots pine (Pinus sylvestris L.) is one of the most widespread and economically important conifer species in the world. Applications like genomic selection and association studies, which could help accelerate breeding cycles, are challenging in Scots pine because of its large and repetitive genome. For this reason, genotyping tools for conifer species, and in particular for Scots pine, are commonly based on transcribed regions of the genome. In this article, we present the Axiom Psyl50K array, the first single nucleotide polymorphism (SNP) genotyping array for Scots pine based on whole-genome resequencing, that represents both genic and intergenic regions. This array was designed following a two-step procedure: first, 192 trees were sequenced, and a 430K SNP screening array was constructed. Then, 480 samples, including haploid megagametophytes, full-sib family trios, breeding population, and range-wide individuals from across Eurasia were genotyped with the screening array. The best 50K SNPs were selected based on quality, replicability, distribution across the draft genome assembly, balance between genic and intergenic regions, and genotype-environment and genotype-phenotype associations. Of the final 49 877 probes tiled in the array, 20 372 (40.84%) occur inside gene models, while the rest lie in intergenic regions. We also show that the Psyl50K array can yield enough high-confidence SNPs for genetic studies in pine species from North America and Eurasia. This new genotyping tool will be a valuable resource for high-throughput fundamental and applied research of Scots pine and other pine species.


Subject(s)
Pinus sylvestris , Pinus , Humans , Pinus sylvestris/genetics , Polymorphism, Single Nucleotide/genetics , Genotype , Plant Breeding , Pinus/genetics , DNA, Intergenic
2.
BMC Genomics ; 15: 439, 2014 Jun 06.
Article in English | MEDLINE | ID: mdl-24906298

ABSTRACT

BACKGROUND: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. CONCLUSIONS: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.


Subject(s)
Genetic Vectors , High-Throughput Nucleotide Sequencing/methods , Picea/genetics , Cloning, Molecular , Genome, Plant , High-Throughput Nucleotide Sequencing/economics , Software
3.
PLoS One ; 8(7): e70388, 2013.
Article in English | MEDLINE | ID: mdl-23894647

ABSTRACT

BACKGROUND: Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data. RESULTS: UDPS of a 167-nucleotide fragment of the HIV-1 SG3Δenv plasmid was performed on the Roche/454 platform. The plasmid was diluted to one copy, PCR amplified and subjected to bidirectional UDPS on three occasions. The dataset consisted of 47,693 UDPS reads. Raw UDPS data had an average error frequency of 0.30% per nucleotide site. Most errors were insertions and deletions in homopolymeric regions. We used a cleaning strategy that removed almost all indel errors, but had little effect on substitution errors, which reduced the error frequency to 0.056% per nucleotide. In cleaned data the error frequency was similar in homopolymeric and non-homopolymeric regions, but varied considerably across sites. These site-specific error frequencies were moderately, but still significantly, correlated between runs (r=0.15-0.65) and between forward and reverse sequencing directions within runs (r=0.33-0.65). Furthermore, transition errors were 48-times more common than transversion errors (0.052% vs. 0.001%; p<0.0001). Collectively the results indicate that a considerable proportion of the sequencing errors that remained after data cleaning were generated during the PCR that preceded UDPS. CONCLUSIONS: A majority of the sequencing errors that remained after data cleaning were introduced by PCR prior to sequencing, which means that they will be independent of platform used for next-generation sequencing. The transition vs. transversion error bias in cleaned UDPS data will influence the detection limits of rare mutations and sequence variants.


Subject(s)
High-Throughput Nucleotide Sequencing/standards , Polymerase Chain Reaction/standards , Sequence Analysis, DNA/standards , Artifacts , Base Sequence , HIV-1/genetics
4.
Nature ; 497(7451): 579-84, 2013 May 30.
Article in English | MEDLINE | ID: mdl-23698360

ABSTRACT

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.


Subject(s)
Evolution, Molecular , Genome, Plant/genetics , Picea/genetics , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Gene Silencing , Genes, Plant/genetics , Genomics , Internet , Introns/genetics , Phenotype , RNA, Untranslated/genetics , Sequence Analysis, DNA , Terminal Repeat Sequences/genetics , Transcription, Genetic/genetics
5.
PLoS One ; 6(7): e21910, 2011.
Article in English | MEDLINE | ID: mdl-21760920

ABSTRACT

BACKGROUND: The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. METHODOLOGY: In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was compared to the standard manual sample preparation. CONCLUSION/SIGNIFICANCE: The automated procedure was used to generate libraries for gene expression profiling on the Illumina HiSeq 2000 platform with the capacity of 12 samples per preparation with a significantly improved throughput compared to the standard manual preparation. The data analysis shows consistent gene expression profiles in terms of sensitivity and quantification of gene expression between the two library preparation methods.


Subject(s)
Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Automation , Cell Line, Tumor , Chemical Precipitation , DNA, Complementary/biosynthesis , Gene Expression Regulation, Neoplastic , Humans , Polymerase Chain Reaction , RNA, Neoplasm/genetics , RNA, Neoplasm/isolation & purification , RNA, Neoplasm/metabolism , Sequence Analysis, DNA
6.
PLoS Negl Trop Dis ; 5(3): e984, 2011 Mar 08.
Article in English | MEDLINE | ID: mdl-21408126

ABSTRACT

Trypanosoma cruzi is the causative agent of Chagas disease, which affects more than 9 million people in Latin America. We have generated a draft genome sequence of the TcI strain Sylvio X10/1 and compared it to the TcVI reference strain CL Brener to identify lineage-specific features. We found virtually no differences in the core gene content of CL Brener and Sylvio X10/1 by presence/absence analysis, but 6 open reading frames from CL Brener were missing in Sylvio X10/1. Several multicopy gene families, including DGF, mucin, MASP and GP63 were found to contain substantially fewer genes in Sylvio X10/1, based on sequence read estimations. 1,861 small insertion-deletion events and 77,349 nucleotide differences, 23% of which were non-synonymous and associated with radical amino acid changes, further distinguish these two genomes. There were 336 genes indicated as under positive selection, 145 unique to T. cruzi in comparison to T. brucei and Leishmania. This study provides a framework for further comparative analyses of two major T. cruzi lineages and also highlights the need for sequencing more strains to understand fully the genomic composition of this parasite.


Subject(s)
DNA, Protozoan/genetics , Genome, Protozoan , Sequence Analysis, DNA , Trypanosoma cruzi/genetics , DNA, Protozoan/chemistry , Humans , Latin America , Molecular Sequence Data , Mutagenesis, Insertional , Sequence Deletion , Sequence Homology , Synteny
7.
PLoS One ; 5(7): e11345, 2010 Jul 07.
Article in English | MEDLINE | ID: mdl-20628644

ABSTRACT

BACKGROUND: Ultra-deep pyrosequencing (UDPS) allows identification of rare HIV-1 variants and minority drug resistance mutations, which are not detectable by standard sequencing. PRINCIPAL FINDINGS: Here, UDPS was used to analyze the dynamics of HIV-1 genetic variation in reverse transcriptase (RT) (amino acids 180-220) in six individuals consecutively sampled before, during and after failing 3TC- and AZT-containing antiretroviral treatment. Optimized UDPS protocols and bioinformatic software were developed to generate, clean and analyze the data. The data cleaning strategy reduced the error rate of UDPS to an average of 0.05%, which is lower than previously reported. Consequently, the cut-off for detection of resistance mutations was very low. A median of 16,016 (range 2,406-35,401) sequence reads were obtained per sample, which allowed detection and quantification of minority resistance mutations at amino acid positions 181, 184, 188, 190, 210, 215 and 219 in RT. In four of five pre-treatment samples low levels (0.07-0.09%) of the M184I mutation were observed. Other resistance mutations, except T215A and T215I, were below the detection limit. During treatment failure, M184V replaced M184I and dominated the population in combination with T215Y, while wild-type variants were rarely detected. Resistant virus disappeared rapidly after treatment interruption and was undetectable as early as after 3 months. In most patients, drug resistant variants were replaced by wild-type variants identical to those present before treatment, suggesting rebound from latent reservoirs. CONCLUSIONS: With this highly sensitive UDPS protocol preexisting drug resistance was infrequently observed; only M184I, T215A and T215I were detected at very low levels. Similarly, drug resistant variants in plasma quickly decreased to undetectable levels after treatment interruption. The study gives important insights into the dynamics of the HIV-1 quasispecies and is of relevance for future research and clinical use of the UDPS technology.


Subject(s)
HIV-1/classification , HIV-1/genetics , Anti-HIV Agents/therapeutic use , Drug Resistance, Viral/genetics , HIV Infections/drug therapy , HIV Infections/virology , HIV-1/drug effects , Humans , Mutation , Polymerase Chain Reaction , Sequence Analysis, DNA
8.
Nature ; 464(7288): 587-91, 2010 Mar 25.
Article in English | MEDLINE | ID: mdl-20220755

ABSTRACT

Domestic animals are excellent models for genetic studies of phenotypic evolution. They have evolved genetic adaptations to a new environment, the farm, and have been subjected to strong human-driven selection leading to remarkable phenotypic changes in morphology, physiology and behaviour. Identifying the genetic changes underlying these developments provides new insight into general mechanisms by which genetic variation shapes phenotypic diversity. Here we describe the use of massively parallel sequencing to identify selective sweeps of favourable alleles and candidate mutations that have had a prominent role in the domestication of chickens (Gallus gallus domesticus) and their subsequent specialization into broiler (meat-producing) and layer (egg-producing) chickens. We have generated 44.5-fold coverage of the chicken genome using pools of genomic DNA representing eight different populations of domestic chickens as well as red jungle fowl (Gallus gallus), the major wild ancestor. We report more than 7,000,000 single nucleotide polymorphisms, almost 1,300 deletions and a number of putative selective sweeps. One of the most striking selective sweeps found in all domestic chickens occurred at the locus for thyroid stimulating hormone receptor (TSHR), which has a pivotal role in metabolic regulation and photoperiod control of reproduction in vertebrates. Several of the selective sweeps detected in broilers overlapped genes associated with growth, appetite and metabolic regulation. We found little evidence that selection for loss-of-function mutations had a prominent role in chicken domestication, but we detected two deletions in coding sequences that we suggest are functionally important. This study has direct application to animal breeding and enhances the importance of the domestic chicken as a model organism for biomedical research.


Subject(s)
Chickens/genetics , Genetic Loci/genetics , Genome/genetics , Selection, Genetic/genetics , Amino Acid Sequence , Animals , Biological Evolution , Female , Male , Molecular Sequence Data , Polymorphism, Single Nucleotide , Sequence Alignment , Sequence Analysis, DNA , Sequence Deletion
9.
PLoS Negl Trop Dis ; 4(12): e919, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21200421

ABSTRACT

BACKGROUND: Neurocysticercosis is a disease caused by the oral ingestion of eggs from the human parasitic worm Taenia solium. Although drugs are available they are controversial because of the side effects and poor efficiency. An expressed sequence tag (EST) library is a method used to describe the gene expression profile and sequence of mRNA from a specific organism and stage. Such information can be used in order to find new targets for the development of drugs and to get a better understanding of the parasite biology. METHODS AND FINDINGS: Here an EST library consisting of 5760 sequences from the pig cysticerca stage has been constructed. In the library 1650 unique sequences were found and of these, 845 sequences (52%) were novel to T. solium and not identified within other EST libraries. Furthermore, 918 sequences (55%) were of unknown function. Amongst the 25 most frequently expressed sequences 6 had no relevant similarity to other sequences found in the Genbank NR DNA database. A prediction of putative signal peptides was also performed and 4 among the 25 were found to be predicted with a signal peptide. Proposed vaccine and diagnostic targets T24, Tsol18/HP6 and Tso31d could also be identified among the 25 most frequently expressed. CONCLUSIONS: An EST library has been produced from pig cysticerca and analyzed. More than half of the different ESTs sequenced contained a sequence with no suggested function and 845 novel EST sequences have been identified. The library increases the knowledge about what genes are expressed and to what level. It can also be used to study different areas of research such as drug and diagnostic development together with parasite fitness via e.g. immune modulation.


Subject(s)
Cysticercosis/veterinary , Expressed Sequence Tags , Gene Library , Swine Diseases/parasitology , Taenia solium/genetics , Animals , Computational Biology , Sequence Analysis, DNA , Sequence Homology , Swine
10.
PLoS Pathog ; 5(8): e1000560, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19696920

ABSTRACT

Giardia intestinalis is a major cause of diarrheal disease worldwide and two major Giardia genotypes, assemblages A and B, infect humans. The genome of assemblage A parasite WB was recently sequenced, and the structurally compact 11.7 Mbp genome contains simplified basic cellular machineries and metabolism. We here performed 454 sequencing to 16x coverage of the assemblage B isolate GS, the only Giardia isolate successfully used to experimentally infect animals and humans. The two genomes show 77% nucleotide and 78% amino-acid identity in protein coding regions. Comparative analysis identified 28 unique GS and 3 unique WB protein coding genes, and the variable surface protein (VSP) repertoires of the two isolates are completely different. The promoters of several enzymes involved in the synthesis of the cyst-wall lack binding sites for encystation-specific transcription factors in GS. Several synteny-breaks were detected and verified. The tetraploid GS genome shows higher levels of overall allelic sequence polymorphism (0.5 versus <0.01% in WB). The genomic differences between WB and GS may explain some of the observed biological and clinical differences between the two isolates, and it suggests that assemblage A and B Giardia can be two different species.


Subject(s)
Genome, Protozoan , Giardia lamblia/genetics , Giardiasis/parasitology , Animals , Base Sequence , Gene Frequency , Genome, Bacterial/genetics , Giardia lamblia/classification , Humans , Introns , Molecular Sequence Data , Phylogeny , Polymorphism, Genetic , Porphyromonas gingivalis/genetics , Promoter Regions, Genetic , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , RNA Splicing , RNA, Messenger/metabolism , RNA, Protozoan/genetics , Sequence Alignment , Synteny
11.
Malar J ; 7: 46, 2008 Mar 07.
Article in English | MEDLINE | ID: mdl-18325124

ABSTRACT

BACKGROUND: Segmental duplications (SD) have been found in genomes of various organisms, often accumulated at the ends of chromosomes. It has been assumed that the sequence homology between the SDs allows for ectopic interactions that may contribute to the emergence of new genes or gene variants through recombinatorial events. METHODS: In silico analysis of the 3D7 Plasmodium falciparum genome, conducted to investigate the subtelomeric compartments, led to the identification of subtelomeric SDs. Sequence variation and copy number polymorphisms of the SDs were studied by DNA sequencing, real-time quantitative PCR (qPCR) and fluorescent in situ hybridization (FISH). The levels of transcription and the developmental expression of copy number variant genes were investigated by qPCR. RESULTS: A block of six genes of >10 kilobases in size, including var, rif, pfmc-2tm and three hypothetical genes (n-, o- and q-gene), was found duplicated in the subtelomeric regions of chromosomes 1, 2, 3, 6, 7, 10 and 11 (SD1). The number of SD1 per genome was found to vary from 4 to 8 copies between different parasites. The intragenic regions of SD1 were found to be highly conserved across ten distinct fresh and long-term cultivated P. falciparum. Sequence variation was detected in an approximately 23 amino-acid long hypervariable region of a surface-exposed loop of PFMC-2TM. A hypothetical gene within SD1, the n-gene, encoding a PEXEL/VTS-containing two-transmembrane protein was found expressed in ring stage parasites. The n-gene transcription levels were found to correlate to the number of n-gene copies. Fragments of SD1 harbouring two or three of the SD1-genes (o-gene, pfmc-2tm, q-gene) were also found in the 3D7 genome. In addition a related second SD, SD2, of approximately 55% sequence identity to SD1 was found duplicated in a fresh clinical isolate but was only present in a single copy in 3D7 and in other P. falciparum lines or clones. CONCLUSION: Plasmodium falciparum carries multiple sequence conserved SDs in the otherwise highly variable subtelomeres of its chromosomes. The uniqueness of the SDs amongst plasmodium species, and the conserved nature of the genes within, is intriguing and suggests an important role of the SD to P. falciparum.


Subject(s)
DNA, Protozoan/genetics , Gene Duplication , Plasmodium falciparum/genetics , Telomere , Animals , Computational Biology , Conserved Sequence , Gene Dosage , Gene Expression Profiling , In Situ Hybridization , Polymerase Chain Reaction , Protozoan Proteins/genetics , RNA, Messenger/biosynthesis , RNA, Protozoan/biosynthesis , Sequence Analysis, DNA , Sequence Homology