Results 1 - 9 of 9
1.
Sci Rep ; 8(1): 28, 2018 01 08.
Article in English | MEDLINE | ID: mdl-29311716

ABSTRACT

Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows little or no homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up of a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family; hence it was named bacteriophage HFM.
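
The abstract above mentions similarity-based clustering followed by filters, without giving details. As a rough illustration only, the sketch below groups sequences by single-linkage clustering over pairwise similarity hits and then keeps clusters above a minimum size. The function names, the `min_identity` threshold, and the `min_size` filter are hypothetical choices for this example and are not taken from the paper.

```python
from collections import defaultdict


def cluster_by_similarity(ids, hits, min_identity=0.3):
    """Single-linkage clustering: link two sequences if a hit meets the threshold.

    ids  -- iterable of sequence identifiers
    hits -- iterable of (id_a, id_b, identity) tuples, identity in [0, 1]
    (The threshold value is an assumption made for this sketch.)
    """
    ids = list(ids)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


def filter_families(clusters, min_size=2):
    """Hypothetical filter: keep clusters with at least min_size members."""
    return [c for c in clusters if len(c) >= min_size]


ids = ["a", "b", "c", "d", "e"]
hits = [("a", "b", 0.9), ("b", "c", 0.5), ("d", "e", 0.1)]
clusters = cluster_by_similarity(ids, hits)
families = filter_families(clusters)
```

Here `a`, `b`, and `c` end up in one candidate family, while `d` and `e` stay singletons because their only hit falls below the threshold. A real pipeline would need additional filters of the kind the abstract alludes to.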


Subject(s)
Genome, Viral , Metagenome , Metagenomics , Viral Proteins/genetics , Base Sequence , Cluster Analysis , Computational Biology/methods , Humans , Markov Chains , Metagenomics/methods , Molecular Sequence Annotation , Open Reading Frames
2.
PLoS One ; 10(12): e0145013, 2015.
Article in English | MEDLINE | ID: mdl-26698305

ABSTRACT

BACKGROUND: The AKT/mTORC1/S6K pathway is frequently overstimulated in breast cancer, constituting a promising therapeutic target. The benefit from mTOR inhibitors varies, likely as a consequence of tumour heterogeneity and upregulation of several compensatory feedback mechanisms. The mTORC1 downstream effectors S6K1, S6K2, and 4EBP1 are amplified and overexpressed in breast cancer, and are associated with a poor outcome and divergent endocrine treatment benefit. S6K1 and S6K2 share high sequence homology, but evidence of partly distinct biological functions is emerging. The aim of this work was to explore possible different roles and treatment target potentials of S6K1 and S6K2 in breast cancer. MATERIALS AND METHODS: Whole-genome expression profiles were compared for breast tumours expressing high levels of S6K1, S6K2 or 4EBP1, using public datasets, as well as after in vitro siRNA downregulation of S6K1 and/or S6K2 in ZR751 breast cancer cells. In silico homology modelling of the S6K2 kinase domain was used to evaluate its possible structural divergences from S6K1. RESULTS: Genome expression profiles were highly different in S6K1-high and S6K2-high tumours, whereas S6K2 and 4EBP1 profiles showed significant overlaps, both correlated to genes involved in cell cycle progression, among these the master regulator E2F1. S6K2 and 4EBP1 were inversely associated with IGF1 levels, and their prognostic value was shown to be restricted to tumours positive for IGFR and/or HER2. In vitro, S6K1 and S6K2 silencing resulted in upregulation of genes in the mTORC1 and mTORC2 complexes. Isoform-specific silencing also showed distinct patterns, e.g. S6K2 downregulation led to upregulation of several cell cycle-associated genes. Structural analyses of the S6K2 kinase domain showed unique structure patterns, deviating from those of S6K1, facilitating the development of isoform-specific inhibitors. CONCLUSIONS: Our data support emerging proposals of distinct biological features of S6K1 and S6K2, suggesting their importance as separate oncogenes and clinical markers, where specific targeting in different breast cancer subtypes could facilitate further individualised therapies.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Gene Expression Profiling , Ribosomal Protein S6 Kinases, 70-kDa/genetics , Ribosomal Protein S6 Kinases, 90-kDa/genetics , TOR Serine-Threonine Kinases/genetics , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Female , High-Throughput Nucleotide Sequencing , Humans , Models, Molecular , Protein Conformation , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Ribosomal Protein S6 Kinases, 70-kDa/chemistry , Ribosomal Protein S6 Kinases, 90-kDa/chemistry , Survival Rate , Tumor Cells, Cultured
3.
Nature ; 497(7451): 579-84, 2013 May 30.
Article in English | MEDLINE | ID: mdl-23698360

ABSTRACT

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.


Subject(s)
Evolution, Molecular , Genome, Plant/genetics , Picea/genetics , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Gene Silencing , Genes, Plant/genetics , Genomics , Internet , Introns/genetics , Phenotype , RNA, Untranslated/genetics , Sequence Analysis, DNA , Terminal Repeat Sequences/genetics , Transcription, Genetic/genetics
4.
BMC Bioinformatics ; 13: 230, 2012 Sep 12.
Article in English | MEDLINE | ID: mdl-22971057

ABSTRACT

BACKGROUND: Roche 454 sequencing is the leading sequencing technology for producing long-read, high-throughput sequence data. Unlike most methods, where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx, where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis. RESULTS: To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm for solving the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that fits easily into other tools. Experiments using HAXAT demonstrate, through the introduction of 454-specific frame-shift penalties, significantly increased accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms for 454-derived data. CONCLUSIONS: The increased accuracy provided by HAXAT not only results in improved homologue estimations, but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat.
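
For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of Smith-Waterman local alignment scoring with Gotoh's affine gap recurrences. It is not HAXAT: it compares two plain strings, has no frame-shift states, and uses no flowpeak values. The scoring parameters are arbitrary assumptions for illustration, and a gap of length k is charged `gap_open + (k - 1) * gap_extend`.

```python
def gotoh_local_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score between strings a and b with affine gaps.

    Three recurrences are tracked per cell:
      H -- best score of an alignment ending at (i, j), floored at 0
      E -- best score ending with a gap that consumes characters of b
      F -- best score ending with a gap that consumes characters of a
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    h_prev = [0] * (m + 1)
    f_prev = [neg_inf] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf  # E only depends on the current row, so one value suffices
        for j in range(1, m + 1):
            # Extend an existing gap or open a new one.
            e = max(e + gap_extend, h_cur[j - 1] + gap_open)
            f_cur[j] = max(f_prev[j] + gap_extend, h_prev[j] + gap_open)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: never let the score drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


identical = gotoh_local_score("ACGT", "ACGT")
one_gap = gotoh_local_score("ACGTACGT", "ACGTTACGT")
unrelated = gotoh_local_score("AAAA", "TTTT")
```

With these parameters, the second call scores eight matches minus one gap opening. An extension of the kind the abstract describes would add further states and penalties on top of these recurrences.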


Subject(s)
Nucleotides/genetics , Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Base Sequence , Frameshift Mutation , Phylogeny , Protein Biosynthesis/genetics , Search Engine
5.
Virology ; 432(2): 427-34, 2012 Oct 25.
Article in English | MEDLINE | ID: mdl-22819835

ABSTRACT

Infections during pregnancy have been suggested to be involved in childhood leukemias. We used high-throughput sequencing to describe the viruses most readily detectable in serum samples of pregnant women. Serum DNA of 112 mothers of leukemic children was amplified using whole genome amplification. Sequencing identified one TT virus (TTV) isolate belonging to a known type and two putatively new TTVs. For 22 mothers, we also performed TTV amplification by general primer PCR before sequencing. This detected 39 TTVs, two of which were identical to the TTVs found after whole genome amplification. Altogether, we found 40 TTV isolates, 29 of which were putatively new types (similarities ranging from 89% to 69%). In conclusion, high-throughput sequencing is useful for describing the known or unknown viruses that are present in serum samples of pregnant women.


Subject(s)
Phylogeny , Pregnancy Complications, Infectious/virology , Torque teno virus/genetics , Viremia/virology , Amino Acid Sequence , Child , Child, Preschool , DNA Virus Infections/epidemiology , DNA Virus Infections/transmission , DNA Virus Infections/virology , DNA, Viral/blood , Female , High-Throughput Nucleotide Sequencing , Humans , Infectious Disease Transmission, Vertical , Leukemia/epidemiology , Leukemia/virology , Molecular Sequence Data , Polymerase Chain Reaction , Pregnancy , Pregnancy Complications, Infectious/epidemiology , Sequence Analysis, DNA , Torque teno virus/classification , Torque teno virus/isolation & purification , Viremia/epidemiology , Viremia/transmission
6.
PLoS One ; 7(2): e30875, 2012.
Article in English | MEDLINE | ID: mdl-22355331

ABSTRACT

The human respiratory tract is heavily exposed to microorganisms. Viral respiratory tract pathogens, like RSV, influenza and rhinoviruses, cause major morbidity and mortality from respiratory tract disease. Furthermore, as viruses have limited means of transmission, viruses that cause pathogenicity in other tissues may be transmitted through the respiratory tract. It is therefore important to chart the human virome in this compartment. We have studied nasopharyngeal aspirate samples submitted to the Karolinska University Laboratory, Stockholm, Sweden from March 2004 to May 2005 for diagnosis of respiratory tract infections. We have used a metagenomic sequencing strategy to characterize viruses, as this provides the most unbiased view of the samples. Virus enrichment followed by 454 sequencing yielded a total of 703,790 reads, of which 110,931 were found to be of viral origin using an automated classification pipeline. The snapshot of the respiratory tract virome of these 210 patients revealed 39 species and many more strains of viruses. Most of the viral sequences were classified into one of three major families: Paramyxoviridae, Picornaviridae or Orthomyxoviridae. The study also identified one novel type of Rhinovirus C and a number of previously undescribed viral genetic fragments of unknown origin.


Subject(s)
Influenza, Human/genetics , Metagenome/genetics , Metagenomics , Paramyxoviridae Infections/genetics , Picornaviridae Infections/genetics , Respiratory Tract Infections/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Child , Humans , Influenza A virus/genetics , Influenza A virus/isolation & purification , Influenza, Human/diagnosis , Influenza, Human/virology , Middle Aged , Nasopharynx/virology , Paramyxoviridae/genetics , Paramyxoviridae/isolation & purification , Paramyxoviridae Infections/diagnosis , Paramyxoviridae Infections/virology , Phylogeny , Picornaviridae/genetics , Picornaviridae/isolation & purification , Picornaviridae Infections/diagnosis , Picornaviridae Infections/virology , Respiratory Tract Infections/diagnosis , Respiratory Tract Infections/virology , Young Adult
7.
BMC Res Notes ; 4: 449, 2011 Oct 26.
Article in English | MEDLINE | ID: mdl-22029428

ABSTRACT

BACKGROUND: Roche 454 is one of the major 2nd generation sequencing platforms. The particular characteristics of 454 sequence data pose new challenges for bioinformatic analyses, e.g. assembly and alignment search algorithms. Simulation of these data is therefore useful, in order to further assess how bioinformatic applications and algorithms handle 454 data. FINDINGS: We developed a new application named 454sim for simulation of 454 data at high speed and accuracy. The program is multi-thread capable and is available as C++ source code or pre-compiled binaries. Sequence reads are simulated by 454sim using a set of statistical models for each chemistry. 454sim simulates recorded peak intensities and peak quality deterioration, and it calculates quality values. All three generations of the Roche 454 chemistry ('GS20', 'GS FLX' and 'Titanium') are supported and defined in external text files for easy access and tweaking. CONCLUSIONS: We present a new platform-independent application named 454sim. 454sim is generally 200 times faster than previous programs and it allows for simple adjustments of the statistical models. These improvements make it possible to carry out more complex and rigorous algorithm evaluations on a reasonable time scale.

8.
BMC Bioinformatics ; 12: 293, 2011 Jul 19.
Article in English | MEDLINE | ID: mdl-21771335

ABSTRACT

BACKGROUND: High-throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long-read, high-throughput data. While most other sequencing techniques produce reading errors mainly comparable with substitutions, pyrosequencing produces errors mainly comparable with gaps. These errors are less efficiently detected by most conventional alignment programs and may produce inaccurate alignments. RESULTS: We suggest a novel algorithm for calculating the optimal local alignment which utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool). CONCLUSIONS: We present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread aware implementation, FAAST is able to perform these high-quality alignments at high speed. The tool is available at http://www.ifm.liu.se/bioinfo/
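
To make concrete why pyrosequencing errors look like gaps rather than substitutions, the sketch below turns a list of flow intensities into bases by rounding each value to a homopolymer length. This is a deliberately simplified illustration, not the FAAST algorithm or a real base caller: the function name, the cyclic flow order `TACG`, and the example intensities are assumptions made for this example.

```python
def flowgram_to_sequence(flow_values, flow_order="TACG"):
    """Call bases from flow intensities by rounding to the nearest run length.

    Each flow tests one nucleotide in cyclic order; an intensity near k is
    read as a homopolymer run of k copies of that nucleotide.
    """
    bases = []
    for i, value in enumerate(flow_values):
        base = flow_order[i % len(flow_order)]
        run_length = int(value + 0.5)  # round half up to an integer run length
        bases.append(base * run_length)
    return "".join(bases)


# A clean read, and the same read with one noisy intensity (3.4 -> 3.6).
clean = flowgram_to_sequence([1.02, 0.05, 2.1, 0.0, 0.97, 3.4, 0.1, 1.0])
noisy = flowgram_to_sequence([1.02, 0.05, 2.1, 0.0, 0.97, 3.6, 0.1, 1.0])
```

A shift of 0.2 in a single intensity changes the called run from `AAA` to `AAAA`, which appears as an inserted base in the read. Keeping the raw intensities available lets an aligner judge how plausible such an insertion or deletion is, which is the kind of information the abstract refers to as flowpeak information.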


Subject(s)
Algorithms , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Search Engine
9.
BMC Microbiol ; 11: 2, 2011 Jan 02.
Article in English | MEDLINE | ID: mdl-21194495

ABSTRACT

BACKGROUND: Chronic fatigue syndrome is an idiopathic syndrome widely suspected of having an infectious or immune etiology. We applied an unbiased metagenomic approach to try to identify known or novel infectious agents in the serum of 45 cases with chronic fatigue syndrome or idiopathic chronic fatigue. Controls were the unaffected monozygotic co-twins of cases, and serum samples were obtained at the same place and time. RESULTS: No novel DNA or RNA viral signatures were confidently identified. Four affected twins and no unaffected twins evidenced viremia with GB virus C (8.9% vs. 0%, p = 0.019), and one affected twin had previously undetected hepatitis C viremia. An excess of GB virus C viremia in cases with chronic fatigue requires confirmation. CONCLUSIONS: Current, impairing chronic fatigue was not robustly associated with viremia detectable in serum.


Subject(s)
Fatigue Syndrome, Chronic/genetics , GB virus C/genetics , Metagenomics/methods , Twins, Monozygotic/genetics , Viremia/genetics , Adult , Communicable Diseases , Cross-Sectional Studies , DNA, Viral/blood , Diseases in Twins , Female , Humans , Male , Middle Aged , RNA, Viral/blood , Risk Factors