2.
DNA Res ; 31(3)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38686638

ABSTRACT

Lodderomyces beijingensis is an ascosporic ascomycetous yeast. In contrast to the related species Lodderomyces elongisporus, which is a recently emerging human pathogen, L. beijingensis is associated with insects. To provide an insight into its genetic makeup, we investigated the genome of its type strain, CBS 14171. We demonstrate that this yeast is diploid and describe the high-contiguity nuclear genome assembly consisting of eight chromosome-sized contigs with a total size of about 15.1 Mbp. We find that the genome sequence contains multiple copies of the mating type loci and codes for essential components of the mating pheromone response pathway; however, the missing orthologs of several genes involved in the meiotic program raise questions about the mode of sexual reproduction. We also show that the L. beijingensis genome codes for the 3-oxoadipate pathway enzymes, which allow the assimilation of protocatechuate. In contrast, the GAL gene cluster underwent a decay resulting in an inability of L. beijingensis to utilize galactose. Moreover, we find that the 56.5 kbp long mitochondrial DNA is structurally similar to known linear mitochondrial genomes terminating on both sides with covalently closed single-stranded hairpins. Finally, we discovered a new double-stranded RNA mycovirus from the Totiviridae family and characterized its genome sequence.


Subject(s)
Chromosomes, Fungal , Genes, Mating Type, Fungal , Genome, Fungal , Chromosomes, Fungal/genetics , Saccharomycetales/genetics , Saccharomycetales/metabolism
3.
bioRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38045397

ABSTRACT

An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes, conserved elements, and epigenetic modifications. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing two random unrelated annotations. Previous approaches to this problem remain too slow or inaccurate. To incorporate more background information into such analyses and avoid biased results, we propose a new null model based on a Markov chain which differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or sequencing gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistics and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. The use of genomic contexts to correct for GC-bias also resulted in the reversal of some previously published findings. Availability: The software is freely available at https://github.com/fmfi-compbio/mcdp2 under the MIT licence. All data for reproducibility are available at https://github.com/fmfi-compbio/mcdp2-reproducibility.
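The final step described in this abstract, turning an exact expectation and variance of a test statistic into a p-value through a normal approximation, can be illustrated with a minimal sketch. The function name, its parameters, and the choice of tails are assumptions made for illustration; this is not code from the mcdp2 software, and the computation of the expectation and variance under the Markov chain null model is not shown.

```python
import math


def normal_approx_p_value(observed, mean, variance, tail="upper"):
    """Approximate a p-value assuming the statistic is N(mean, variance).

    Hypothetical helper: `mean` and `variance` are assumed to have been
    computed beforehand under the chosen null model.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = (observed - mean) / math.sqrt(variance)
    if tail == "upper":
        # P(Z >= z): tests for enrichment
        return 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "lower":
        # P(Z <= z): tests for depletion
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if tail == "two-sided":
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError("tail must be 'upper', 'lower' or 'two-sided'")
```

Using `math.erfc` rather than `1 - cdf` keeps small tail probabilities numerically stable, which matters when many annotation pairs are compared and the p-values of interest are tiny.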

4.
Front Microbiol ; 14: 1267695, 2023.
Article in English | MEDLINE | ID: mdl-37869681

ABSTRACT

Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate the information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On one hand, our study provides a new accurate and easy to use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof-of-concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.

5.
Bioinformatics ; 39(39 Suppl 1): i288-i296, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387134

ABSTRACT

MOTIVATION: The analysis of bacterial isolates to detect plasmids is important due to their role in the propagation of antimicrobial resistance. In short-read sequence assemblies, both plasmids and bacterial chromosomes are typically split into several contigs of various lengths, making identification of plasmids a challenging problem. In plasmid contig binning, the goal is to distinguish short-read assembly contigs based on their origin into plasmid and chromosomal contigs and subsequently sort plasmid contigs into bins, each bin corresponding to a single plasmid. Previous works on this problem consist of de novo approaches and reference-based approaches. De novo methods rely on contig features such as length, circularity, read coverage, or GC content. Reference-based approaches compare contigs to databases of known plasmids or plasmid markers from finished bacterial genomes. RESULTS: Recent developments suggest that leveraging information contained in the assembly graph improves the accuracy of plasmid binning. We present PlasBin-flow, a hybrid method that defines contig bins as subgraphs of the assembly graph. PlasBin-flow identifies such plasmid subgraphs through a mixed integer linear programming model that relies on the concept of network flow to account for sequencing coverage, while also accounting for the presence of plasmid genes and the GC content that often distinguishes plasmids from chromosomes. We demonstrate the performance of PlasBin-flow on a real dataset of bacterial samples. AVAILABILITY AND IMPLEMENTATION: https://github.com/cchauve/PlasBin-flow.
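Two of the contig features that the abstract attributes to de novo methods, length and GC content, are simple to compute. The sketch below is illustrative only: the function names are hypothetical, and it is not taken from PlasBin-flow.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the unambiguous bases A, C, G, T.

    Ambiguous symbols such as N are ignored; a sequence without any
    unambiguous base yields 0.0.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def contig_features(sequence):
    """Collect two simple per-contig features (hypothetical helper)."""
    return {"length": len(sequence), "gc": gc_content(sequence)}
```

For example, `contig_features("GGCCAATT")` gives a length of 8 and a GC fraction of 0.5.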


Subject(s)
Algorithms , Genome, Bacterial , Plasmids/genetics , Cell Movement , Databases, Factual
6.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37326967

ABSTRACT

MOTIVATION: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by technology mainly due to STRs surpassing the used read length. Nanopore sequencing, as one of long-read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is however particularly unreliable in repeating regions, and therefore direct analysis from raw nanopore data is required. RESULTS: Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that our approach decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. AVAILABILITY AND IMPLEMENTATION: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr.
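The abstract describes the search algorithm as analogous to dynamic time warping (DTW). For readers unfamiliar with DTW, the following is a minimal sketch of the classic algorithm on two numeric sequences. It is not the WarpSTR algorithm, which additionally uses a finite-state automaton; the function name and the absolute-difference cost are assumptions for illustration.

```python
def dtw_distance(signal, reference):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of the first i signal values
    with the first j reference values. One value may be matched to several
    consecutive values of the other sequence, which absorbs speed changes.
    """
    n, m = len(signal), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(signal[i - 1] - reference[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # signal advances, reference value repeats
                cost[i][j - 1],      # reference advances, signal value repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A signal that dwells on each level twice as long as the reference, such as `[1, 1, 2, 2, 3, 3]` against `[1, 2, 3]`, still aligns at zero cost, which is the property that makes warping useful for signals of variable speed.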


Subject(s)
Nanopores , High-Throughput Nucleotide Sequencing/methods , Genome , Algorithms , Microsatellite Repeats , Sequence Analysis, DNA
7.
Genetics ; 224(3)2023 Jul 06.
Article in English | MEDLINE | ID: mdl-37183478

ABSTRACT

One powerful strategy for increasing the complexity of cellular proteomes is through posttranslational modifications (PTMs) of proteins. Currently, there are ∼400 types of PTMs, the different combinations of which yield a large variety of protein isoforms with distinct biochemical properties. Although mitochondrial proteins undergoing PTMs were identified nearly 6 decades ago, studies on the roles and extent of PTMs on mitochondrial functions lagged behind the other cellular compartments. The application of mass spectrometry for the characterization of the mitochondrial proteome as well as for the detection of various PTMs resulted in the identification of thousands of amino acid positions that can be modified by different chemical groups. However, the data on mitochondrial PTMs are scattered in several data sets, and the available databases do not contain a complete list of modified residues. To integrate information on PTMs of the mitochondrial proteome of the yeast Saccharomyces cerevisiae, we built the yeast mitochondrial posttranslational modification (y-mtPTM) database (http://compbio.fmph.uniba.sk/y-mtptm/). It lists nearly 20,000 positions on mitochondrial proteins affected by ∼20 different PTMs, with phosphorylated, succinylated, acetylated, and ubiquitylated sites being the most abundant. A simple search of a protein of interest reveals the modified amino acid residues, their position within the primary sequence as well as on its 3D structure, and links to the source reference(s). The database will serve yeast mitochondrial researchers as a comprehensive platform to investigate the functional significance of the PTMs of mitochondrial proteins.


Subject(s)
Proteome , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Proteome/metabolism , Protein Processing, Post-Translational , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Amino Acids
8.
Microbiol Resour Announc ; 12(3): e0000523, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36840572

ABSTRACT

Candida verbasci is an anamorphic ascomycetous yeast. We report the genome sequence of its type strain, 11-1055 (CBS 12699). The nuclear genome assembly consists of seven chromosome-sized contigs with a total size of 12.1 Mbp and has a relatively low G+C content (28.1%).

9.
BMC Bioinformatics ; 23(1): 551, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36536300

ABSTRACT

BACKGROUND: The genomes of SARS-CoV-2 are classified into variants, some of which are monitored as variants of concern (e.g. the Delta variant B.1.617.2 or Omicron variant B.1.1.529). Proportions of these variants circulating in a human population are typically estimated by large-scale sequencing of individual patient samples. Sequencing a mixture of SARS-CoV-2 RNA molecules from wastewater provides a cost-effective alternative, but requires methods for estimating variant proportions in a mixed sample. RESULTS: We propose a new method based on a probabilistic model of sequencing reads, capturing sequence diversity present within individual variants, as well as sequencing errors. The algorithm is implemented in an open source Python program called VirPool. We evaluate the accuracy of VirPool on several simulated and real sequencing data sets from both Illumina and nanopore sequencing platforms, including wastewater samples from Austria and France monitoring the onset of the Alpha variant. CONCLUSIONS: VirPool is a versatile tool for wastewater and other mixed-sample analysis that can handle both short- and long-read sequencing data. Our approach does not require pre-selection of characteristic mutations for variant profiles, it is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read.
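The estimation problem described here, recovering variant proportions from a mixed sample given a probabilistic model of reads, can be illustrated with a generic expectation-maximization (EM) sketch for mixture weights. This is a standard technique chosen for illustration and is not confirmed to be the optimization used in VirPool. The input format (one list of per-variant read likelihoods per read) and all names are assumptions.

```python
def estimate_proportions(read_likelihoods, iterations=200):
    """Estimate mixture weights with EM.

    read_likelihoods[r][v] is assumed to be P(read r | variant v), computed
    elsewhere by some probabilistic read model. Returns one weight per
    variant; the weights sum to 1.
    """
    n_variants = len(read_likelihoods[0])
    weights = [1.0 / n_variants] * n_variants
    for _ in range(iterations):
        totals = [0.0] * n_variants
        for likelihoods in read_likelihoods:
            # E-step: posterior probability that this read came from each variant
            joint = [w * l for w, l in zip(weights, likelihoods)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # read is impossible under every variant; skip it
            for v in range(n_variants):
                totals[v] += joint[v] / norm
        total = sum(totals)
        if total == 0.0:
            break
        # M-step: new weights are the average posterior responsibilities
        weights = [t / total for t in totals]
    return weights
```

With three reads that can only come from the first variant and one that can only come from the second, the estimate converges to proportions of 0.75 and 0.25.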


Subject(s)
COVID-19 , SARS-CoV-2 , Wastewater , Humans , RNA, Viral , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Wastewater/virology
10.
PLoS Genet ; 18(3): e1009815, 2022 03.
Article in English | MEDLINE | ID: mdl-35255079

ABSTRACT

Many fungal species utilize hydroxyderivatives of benzene and benzoic acid as carbon sources. The yeast Candida parapsilosis metabolizes these compounds via the 3-oxoadipate and gentisate pathways, whose components are encoded by two metabolic gene clusters. In this study, we determine the chromosome-level assembly of the C. parapsilosis strain CLIB214 and use it for transcriptomic and proteomic investigation of cells cultivated on hydroxyaromatic substrates. We demonstrate that the genes coding for enzymes and plasma membrane transporters involved in the 3-oxoadipate and gentisate pathways are highly upregulated and their expression is controlled in a substrate-specific manner. However, regulatory proteins involved in this process are not known. Using the knockout mutants, we show that putative transcription factors encoded by the genes OTF1 and GTF1 located within these gene clusters function as transcriptional activators of the 3-oxoadipate and gentisate pathway, respectively. We also show that the activation of both pathways is accompanied by upregulation of genes for the enzymes involved in β-oxidation of fatty acids, glyoxylate cycle, amino acid metabolism, and peroxisome biogenesis. Transcriptome and proteome profiles of the cells grown on 4-hydroxybenzoate and 3-hydroxybenzoate, which are metabolized via the 3-oxoadipate and gentisate pathway, respectively, reflect their different connection to central metabolism. Yet we find that the expression profiles differ also in the cells assimilating 4-hydroxybenzoate and hydroquinone, which are both metabolized in the same pathway. This finding is consistent with the phenotype of the Otf1p-lacking mutant, which exhibits impaired growth on hydroxybenzoates, but still utilizes hydroxybenzenes, thus indicating that an additional, yet unidentified, transcription factor could be involved in the 3-oxoadipate pathway regulation. Moreover, we propose that bicarbonate ions resulting from decarboxylation of hydroxybenzoates also contribute to differences in the cell responses to hydroxybenzoates and hydroxybenzenes. Finally, our phylogenetic analysis highlights evolutionary paths leading to metabolic adaptations of yeast cells assimilating hydroxyaromatic substrates.


Subject(s)
Candida parapsilosis , Gentisates , Candida parapsilosis/metabolism , Carbon , Gentisates/metabolism , Hydroxybenzoates/metabolism , Phylogeny , Proteome/genetics , Proteomics , Saccharomyces cerevisiae/metabolism , Transcriptome/genetics
11.
EBioMedicine ; 76: 103818, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35078012

ABSTRACT

BACKGROUND: The emergence of new SARS-CoV-2 variants of concern B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma) and B.1.617.2 (Delta) that harbor mutations in the viral S protein raised concern about activity of current vaccines and therapeutic antibodies. Independent studies have shown that mutant variants are partially or completely resistant against some of the therapeutic antibodies authorized for emergency use. METHODS: We employed hybridoma technology, ELISA-based and cell-based S-ACE2 interaction assays combined with authentic virus neutralization assays to develop second-generation antibodies, which were specifically selected for their ability to neutralize the new variants of SARS-CoV-2. FINDINGS: AX290 and AX677, two monoclonal antibodies with non-overlapping epitopes, exhibit subnanomolar or nanomolar affinities to the receptor binding domain of the viral Spike protein carrying amino acid substitutions N501Y, N439K, E484K, K417N, and a combination N501Y/E484K/K417N found in the circulating virus variants. The antibodies showed excellent neutralization of an authentic SARS-CoV-2 virus representing strains circulating in Europe in spring 2020 and also the variants of concern B.1.1.7 (Alpha), B.1.351 (Beta) and B.1.617.2 (Delta). In addition, AX677 is able to bind Omicron Spike protein just like the wild type Spike. The combination of the two antibodies prevented the appearance of escape mutations of the authentic SARS-CoV-2 virus. Prophylactic administration of AX290 and AX677, either individually or in combination, effectively reduced viral burden and inflammation in the lungs, and prevented disease in a mouse model of SARS-CoV-2 infection. INTERPRETATION: The virus-neutralizing properties were fully reproduced in chimeric mouse-human versions of the antibodies, which may represent a promising tool for COVID-19 therapy. FUNDING: The study was funded by AXON Neuroscience SE and AXON COVIDAX a.s.


Subject(s)
Antibodies, Monoclonal/immunology , Antineoplastic Agents, Immunological/immunology , Immunodominant Epitopes/immunology , SARS-CoV-2/immunology , Spike Glycoprotein, Coronavirus/immunology , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , Animals , Antibodies, Monoclonal/therapeutic use , Antigenic Drift and Shift , Antineoplastic Agents, Immunological/therapeutic use , COVID-19/virology , Disease Models, Animal , Humans , Kinetics , Lung/pathology , Mice , Mutation , Neutralization Tests , Protein Binding , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism , COVID-19 Drug Treatment
12.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3416-3424, 2022.
Article in English | MEDLINE | ID: mdl-34784283

ABSTRACT

In nanopore sequencing, electrical signal is measured as DNA molecules pass through the sequencing pores. Translating these signals into DNA bases (base calling) is a highly non-trivial task, and its quality has a large impact on the sequencing accuracy. The most successful nanopore base callers to date use convolutional neural networks (CNN) to accomplish the task. Convolutional layers in CNNs are typically composed of filters with constant window size, performing best in analysis of signals with uniform speed. However, the speed of nanopore sequencing varies greatly both within reads and between sequencing runs. Here, we present dynamic pooling, a novel neural network component, which addresses this problem by adaptively adjusting the pooling ratio. To demonstrate the usefulness of dynamic pooling, we developed two base callers: Heron and Osprey. Heron improves the accuracy beyond the experimental high-accuracy base caller Bonito developed by Oxford Nanopore. Osprey is a fast base caller that can compete in accuracy with Guppy high-accuracy mode, but does not require GPU acceleration and achieves a near real-time speed on common desktop CPUs. Availability: https://github.com/fmfi-compbio/osprey, https://github.com/fmfi-compbio/heron.


Subject(s)
Nanopores , Software , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing , DNA/genetics
13.
Sci Rep ; 11(1): 20494, 2021 10 14.
Article in English | MEDLINE | ID: mdl-34650153

ABSTRACT

The emergence of a novel SARS-CoV-2 B.1.1.7 variant sparked global alarm due to increased transmissibility, mortality, and uncertainty about vaccine efficacy, thus accelerating efforts to detect and track the variant. Current approaches to detect B.1.1.7 include sequencing and RT-qPCR tests containing a target assay that fails or results in reduced sensitivity towards the B.1.1.7 variant. Since many countries lack genomic surveillance programs and failed assays detect unrelated variants containing mutations similar to those of B.1.1.7, we used allele-specific PCR and judicious placement of LNA-modified nucleotides to develop an RT-qPCR test that accurately and rapidly differentiates B.1.1.7 from other SARS-CoV-2 variants. We validated the test on 106 clinical samples with lineage status confirmed by sequencing and conducted a country-wide surveillance study of B.1.1.7 prevalence in Slovakia. Our multiplexed RT-qPCR test showed 97% clinical sensitivity, and retesting 6,886 SARS-CoV-2 positive samples obtained during three campaigns performed within one month revealed pervasive spread of B.1.1.7 with an average prevalence of 82%. Labs can easily implement this test to rapidly scale B.1.1.7 surveillance efforts, and it is particularly useful in countries with high prevalence of variants possessing only the ΔH69/ΔV70 deletion because current strategies using target failure assays incorrectly identify these as putative B.1.1.7 variants.


Subject(s)
COVID-19 Nucleic Acid Testing/methods , COVID-19/diagnosis , COVID-19/virology , Multiplex Polymerase Chain Reaction/methods , SARS-CoV-2/genetics , Alleles , COVID-19/epidemiology , Humans , Mutation , Prevalence , RNA, Viral/genetics , SARS-CoV-2/isolation & purification , Slovakia/epidemiology
14.
PLoS One ; 16(10): e0259277, 2021.
Article in English | MEDLINE | ID: mdl-34714886

ABSTRACT

Surveillance of the SARS-CoV-2 variants including the quickly spreading mutants by rapid and near real-time sequencing of the viral genome provides an important tool for effective health policy decision making in the ongoing COVID-19 pandemic. Here we evaluated PCR-tiling of short (~400-bp) and long (~2 and ~2.5-kb) amplicons combined with nanopore sequencing on a MinION device for analysis of the SARS-CoV-2 genome sequences. Analysis of several sequencing runs demonstrated that using the long amplicon schemes outperforms the original protocol based on the 400-bp amplicons. It also illustrated common artefacts and problems associated with the PCR-tiling approach, such as uneven genome coverage, variable fraction of discarded sequencing reads, including human and bacterial contamination, as well as the presence of reads derived from the viral sub-genomic RNAs.


Subject(s)
COVID-19/diagnosis , Nanopore Sequencing/methods , Pandemics , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification
15.
Virus Genes ; 57(6): 556-560, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34448987

ABSTRACT

SARS-CoV-2 mutants carrying the ∆H69/∆V70 deletion in the amino-terminal domain of the Spike protein emerged independently in at least six lineages of the virus (namely, B.1.1.7, B.1.1.298, B.1.160, B.1.177, B.1.258, B.1.375). We analyzed SARS-CoV-2 samples collected from various regions of Slovakia between November and December 2020 that were presumed to contain B.1.1.7 variant due to drop-out of the Spike gene target in an RT-qPCR test caused by this deletion. Sequencing of these samples revealed that although in some cases the samples were indeed confirmed as B.1.1.7, a substantial fraction of samples contained another ∆H69/∆V70 carrying mutant belonging to the lineage B.1.258, which has been circulating in Central Europe since August 2020, long before the import of B.1.1.7. Phylogenetic analysis shows that the early sublineage of B.1.258 acquired the N439K substitution in the receptor-binding domain (RBD) of the Spike protein and, later on, also the deletion ∆H69/∆V70 in the Spike N-terminal domain (NTD). This variant was particularly common in several European countries including the Czech Republic and Slovakia but has been quickly replaced by B.1.1.7 early in 2021.


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Phylogeny , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Sequence Deletion , Spike Glycoprotein, Coronavirus/genetics , Europe/epidemiology , Humans , SARS-CoV-2/classification , Time Factors
16.
Bioinformatics ; 37(24): 4661-4667, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34314502

ABSTRACT

MOTIVATION: MinION is a portable nanopore sequencing device that can be easily operated in the field with features including monitoring of run progress and selective sequencing. To fully exploit these features, real-time base calling is required. To date, this has only been achieved at the cost of high computing requirements that pose limitations in terms of hardware availability in common laptops and energy consumption. RESULTS: We developed a new base caller DeepNano-coral for nanopore sequencing, which is optimized to run on the Coral Edge Tensor Processing Unit, a small USB-attached hardware accelerator. To achieve this goal, we have designed new versions of two key components used in convolutional neural networks for speech recognition and base calling. In our components, we propose a new way of factorization of a full convolution into smaller operations, which decreases memory access operations, memory access being a bottleneck on this device. DeepNano-coral achieves real-time base calling during sequencing with accuracy slightly better than the fast mode of the Guppy base caller and is extremely energy efficient, using only 10 W of power. AVAILABILITY AND IMPLEMENTATION: https://github.com/fmfi-compbio/coral-basecaller. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Nanopores , Software , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing , Neural Networks, Computer
17.
Bioinformatics ; 36(14): 4191-4192, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32374816

ABSTRACT

MOTIVATION: Oxford Nanopore MinION is a portable DNA sequencer that is marketed as a device that can be deployed anywhere. Current base callers, however, require a powerful GPU to analyze data produced by MinION in real time, which hampers field applications. RESULTS: We have developed a fast base caller DeepNano-blitz that can analyze the data stream from up to two MinION runs in real time using a common laptop CPU (i7-7700HQ), with no GPU requirements. The base caller settings allow trading accuracy for speed and the results can be used for real-time run monitoring (e.g. sample composition, barcode balance, species identification) or prefiltering of results for more detailed analysis (e.g. filtering out human DNA from human-pathogen runs). AVAILABILITY AND IMPLEMENTATION: DeepNano-blitz has been developed and tested on Linux and Intel processors and is available under MIT license at https://github.com/fmfi-compbio/deepnano-blitz. CONTACT: vladimir.boza@fmph.uniba.sk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Nanopores , DNA , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software
18.
Gigascience ; 9(1)2020 01 01.
Article in English | MEDLINE | ID: mdl-31942620

ABSTRACT

BACKGROUND: The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusc with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, having a genome assembled for this deep-sea-dwelling species will allow several pending evolutionary questions to be unlocked. FINDINGS: We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from 3 different tissue types from 3 other species of squid (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome. CONCLUSIONS: This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.
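The scaffold N50 quoted in this abstract is a standard contiguity metric: the largest length L such that scaffolds of length at least L together cover at least half of the assembly. A minimal sketch of the computation follows; the function name is hypothetical, and the example lengths in the note below are made up for illustration, not taken from the giant squid assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length
```

For the toy lengths `[2, 2, 2, 3, 3, 4, 8, 8]` the total is 32, and the two longest scaffolds already cover 16, so the N50 is 8.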


Subject(s)
Decapodiformes/genetics , Genome , Genomics , Animals , Biological Evolution , Chromatography, Liquid , Computational Biology/methods , DNA Transposable Elements , Gene Expression Profiling , Genomics/methods , Molecular Sequence Annotation , Multigene Family , RNA, Untranslated , Tandem Mass Spectrometry , Transcriptome , Whole Genome Sequencing
19.
Microbiol Resour Announc ; 8(50)2019 Dec 12.
Article in English | MEDLINE | ID: mdl-31831616

ABSTRACT

Chromosome-scale genome assembly of the yeast Saprochaete ingens CBS 517.90 was determined by a combination of technologies producing short (HiSeq X; Illumina) and long (MinION; Oxford Nanopore Technologies) reads. The 21.2-Mbp genome sequence has a GC content of 36.9% and codes for 6,475 predicted proteins.

20.
Microbiol Resour Announc ; 8(15)2019 Apr 11.
Article in English | MEDLINE | ID: mdl-30975801

ABSTRACT

Saprochaete fungicola is an arthroconidial yeast classified in the Magnusiomyces/Saprochaete clade of the subphylum Saccharomycotina. Here, we report the genome sequence of holotype strain CBS 625.85, assembled to five putative chromosomes. The genome sequence is 20.2 Mbp long and codes for 6,138 predicted proteins.
