Results 1 - 5 of 5
1.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37171844

ABSTRACT

MOTIVATION: A pangenome represents many diverse genome sequences of the same species. To cope with small variations as well as structural variations, recent research has focused on the development of graph-based models of pangenomes. Mapping is the process of finding the original location of a DNA read in a reference sequence, typically a genome. Using a pangenome instead of a (linear) reference genome can, for example, reduce mapping bias, the tendency to incorrectly map sequences that differ from the reference genome. Mapping reads to a graph, however, is more complex and needs more resources than mapping to a reference genome. Reducing the complexity of the graph by encoding simple variations like SNPs in a simple way can accelerate read mapping and reduce the memory requirements at the same time.
RESULTS: We introduce graphs based on elastic-degenerate strings (ED strings, EDS) and the linearized form of these EDS graphs as a new representation for pangenomes. In this representation, small variations are encoded directly in the sequence, while structural variations are encoded in a graph structure. This reduces the size of the representation in comparison to sequence graphs. In the linearized form, mapping techniques that are known from ordinary strings can be applied with appropriate adjustments. Since most variations are expressed directly in the sequence, the mapping process rarely has to take edges of the EDS graph into account. We developed a prototype software tool, GED-MAP, that uses this representation together with a minimizer index to map short reads to the pangenome. Our experiments show that the new method works on a whole human genome scale, taking structural variants properly into account. The advantage of GED-MAP, compared with other pangenomic short read mappers, is that the new representation allows for a simple indexing method. This makes GED-MAP fast and memory efficient.
AVAILABILITY AND IMPLEMENTATION: Sources are available at: https://github.com/thomas-buechler-ulm/gedmap.
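
To illustrate what "small variations are encoded directly in the sequence" can mean, here is a minimal Python sketch of an elastic-degenerate string. It is not taken from GED-MAP: the list-of-alternatives layout, the toy sequence `EDS`, and the helper names `spellings` and `occurs` are all assumptions made for illustration, and the brute-force search is only usable on tiny inputs (a real mapper would use an index such as the minimizer index mentioned in the abstract).

```python
from itertools import product

# A toy elastic-degenerate string (hypothetical data): a list of segments,
# each segment being a list of alternative strings.
# Segment 2 models a SNP (G or T); segment 4 models an optional insertion.
EDS = [["AC"], ["G", "T"], ["TA"], ["", "CC"], ["G"]]


def spellings(eds):
    """Yield every plain string the ED string can spell.

    The number of spellings grows exponentially with the number of
    variant sites, so this is for illustration only.
    """
    for choice in product(*eds):
        yield "".join(choice)


def occurs(read, eds):
    """Return True if the read is a substring of at least one spelling."""
    return any(read in s for s in spellings(eds))


if __name__ == "__main__":
    print(sorted(spellings(EDS)))
    print(occurs("CTTA", EDS))   # read spanning the SNP with the T allele
    print(occurs("TACCG", EDS))  # read spanning the insertion
```

The point of the representation is that a read covering a SNP or a short indel can be matched without following any graph edges; only larger structural variants would need the graph structure.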


Subject(s)
Genome, Human , Software , Humans , Sequence Analysis, DNA/methods , Algorithms
2.
Bioinformatics ; 36(5): 1413-1419, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31613311

ABSTRACT

MOTIVATION: In resequencing experiments, a high-throughput sequencer produces DNA fragments (called reads) and each read is then mapped to the locus in a reference genome at which it fits best. Currently dominant read mappers are based on the Burrows-Wheeler transform (BWT). A read can be mapped correctly if it is similar enough to a substring of the reference genome. However, since the reference genome does not represent all known variations, read mapping tends to be biased towards the reference and mapping errors may thus occur. To cope with this problem, Huang et al. encoded single nucleotide polymorphisms (SNPs) in a BWT by the International Union of Pure and Applied Chemistry (IUPAC) nucleotide code. In a different approach, Maciuca et al. provided a 'natural encoding' of SNPs and other genetic variations in a BWT. However, their encoding resulted in a significantly increased alphabet size (the modified alphabet can have millions of new symbols, which usually implies a loss of efficiency). Moreover, the two approaches do not handle all known kinds of variation.
RESULTS: In this article, we propose a method that is able to encode many kinds of genetic variation (SNPs, multi-nucleotide polymorphisms, insertions or deletions, duplications, transpositions, inversions and copy-number variation) in a BWT. It takes the best of both worlds: SNPs are encoded by the IUPAC nucleotide code as in Huang et al. (2013, Short read alignment with populations of genomes. Bioinformatics, 29, i361-i370) and the encoding of the other kinds of genetic variation relies on the idea introduced in Maciuca et al. (2016, A natural encoding of genetic variation in a Burrows-Wheeler transform to enable mapping and genome inference. In: Proceedings of the 16th International Workshop on Algorithms in Bioinformatics, Volume 9838 of Lecture Notes in Computer Science, pp. 222-233. Springer). In contrast to Maciuca et al., however, we use only one additional symbol. This symbol marks variant sites in a chromosome and delimits multiple variants, which are added at the end of the 'marked chromosome'. We show how the backward search algorithm, which is used in BWT-based read mappers, can be modified in such a way that it can cope with the genetic variation encoded in the BWT. We implemented our method and compared it with BWBBLE and gramtools.
AVAILABILITY AND IMPLEMENTATION: https://www.uni-ulm.de/in/theo/research/seqana/.
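
For orientation, the following Python sketch shows the standard, unmodified backward search on a BWT, that is, the textbook algorithm the abstract says is being extended. It does not implement the authors' variant-aware modification, IUPAC codes, or the extra marker symbol. The function names `bwt_index` and `backward_search` are hypothetical, and the naive suffix sorting and full occurrence table are simplifications suitable only for short texts.

```python
def bwt_index(text):
    """Build the suffix array, C table and Occ table for text.

    Assumes text ends with the sentinel '$', which sorts before A, C, G, T.
    """
    # Naive suffix array construction: fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character for suffix i is the character preceding it (cyclically).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return the sorted start positions of pattern in the indexed text."""
    lo, hi = 0, len(sa)  # half-open suffix array interval [lo, hi)
    for c in reversed(pattern):  # process the pattern right to left
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, C, occ = bwt_index("ACGTACGT$")
    print(backward_search("ACG", sa, C, occ))
```

Each step narrows the suffix array interval by one pattern character; a variant-aware search would, at some steps, have to follow more than one interval.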


Subject(s)
Genome, Human , Genomics , Algorithms , Computational Biology , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software
4.
PLoS One ; 14(5): e0216666, 2019.
Article in English | MEDLINE | ID: mdl-31091244

ABSTRACT

Mucins and their glycosylation have been suggested to play an important role in colorectal carcinogenesis. We examined potentially functional genetic variants in the mucin genes or genes involved in their glycosylation with respect to colorectal cancer (CRC) risk and clinical outcome. We genotyped 23 single nucleotide polymorphisms (SNPs) covering 123 SNPs through pairwise linkage disequilibrium (r² > 0.80) in the MUC1, MUC2, MUC4, MUC5AC, MUC6, and B3GNT6 genes in a hospital-based case-control study of 1532 CRC cases and 1108 healthy controls from the Czech Republic. We also analyzed these SNPs in relation to overall survival and event-free survival in a subgroup of 672 patients. Among patients without distant metastasis at the time of diagnosis, two MUC4 SNPs, rs3107764 and rs842225, showed association with overall survival (HR 1.40, 95% CI 1.08-1.82, additive model, log-rank p = 0.004 and HR 0.64, 95% CI 0.42-0.99, recessive model, log-rank p = 0.01, respectively) and event-free survival (HR 1.31, 95% CI 1.03-1.68, log-rank p = 0.004 and HR 0.64, 95% CI 0.42-0.96, log-rank p = 0.006, respectively) after adjustment for age, sex and TNM stage. Our data suggest that genetic variation, especially in the transmembrane mucin gene MUC4, may play a role in the survival of CRC and further studies are warranted.


Subject(s)
Colorectal Neoplasms/genetics , Mucin-4/genetics , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Case-Control Studies , Colonic Neoplasms/genetics , Colonic Neoplasms/mortality , Colonic Neoplasms/pathology , Colorectal Neoplasms/mortality , Colorectal Neoplasms/pathology , Czech Republic , Disease-Free Survival , Female , Genotype , Glycosylation , Humans , Kaplan-Meier Estimate , Linkage Disequilibrium , Male , Middle Aged , Mucin-4/metabolism , Mucins/genetics , Mucins/metabolism , Polymorphism, Single Nucleotide/genetics , Progression-Free Survival , Risk Factors
5.
PLoS One ; 9(10): e111061, 2014.
Article in English | MEDLINE | ID: mdl-25350395

ABSTRACT

Interferon (IFN) signaling has been suggested to play an important role in colorectal carcinogenesis. Our study aimed to examine potentially functional genetic variants in interferon regulatory factor 3 (IRF3), IRF5, IRF7, type I and type II IFN and their receptor genes with respect to colorectal cancer (CRC) risk and clinical outcome. Altogether 74 single nucleotide polymorphisms (SNPs) were covered by the 34 SNPs genotyped in a hospital-based case-control study of 1327 CRC cases and 758 healthy controls from the Czech Republic. We also analyzed these SNPs in relation to overall survival and event-free survival in a subgroup of 483 patients. Seven SNPs in IFNA1, IFNA13, IFNA21, IFNK, IFNAR1 and IFNGR1 were associated with CRC risk. After multiple testing correction, the associations with the SNPs rs2856968 (IFNAR1) and rs2234711 (IFNGR1) remained formally significant (P = 0.0015 and P < 0.0001, respectively). Multivariable survival analyses showed that the SNP rs6475526 (IFNA7/IFNA14) was associated with overall survival of the patients (P = 0.041) and with event-free survival among patients without distant metastasis at the time of diagnosis (P = 0.034). The hazard ratios (HRs) for rs6475526 remained statistically significant even after adjustment for age, gender, grade and stage (P = 0.029 and P = 0.036, respectively), suggesting that rs6475526 is an independent prognostic marker for CRC. Our data suggest that genetic variation in the IFN signaling pathway genes may play a role in the etiology and survival of CRC and further studies are warranted.


Subject(s)
Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Interferons/genetics , Polymorphism, Single Nucleotide , Aged , Case-Control Studies , Colorectal Neoplasms/metabolism , Disease-Free Survival , Female , Genotype , Hospitals , Humans , Interferons/metabolism , Linkage Disequilibrium , Male , Middle Aged , Multivariate Analysis , Proportional Hazards Models , Risk Factors , Signal Transduction