1.
Sci Rep ; 13(1): 4900, 2023 03 25.
Article in English | MEDLINE | ID: mdl-36966180

ABSTRACT

The molecular pathophysiology underlying lumbar spondylosis development remains unclear. To identify genetic factors associated with lumbar spondylosis, we conducted a genome-wide association study using 83 severe lumbar spondylosis cases and 182 healthy controls and identified 65 candidate disease-associated single nucleotide polymorphisms (SNPs). Replication analysis in 510 case and 911 control subjects from five independent Japanese cohorts identified rs2054564, located in intron 7 of ADAMTS17, as a disease-associated SNP reaching genome-wide significance (P = 1.17 × 10⁻¹¹, odds ratio = 1.92). This association was significant even after adjustment for age, sex, and body mass index (P = 7.52 × 10⁻¹¹). A replication study in a Korean cohort, including 123 case and 319 control subjects, also verified the significant association of this SNP with severe lumbar spondylosis. Immunohistochemistry revealed that fibrillin-1 (FBN1) and ADAMTS17 were co-expressed in the annulus fibrosus of intervertebral discs (IVDs). ADAMTS17 overexpression in MG63 cells promoted extracellular microfibril biogenesis, suggesting the potential role of ADAMTS17 in IVD function through interaction with fibrillin fibers. Finally, we provided evidence of FBN1 involvement in IVD function by showing that lumbar IVDs in patients with Marfan syndrome, caused by heterozygous FBN1 gene mutation, were significantly more degenerated. We identified a common SNP variant, located in ADAMTS17, associated with susceptibility to lumbar spondylosis and demonstrated the potential role of the ADAMTS17-fibrillin network in IVDs in lumbar spondylosis development.


Subjects
Intervertebral Disc , Osteoarthritis, Spine , Spondylosis , Humans , Fibrillin-1 , Fibrillins/analysis , Genome-Wide Association Study , Intervertebral Disc/chemistry , Microfibrils , Spondylosis/genetics
2.
Article in English | MEDLINE | ID: mdl-29994538

ABSTRACT

The Burrows-Wheeler transform (BWT) of short-read data has unexplored potential utilities, such as for efficient and sensitive variation analysis against multiple reference genome sequences, because it does not depend on any particular reference genome sequence, unlike conventional mapping-based methods. However, since the amount of read data is generally much larger than the size of the reference sequence, computation of the BWT of reads is not easy, and this hampers development of potential applications. To alleviate this problem, a new method of computing the BWT of reads in parallel is proposed. The BWT, corresponding to a sorted list of suffixes of reads, is constructed incrementally by successively including longer and longer suffixes. The working data is divided into more than 10,000 "blocks" corresponding to sublists of suffixes with the same prefixes. Thousands of groups of blocks can be processed in parallel while making exclusive writes and concurrent reads into a shared memory. Reads and writes are basically sequential, and the read concurrency is limited to two. Thus, a fine-grained parallelism, referred to as prefix parallelism, is expected to work efficiently. The time complexity for processing n reads of length l is O(nl²). On actual biological DNA sequence data of about 100 Gbp with a read length of 100 bp (base pairs), a tentative implementation of the proposed method took less than an hour on a single-node computer; i.e., it was about three times faster than one of the fastest programs developed so far.
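As a point of reference for what "the BWT of reads" denotes, the following is a minimal sketch that builds it the naive way, by sorting every suffix of every read. It is not the parallel, incremental method the abstract proposes; the function name `bwt_of_reads`, the `$` end marker, and the tie-breaking by read index are assumptions made for illustration only.

```python
def bwt_of_reads(reads):
    """Naive BWT of a read collection (illustrative only, not the paper's method).

    Each read is terminated with "$", all suffixes of all reads are sorted,
    and the character preceding each suffix is emitted in sorted order.
    """
    suffixes = []
    for idx, read in enumerate(reads):
        text = read + "$"
        for pos in range(len(text)):
            # Identical suffixes are ordered by read index (an assumed convention).
            suffixes.append((text[pos:], idx, pos))
    suffixes.sort()

    out = []
    for _, idx, pos in suffixes:
        text = reads[idx] + "$"
        # For pos == 0 the index -1 wraps around to the "$" end marker.
        out.append(text[pos - 1])
    return "".join(out)


print(bwt_of_reads(["ACG"]))       # G$AC
print(bwt_of_reads(["AC", "AG"]))  # CG$$AA
```

Sorting whole suffixes like this is far too slow and memory-hungry for real read sets, which is the problem the abstract addresses by adding progressively longer suffixes block by block.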


Subjects
Algorithms , Data Compression/methods , Databases, Genetic , Sequence Analysis, DNA/methods , Genomics , Humans , Time Factors
3.
BMC Bioinformatics ; 16 Suppl 18: S5, 2015.
Article in English | MEDLINE | ID: mdl-26678411

ABSTRACT

BACKGROUND: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT basically serves as a lossless dictionary of reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. Thus, it is naturally expected to lead to development of sensitive methods for analysis of short-read data. Recently, one of the most active areas of research in sequence analysis is sensitive detection of rare genomic rearrangements from whole-genome sequencing (WGS) data of heterogeneous cancer samples. The application of the BWT of reads to the analysis of genomic rearrangements is addressed in this study. RESULTS: A new method for sensitive detection of genomic rearrangements by using the BWT of reads in the following three steps is proposed: first, breakpoint regions, which contain breakpoints and are joined together by rearrangement, are predicted from the distribution of so-called discordant pairs by using a kind of conjugate gradient method; second, reads partially matching the breakpoint regions are collected from the BWT of reads; and third, breakpoints are detected as branching points among the collected reads, and their precise positions are determined. The method was experimentally implemented, and its performance (i.e., sensitivity and specificity) was evaluated by using simulated data with known artificial rearrangements. It was applied to publicly available real biological WGS data of cancer patients, and the detection results were compared with published results. CONCLUSIONS: Serving as a lossless dictionary of reads, the BWT of short reads enables sensitive analysis of genomic rearrangements in heterogeneous cancer-genome samples when used in conjunction with breakpoint-region predictions based on a conjugate gradient method.


Subjects
Algorithms , Genomics , Databases, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Software
4.
Brain Behav Immun ; 49: 148-55, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25986216

ABSTRACT

Etiology of narcolepsy-cataplexy involves multiple genetic and environmental factors. While the human leukocyte antigen (HLA)-DRB1*15:01-DQB1*06:02 haplotype is strongly associated with narcolepsy, it is not sufficient for disease development. To identify additional, non-HLA susceptibility genes, we conducted a genome-wide association study (GWAS) using Japanese samples. An initial sample set comprising 409 cases and 1562 controls was used for the GWAS of 525,196 single nucleotide polymorphisms (SNPs) located outside the HLA region. An independent sample set comprising 240 cases and 869 controls was then genotyped at 37 SNPs identified in the GWAS. We found that narcolepsy was associated with a SNP in the promoter region of chemokine (C-C motif) receptor 1 (CCR1) (rs3181077, P=1.6×10(-5), odds ratio [OR]=1.86). This rs3181077 association was replicated with the independent sample set (P=0.032, OR=1.36). We measured mRNA levels of candidate genes in peripheral blood samples of 38 cases and 37 controls. CCR1 and CCR3 mRNA levels were significantly lower in patients than in healthy controls, and CCR1 mRNA levels were associated with rs3181077 genotypes. In vitro chemotaxis assays were also performed to measure monocyte migration. We observed that monocytes from carriers of the rs3181077 risk allele had lower migration indices with a CCR1 ligand. CCR1 and CCR3 are newly discovered susceptibility genes for narcolepsy. These results highlight the potential role of CCR genes in narcolepsy and support the hypothesis that patients with narcolepsy have impaired immune function.


Subjects
Narcolepsy/genetics , Polymorphism, Single Nucleotide , Receptors, CCR1/genetics , Receptors, CCR3/genetics , Asian People , Genome-Wide Association Study , Humans , Japan
6.
Bioinformatics ; 31(10): 1577-83, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25609790

ABSTRACT

MOTIVATION: Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed. RESULTS: The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results. It took only a few minutes to find SNPs from the BWT (with supplementary data, the fragment depth of coverage [FDC]) using a desktop workstation in the case of human exome or transcriptome sequencing data and 20 min using a dual-CPU server in the case of human genome sequencing data. The SNPs found with the proposed method largely agreed with those found by a time-consuming state-of-the-art tool, except for the cases in which the use of fragments of reads led to sensitivity loss or sequencing depth was not sufficient. These exceptions were predictable in advance on the basis of minimum length for uniqueness (MLU) and FDC defined on the reference genome. Moreover, BWT and FDC were computed in less time than it took to get the mapping results, provided that the data were large enough.


Subjects
Algorithms , Exome/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Humans
7.
PLoS One ; 9(11): e111715, 2014.
Article in English | MEDLINE | ID: mdl-25364816

ABSTRACT

Elucidation of the genetic susceptibility factors for diabetic retinopathy (DR) is important to gain insight into the pathogenesis of DR, and may help to define genetic risk factors for this condition. In the present study, we conducted a three-stage genome-wide association study (GWAS) to identify DR susceptibility loci in Japanese patients, which comprised a total of 837 type 2 diabetes patients with DR (cases) and 1,149 without DR (controls). From the stage 1 genome-wide scan of 446 subjects (205 cases and 241 controls) on 614,216 SNPs, 249 SNPs were selected for the stage 2 replication in 623 subjects (335 cases and 288 controls). Eight SNPs were further followed up in a stage 3 study of 297 cases and 620 controls. The top signal from the present association analysis was rs9362054 in an intron of RP1-90L14.1 showing borderline genome-wide significance (Pmet = 1.4×10(-7), meta-analysis of stage 1 and stage 2, allele model). RP1-90L14.1 is a long intergenic non-coding RNA (lincRNA) adjacent to KIAA1009/QN1/CEP162 gene; CEP162 plays a critical role in ciliary transition zone formation before ciliogenesis. The present study raises the possibility that the dysregulation of ciliary-associated genes plays a role in susceptibility to DR.


Subjects
Diabetic Retinopathy/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , RNA, Long Noncoding/genetics , Adult , Aged , Cilia/genetics , Female , Humans , Japan , Male , Middle Aged
8.
J Hum Genet ; 59(5): 235-40, 2014 May.
Article in English | MEDLINE | ID: mdl-24694762

ABSTRACT

In humans, narcolepsy with cataplexy (narcolepsy) is a sleep disorder that is characterized by sleepiness, cataplexy and rapid eye movement (REM) sleep abnormalities. Narcolepsy is caused by a reduction in the number of neurons that produce hypocretin (orexin) neuropeptide. Both genetic and environmental factors contribute to the development of narcolepsy. Rare and large copy number variations (CNVs) reportedly play a role in the etiology of a number of neuropsychiatric disorders. Narcolepsy is considered a neurological disorder; therefore, we sought to investigate any possible association between rare and large CNVs and human narcolepsy. We used DNA microarray data and a CNV detection software application, PennCNV-Affy, to detect CNVs in 426 Japanese narcoleptic patients and 562 healthy individuals. Overall, we found a significant enrichment of rare and large CNVs (frequency ≤1%, size ≥100 kb) in the patients (case-control ratio of CNV count=1.54, P=5.00 × 10(-4)). Next, we extended a region-based association analysis by including CNVs of size ≥30 kb. Rare and large CNVs in the PARK2 region showed a significant association with narcolepsy. Four patients were assessed to carry duplications of the gene region, whereas no controls carried the duplication, which was further confirmed by quantitative PCR assay. This duplication was also found in 2 essential hypersomnia (EHS) patients out of 171 patients. Furthermore, a pathway analysis revealed enrichments of gene disruptions by rare and large CNVs in immune response, acetyltransferase activity, cell cycle regulation and regulation of cell development. This study constitutes the first report on the risk association between multiple rare and large CNVs and the pathogenesis of narcolepsy. In the future, replication studies are needed to confirm the associations.


Subjects
Asian People/genetics , DNA Copy Number Variations , Genome-Wide Association Study , Narcolepsy/genetics , Case-Control Studies , Gene Regulatory Networks , Humans , Japan , Narcolepsy/metabolism , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Signal Transduction , Ubiquitin-Protein Ligases/genetics
9.
PLoS One ; 8(4): e58618, 2013.
Article in English | MEDLINE | ID: mdl-23565137

ABSTRACT

To discover susceptibility genes of late-onset Alzheimer's disease (LOAD), we conducted a 3-stage genome-wide association study (GWAS) using three populations: Japanese from the Japanese Genetic Consortium for Alzheimer Disease (JGSCAD), Koreans, and Caucasians from the Alzheimer Disease Genetic Consortium (ADGC). In Stage 1, we evaluated data for 5,877,918 genotyped and imputed SNPs in Japanese cases (n = 1,008) and controls (n = 1,016). Genome-wide significance was observed with 12 SNPs in the APOE region. Seven SNPs from other distinct regions with p-values <2×10(-5) were genotyped in a second Japanese sample (885 cases, 985 controls), and evidence of association was confirmed for one SORL1 SNP (rs3781834, P = 7.33×10(-7) in the combined sample). Subsequent analysis combining results for several SORL1 SNPs in the Japanese, Korean (339 cases, 1,129 controls) and Caucasian (11,840 AD cases, 10,931 controls) samples revealed genome-wide significance with rs11218343 (P = 1.77×10(-9)) and rs3781834 (P = 1.04×10(-8)). SNPs in previously established AD loci in Caucasians showed strong evidence of association in Japanese, including rs3851179 near PICALM (P = 1.71×10(-5)) and rs744373 near BIN1 (P = 1.39×10(-4)). The associated allele for each of these SNPs was the same as in Caucasians. These data demonstrate for the first time genome-wide significance of LOAD with SORL1 and confirm the role of other known loci for LOAD in Japanese. Our study highlights the importance of examining associations in multiple ethnic populations.


Subjects
Alzheimer Disease/genetics , Asian People/genetics , Genetic Predisposition to Disease , LDL-Receptor Related Proteins/genetics , Membrane Transport Proteins/genetics , White People/genetics , Alleles , Chromosome Mapping , Chromosomes, Human, Pair 11 , Genome-Wide Association Study , Genotype , Humans , Japan , Odds Ratio , Polymorphism, Single Nucleotide , Republic of Korea
10.
J Bioinform Comput Biol ; 10(4): 1250002, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22809415

ABSTRACT

Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm, in which the modification has a restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.
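For orientation, here is a minimal sketch of the classic Myers bit-parallel search that the abstract takes as its starting point, in the commonly used formulation with positive/negative vertical and horizontal delta vectors. It is not the modified algorithm proposed in the paper. Python integers are unbounded, so the word-size limit on the query length does not show up here; the explicit `mask` stands in for a fixed-width machine word. The function name and return format are assumptions for illustration.

```python
def myers_search(pattern, text, k):
    """Report (end_index, distance) for each text position where some
    substring ending there is within k edits of `pattern`."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1   # emulates an m-bit machine word
    high = 1 << (m - 1)   # bit corresponding to the last pattern row

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m  # column 0: every vertical delta is +1
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # rows whose horizontal delta is +1
        mh = pv & xh                    # rows whose horizontal delta is -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift without setting bit 0: a match may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits


print(myers_search("ACGT", "TTACGTTT", 0))  # [(5, 0)]
```

Each text character costs a constant number of word operations, which is what makes the approach attractive for extending many seed alignments over a small alphabet such as DNA.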


Subjects
Algorithms , Base Sequence , DNA/chemistry , Computational Biology , Genome , Sequence Alignment , Sequence Analysis, DNA
11.
PLoS One ; 7(6): e39175, 2012.
Article in English | MEDLINE | ID: mdl-22737229

ABSTRACT

Hepatitis B virus (HBV) infection can lead to serious liver diseases, including liver cirrhosis (LC) and hepatocellular carcinoma (HCC); however, about 85-90% of infected individuals become inactive carriers with sustained biochemical remission and very low risk of LC or HCC. To identify host genetic factors contributing to HBV clearance, we conducted genome-wide association studies (GWAS) and replication analysis using samples from HBV carriers and spontaneously HBV-resolved Japanese and Korean individuals. Association analysis in the Japanese and Korean data identified the HLA-DPA1 and HLA-DPB1 genes with P(meta) = 1.89×10⁻¹² for rs3077 and P(meta) = 9.69×10⁻¹⁰ for rs9277542. We also found that the HLA-DPA1 and HLA-DPB1 genes were significantly associated with protective effects against chronic hepatitis B (CHB) in Japanese, Korean and other Asian populations, including Chinese and Thai individuals (P(meta) = 4.40×10⁻¹⁹ for rs3077 and P(meta) = 1.28×10⁻¹⁵ for rs9277542). These results suggest that the associations between the HLA-DP locus and the protective effects against persistent HBV infection and with clearance of HBV were replicated widely in East Asian populations; however, there are no reports of GWAS in Caucasian or African populations. Based on the GWAS in this study, there were no significant SNPs associated with HCC development. To clarify the pathogenesis of CHB and the mechanisms of HBV clearance, further studies are necessary, including functional analyses of the HLA-DP molecule.


Subjects
Genome-Wide Association Study , HLA-DP Antigens/immunology , Hepatitis B virus/genetics , Hepatitis B, Chronic/prevention & control , Hepatitis B, Chronic/virology , Female , Genotype , HLA-DP Antigens/genetics , HLA-DP alpha-Chains/genetics , HLA-DP beta-Chains/genetics , Haplotypes , Hepatitis B/genetics , Hepatitis B, Chronic/immunology , Humans , Japan , Korea , Linkage Disequilibrium , Male , Odds Ratio , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Prevalence , Principal Component Analysis , Remission Induction
12.
BMC Bioinformatics ; 12: 469, 2011 Dec 12.
Article in English | MEDLINE | ID: mdl-22151604

ABSTRACT

BACKGROUND: Multiple genetic factors and their interactive effects are speculated to contribute to complex diseases. Detecting such genetic interactive effects, i.e., epistatic interactions, however, remains a significant challenge in large-scale association studies. RESULTS: We have developed a new method, named SNPInterForest, for identifying epistatic interactions by extending an ensemble learning technique called random forest. Random forest is a predictive method that has been proposed for use in discovering single-nucleotide polymorphisms (SNPs), which are most predictive of the disease status in association studies. However, it is less sensitive to SNPs with little marginal effect. Furthermore, it does not natively exhibit information on interaction patterns of susceptibility SNPs. We extended the random forest framework to overcome the above limitations by means of (i) modifying the construction of the random forest and (ii) implementing a procedure for extracting interaction patterns from the constructed random forest. The performance of the proposed method was evaluated by simulated data under a wide spectrum of disease models. SNPInterForest performed very well in successfully identifying pure epistatic interactions with high precision and was still more than capable of concurrently identifying multiple interactions under the existence of genetic heterogeneity. It was also performed on real GWAS data of rheumatoid arthritis from the Wellcome Trust Case Control Consortium (WTCCC), and novel potential interactions were reported. CONCLUSIONS: SNPInterForest, offering an efficient means to detect epistatic interactions without statistical analyses, is promising for practical use as a way to reveal the epistatic interactions involved in common complex diseases.


Subjects
Epistasis, Genetic , Genome-Wide Association Study , Arthritis, Rheumatoid/genetics , Case-Control Studies , Computer Simulation , Genetic Predisposition to Disease , Genotype , Humans , Polymorphism, Single Nucleotide
13.
J Hum Genet ; 56(12): 852-6, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22011818

ABSTRACT

Family and twin studies have indicated that genetic factors have an important role in panic disorder (PD), whereas its pathogenesis has remained elusive. We conducted a genome-wide copy number variation (CNV) association study to elucidate the involvement of structural variants in the etiology of PD. The participants were 2055 genetically unrelated Japanese people (535 PD cases and 1520 controls). CNVs were detected using Genome-Wide Human SNP array 6.0, determined by Birdsuite and confirmed by PennCNV. They were classified as rare CNVs (found in <1% of the total sample) or common CNVs (found in ≥5%). PLINK was used to perform global burden analysis for rare CNVs and association analysis for common CNVs. The sample yielded 2039 rare CNVs and 79 common CNVs. Significant increases in the rare CNV burden in PD cases were not found. Common duplications in 16p11.2 showed Bonferroni-corrected P-values <0.05. Individuals with PD did not exhibit an increased genome-wide rare CNV burden. Common duplications were associated with PD and found in the pericentromeric region of 16p11.2, which had been reported to be rich in low copy repeats and to harbor developmental disorders, neuropsychiatric disorders and dysmorphic features.


Subjects
DNA Copy Number Variations , Panic Disorder/genetics , Adult , Asian People/genetics , Case-Control Studies , Chromosomes, Human, Pair 16 , Female , Genome-Wide Association Study , Humans , Japan , Male , Middle Aged
14.
Hum Mol Genet ; 20(17): 3507-16, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21659334

ABSTRACT

Hematologic abnormalities during current therapy with pegylated interferon and ribavirin (PEG-IFN/RBV) for chronic hepatitis C (CHC) often necessitate dose reduction and premature withdrawal from therapy. The aim of this study was to identify host factors associated with IFN-induced thrombocytopenia by genome-wide association study (GWAS). In the GWAS stage using 900K single-nucleotide polymorphism (SNP) microarrays, 303 Japanese CHC patients treated with PEG-IFN/RBV therapy were genotyped. One SNP (rs11697186) located on DDRGK1 gene on chromosome 20 showed strong associations in the minor-allele-dominant model with the decrease of platelet counts in response to PEG-IFN/RBV therapy [P = 8.17 × 10(-9); odds ratio (OR) = 4.6]. These associations were replicated in another sample set (n = 391) and the combined P-values reached 5.29 × 10(-17) (OR = 4.5). Fine mapping with 22 SNPs around DDRGK1 and ITPA genes showed that rs11697186 at the GWAS stage had a strong linkage disequilibrium with rs1127354, known as a functional variant in the ITPA gene. The ITPA-AA/CA genotype was independently associated with a higher degree of reduction in platelet counts at week 4 (P < 0.0001), as well as protection against the reduction in hemoglobin, whereas the CC genotype had significantly less reduction in the mean platelet counts compared with the AA/CA genotype (P < 0.0001 for weeks 2, 4, 8, 12), due to a reactive increase of the platelet count through weeks 1-4. Our present results may provide a valuable pharmacogenetic diagnostic tool for tailoring PEG-IFN/RBV dosing to minimize drug-induced adverse events.


Subjects
Antiviral Agents/therapeutic use , Genome-Wide Association Study/methods , Hepatitis C, Chronic/drug therapy , Interferons/therapeutic use , Pyrophosphatases/genetics , Ribavirin/therapeutic use , Thrombocytopenia/genetics , Antiviral Agents/adverse effects , Genotype , Humans , Interferons/adverse effects , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Ribavirin/adverse effects , Thrombocytopenia/chemically induced
15.
BMC Genet ; 12: 29, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21385384

ABSTRACT

BACKGROUND: Array-based detection of copy number variations (CNVs) is widely used for identifying disease-specific genetic variations. However, the accuracy of CNV detection is not sufficient and results differ depending on the detection programs used and their parameters. In this study, we evaluated five widely used CNV detection programs, Birdsuite (mainly consisting of the Birdseye and Canary modules), Birdseye (part of Birdsuite), PennCNV, CGHseg, and DNAcopy from the viewpoint of performance on the Affymetrix platform using HapMap data and other experimental data. Furthermore, we identified CNVs of 180 healthy Japanese individuals using parameters that showed the best performance in the HapMap data and investigated their characteristics. RESULTS: The results indicate that Hidden Markov model-based programs PennCNV and Birdseye (part of Birdsuite), or Birdsuite show better detection performance than other programs when the high reproducibility rates of the same individuals and the low Mendelian inconsistencies are considered. Furthermore, when rates of overlap with other experimental results were taken into account, Birdsuite showed the best performance from the view point of sensitivity but was expected to include many false negatives and some false positives. The results of 180 healthy Japanese demonstrate that the ratio containing repeat sequences, not only segmental repeats but also long interspersed nuclear element (LINE) sequences both in the start and end regions of the CNVs, is higher in CNVs that are commonly detected among multiple individuals than that in randomly selected regions, and the conservation score based on primates is lower in these regions than in randomly selected regions. Similar tendencies were observed in HapMap data and other experimental data. 
CONCLUSIONS: Our results suggest that not only segmental repeats but also interspersed repeats, especially LINE sequences, are deeply involved in CNVs, particularly in common CNV formations. The detected CNVs are stored in the CNV repository database newly constructed by the "Japanese integrated database project" for sharing data among researchers. http://gwas.lifesciencedb.jp/cgi-bin/cnvdb/cnv_top.cgi.


Subjects
Algorithms , DNA Copy Number Variations , Databases, Genetic , Models, Genetic , Asian People/genetics , Humans , Markov Chains , Oligonucleotide Array Sequence Analysis
16.
Hum Mutat ; 31(9): 1003-10, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20556799

ABSTRACT

An amyotrophic lateral sclerosis (ALS) mutation database has been constructed as a publicly accessible online resource for recording the nucleotide and amino acid variants identified in genes associated with ALS, along with corresponding clinical conditions. The database currently consists of more than 600 entries, including about 180 unique variants found in 25 disease-causative or disease-related genes. In addition to published data collected from literature, novel variants identified by microarray resequencing in our laboratory are incorporated into the database. Every reported gene has a respective page that provides information on its variation positions with various statistics, clinical characteristics, and primary references, as well as gene-sequence and protein-structure information that will assist in assessing variation significance. Users can access a homology search function to find variations in arbitrary sequences of interest and to check if they have already been described in the database. This database is expected to fulfill an essential need in terms of integrating comprehensive information on genetic and clinical data related to ALS, which will subsequently deepen our understanding of the possible mechanisms of the disease, as well as help with the clinical practice and treatment of ALS. The database is accessible at: https://reseq.lifesciencedb.jp/resequence/SearchDisease.do?targetId=1. Data submission is open to all researchers and is highly encouraged.


Subjects
Amyotrophic Lateral Sclerosis/genetics , Databases, Genetic , Mutation/genetics , Base Sequence , Humans
17.
Artif Intell Med ; 49(3): 135-43, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20427165

ABSTRACT

OBJECTIVE: As more full-text biomedical papers are becoming available in digitized form online, there is a need for tools to mine information from all parts of such papers. Because the figures and legends/captions in biomedical papers provide important information about research outcomes, mining techniques targeting them have attracted a great deal of attention. In this study, we focused on pathway figures that illustrate signaling or metabolic pathways, because many of these are important in understanding disease mechanism(s). We developed a figure classification system based on textual information contained in biomedical papers to provide an automated acquisition system for such pathway figures. MATERIALS AND METHODS: We used full-text journal articles available on PubMed Central as our data set. We used several supervised machine learning methods, such as decision tree and a support vector machine, to classify figures in the data set. We compared the classification performance among the cases using only figure legends, using only sentences referring to the figure in the main text of the article, and combining figure legends with sentences referring to the figure in the main text of the article. RESULTS: Compared with previous related work, a sufficiently high performance was achieved with the figure legends alone. The performance with the sentences referring to the figure in the main text was actually lower than that with the figure legends alone, indicating that focusing on the main text alone is inadequate. The combination of legend and main text clearly had an effect, but including the prior and following sentences in addition to the sentence referring to the figure dramatically improved the performance. CONCLUSIONS: We developed an automatic pathway figure classification system based on both figure legends and the main text that has quite a high degree of accuracy. 
To our knowledge, this is the first attempt to address a figure classification task using legends and the main text, and it may provide a first stage for achieving efficient figure mining.


Subjects
Disease , Medical Illustration , Artificial Intelligence , Humans , Periodicals as Topic
18.
J Comput Biol ; 16(11): 1601-13, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19772398

ABSTRACT

We have developed efficient in-practice algorithms for computing rank and select functions on a binary string, based on a novel data structure, a hierarchical binary string with hierarchical accumulatives. It efficiently stores decomposed information on partial summations over various scales of subregions of a given binary string, so that the required space overhead ratio is only about 3.5% irrespective of the string length. Values of rank and select functions are computed hierarchically in [(log₂ n)/8] iterations, where n is the string length. For example, for an unbiased random binary string of 64 G bits, each value of these functions can be computed in about a microsecond, on average, on a single 3.0-GHz CPU using 8+ GB of memory. We also present their applications to genome mapping problems for large-scale short-read DNA sequence data, especially produced by ultra-high-throughput new-generation DNA sequencers. The algorithms are applied to the binarization of the Burrows-Wheeler transform of the human genome DNA sequence. For the sake of high-speed performance, we adopted a somewhat stringent mapping condition that allows at most a single-base mismatch (either a substitution, insertion, or deletion of a single base) per query sequence. An experimentally implemented program mapped several thousands of sequences per second on a single 3.0-GHz CPU, several times faster than ELAND, a widely used mapping program with the Illumina-Solexa 1G analyser.
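To make the rank and select functions concrete, the sketch below uses a much simpler two-level layout than the hierarchical structure the abstract describes: one cumulative count per 64-bit block plus a popcount inside the block. The class name, block size, and the binary-search `select1` are assumptions for illustration and do not reproduce the paper's data structure or its space overhead.

```python
class RankSelectBits:
    """Two-level rank/select over a string of '0'/'1' (illustrative sketch)."""

    BLOCK = 64

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []   # one integer per block, bit i = i-th bit of the block
        self.cum = [0]    # cum[b] = number of 1s before block b
        for start in range(0, self.n, self.BLOCK):
            chunk = bits[start:start + self.BLOCK]
            word = int(chunk[::-1], 2)  # reverse so the first bit is the LSB
            self.words.append(word)
            self.cum.append(self.cum[-1] + bin(word).count("1"))

    def rank1(self, i):
        """Number of 1s in bits[0:i], for 0 <= i <= n."""
        block, offset = divmod(i, self.BLOCK)
        if offset == 0:
            return self.cum[block]
        partial = self.words[block] & ((1 << offset) - 1)
        return self.cum[block] + bin(partial).count("1")

    def select1(self, k):
        """0-based position of the k-th 1 (k counted from 1)."""
        if k < 1 or k > self.cum[-1]:
            raise ValueError("k out of range")
        lo, hi = 0, self.n - 1
        while lo < hi:  # smallest position p with rank1(p + 1) >= k
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo


bv = RankSelectBits("1011001")
print(bv.rank1(3), bv.select1(3))  # 2 3
```

Rank queries of this kind are the primitive behind backward search on a Burrows-Wheeler transform, which is why the abstract ties them to short-read mapping.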


Subjects
Chromosome Mapping/methods , Computational Biology/methods , Human Genome/genetics , Algorithms , Base Sequence , Humans , Oligonucleotide Array Sequence Analysis , Time Factors
19.
Nat Genet ; 41(10): 1105-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19749757

ABSTRACT

The recommended treatment for patients with chronic hepatitis C, pegylated interferon-alpha (PEG-IFN-alpha) plus ribavirin (RBV), does not provide sustained virologic response (SVR) in all patients. We report a genome-wide association study (GWAS) of null virological response (NVR) in the treatment of patients with hepatitis C virus (HCV) genotype 1 within a Japanese population. We found two SNPs near the gene IL28B on chromosome 19 to be strongly associated with NVR (rs12980275, P = 1.93 × 10^-13, and rs8099917, P = 3.11 × 10^-15). We replicated these associations in an independent cohort (combined P values, 2.84 × 10^-27 (OR = 17.7; 95% CI = 10.0-31.3) and 2.68 × 10^-32 (OR = 27.1; 95% CI = 14.6-50.3), respectively). Compared to NVR, these SNPs were also associated with SVR (rs12980275, P = 3.99 × 10^-24, and rs8099917, P = 1.11 × 10^-27). In further fine mapping of the region, seven SNPs (rs8105790, rs11881222, rs8103142, rs28416813, rs4803219, rs8099917 and rs7248668) located in the IL28B region showed the most significant associations (P = 5.52 × 10^-28 to 2.68 × 10^-32; OR = 22.3-27.1). Real-time quantitative PCR assays in peripheral blood mononuclear cells showed lower IL28B expression levels in individuals carrying the minor alleles (P = 0.015).


Subjects
Antiviral Agents/therapeutic use , Genome-Wide Association Study , Chronic Hepatitis C/drug therapy , Chronic Hepatitis C/genetics , Interferon-alpha/therapeutic use , Interleukins/genetics , Single Nucleotide Polymorphism , Ribavirin/therapeutic use , Alleles , Asian People/genetics , Human Chromosomes Pair 19 , Drug Combinations , Female , Human Genome , Haplotypes , Chronic Hepatitis C/virology , Humans , Interferons , Male , Treatment Outcome
20.
J Hum Genet ; 54(9): 543-6, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19629137

ABSTRACT

The establishment of high-throughput single-nucleotide polymorphism (SNP)-typing technologies has enabled astonishing progress to be made in genome-wide association studies (GWAS), and various novel genetic factors associated with complex diseases have been discovered. Our organization has created a public repository database (DB) to achieve a continuous and intensive management of GWAS data and to facilitate data sharing among researchers. In the GWAS DB, information on study design, quality control protocols, allele frequencies, genotype frequencies and statistical genetic analysis results are stored as publicly available data and can be accessed freely, whereas individual genotyping data and raw data are stored as restricted data and can only be accessed with authorization. All data are presented by a graphic viewer, which is designed to be user friendly for researchers who are not familiar with GWAS to accelerate disease-related studies. Furthermore, the DB allows users to compare various study results obtained by different institutions and on different platforms. The same data are also managed as a distributed annotation system to call up useful data from other DBs and to superimpose them on the GWAS data for help in interpretation. The DB is accessible at https://gwas.lifesciencedb.jp/.


Subjects
Asian People/genetics , Genetic Databases , Genome-Wide Association Study , Single Nucleotide Polymorphism/genetics , Case-Control Studies , Computational Biology , Gene Frequency , Human Genome , Humans
...