Results 1 - 17 of 17
1.
J Mov Disord ; 17(2): 171-180, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38346940

ABSTRACT

OBJECTIVE: The Montreal Cognitive Assessment (MoCA) is recommended for general cognitive evaluation in Parkinson's disease (PD) patients. However, age- and education-adjusted cutoffs specifically for PD have not been developed or systematically validated across PD cohorts with diverse education levels. METHODS: In this retrospective analysis, we utilized data from 1,293 Korean patients with PD whose cognitive diagnoses were determined through comprehensive neuropsychological assessments. Age- and education-adjusted cutoffs were formulated based on 1,202 patients with PD. To identify the optimal machine learning model, clinical parameters and MoCA domain scores from 416 patients with PD were used. Comparative analyses between machine learning methods and different cutoff criteria were conducted on an additional 91 consecutive patients with PD. RESULTS: The cutoffs for cognitive impairment decrease with increasing age within the same education level. Similarly, lower education levels within the same age group correspond to lower cutoffs. For individuals aged 60-80 years, cutoffs were set as follows: 25 or 24 for those with more than 12 years of education, 23 or 22 for 10-12 years, and 21 or 20 for 7-9 years. Comparisons between age- and education-adjusted cutoffs and the machine learning method showed comparable accuracies. The cutoff method resulted in a higher sensitivity (0.8627), whereas machine learning yielded higher specificity (0.8250). CONCLUSION: Both the age- and education-adjusted cutoff methods and machine learning methods demonstrated high effectiveness in detecting cognitive impairment in PD patients. This study highlights the necessity of tailored cutoffs and suggests the potential of machine learning to improve cognitive assessment in PD patients.
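
The sensitivity and specificity figures reported in this record come from comparing a screening rule against a reference diagnosis. The sketch below shows how such figures could be computed for a single cutoff. It assumes that a score below the cutoff is flagged as possible impairment; the scores, labels, and the cutoff of 26 are hypothetical illustrations and are not taken from the study.

```python
def sensitivity_specificity(scores, impaired, cutoff):
    """Return (sensitivity, specificity) for a 'score < cutoff' screening rule.

    scores:   test scores, one per patient (hypothetical data here)
    impaired: reference diagnosis per patient (True = cognitively impaired)
    cutoff:   scores strictly below this value are flagged (an assumption)
    """
    tp = fn = tn = fp = 0
    for score, is_impaired in zip(scores, impaired):
        flagged = score < cutoff
        if is_impaired and flagged:
            tp += 1          # impaired and correctly flagged
        elif is_impaired:
            fn += 1          # impaired but missed
        elif flagged:
            fp += 1          # not impaired but flagged
        else:
            tn += 1          # not impaired and correctly passed
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and reference diagnoses for eight patients.
scores = [28, 27, 25, 24, 22, 20, 26, 23]
impaired = [False, False, False, True, True, True, True, True]

sens, spec = sensitivity_specificity(scores, impaired, cutoff=26)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

An age- and education-adjusted scheme would replace the single `cutoff` argument with a lookup keyed on age band and years of education.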

2.
Genes Genomics ; 45(8): 1025-1036, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37300788

ABSTRACT

BACKGROUND: The identification of gene-phenotype relationships is important in medical genetics as it serves as a basis for precision medicine. However, most of the gene-phenotype relationship data are buried in the biomedical literature in textual form. OBJECTIVE: We propose RelCurator, a curation system that extracts sentences including both gene and phenotype entities related to specific disease categories from PubMed articles, provides rich additional information such as entity taggings, and predictions of gene-phenotype relationships. METHODS: We targeted neurodegenerative disorders and developed a deep learning model using Bidirectional Gated Recurrent Unit (BiGRU) networks and BioWordVec word embeddings for predicting gene-phenotype relationships from biomedical texts. The prediction model is trained with more than 130,000 labeled PubMed sentences including gene and phenotype entities, which are related to or unrelated to neurodegenerative disorders. RESULTS: We compared the performance of our deep learning model with those of Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machine (SVM), and simple Recurrent Neural Network (simple RNN) models. Our model performed better with an F1-score of 0.96. Furthermore, the evaluation done using a few curation cases in the real scenario showed the effectiveness of our work. Therefore, we conclude that RelCurator can identify not only new causative genes, but also new genes associated with neurodegenerative disorders' phenotype. CONCLUSION: RelCurator is a user-friendly method for accessing deep learning-based supporting information and a concise web interface to assist curators while browsing the PubMed articles. Our curation process represents an important and broadly applicable improvement to the state of the art for the curation of gene-phenotype relationships.


Subject(s)
Data Mining , Neurodegenerative Diseases , Humans , Data Mining/methods , Neural Networks, Computer , Neurodegenerative Diseases/genetics
3.
Yonsei Med J ; 63(8): 724-734, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35914754

ABSTRACT

PURPOSE: Hereditary parkinsonism genes consist of causative genes of familial Parkinson's disease (PD) with a locus symbol prefix (PARK genes) and hereditary atypical parkinsonian disorders that present atypical features and limited responsiveness to levodopa (non-PARK genes). Although studies have shown that hereditary parkinsonism genes are related to idiopathic PD at the phenotypic, gene expression, and genomic levels, no study has systematically investigated connectivity among the proteins encoded by these genes at the protein-protein interaction (PPI) level. MATERIALS AND METHODS: Topological measurements and physical interaction enrichment were performed to assess PPI networks constructed using some or all the proteins encoded by hereditary parkinsonism genes (n=96), which were curated using the Online Mendelian Inheritance in Man database and literature. RESULTS: Non-PARK and PARK genes were involved in common functional modules related to autophagy, mitochondrial or lysosomal organization, catecholamine metabolic process, chemical synapse transmission, response to oxidative stress, neuronal apoptosis, regulation of cellular protein catabolic process, and vesicle-mediated transport in synapse. The hereditary parkinsonism proteins formed a single large network comprising 51 nodes, 83 edges, and three PPI pairs. The probability of degree distribution followed a power-law scaling behavior, with a degree exponent of 1.24 and a correlation coefficient of 0.92. LRRK2 was identified as a hub gene with the highest degree of betweenness centrality; its physical interaction enrichment score was 1.28, which was highly significant. CONCLUSION: Both PARK and non-PARK genes show high connectivity at the PPI and biological functional levels.


Subject(s)
Parkinson Disease , Parkinsonian Disorders , Humans , Parkinson Disease/genetics , Parkinsonian Disorders/genetics , Phenotype , Protein Interaction Maps/genetics , Proteins
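
The degree-distribution analysis mentioned in record 3 can be illustrated with a small sketch. It assumes the degree exponent is estimated by a least-squares line through the log-log plot of degree probability against degree, with the correlation coefficient taken from the same fit; the abstract does not specify the fitting procedure, and the edge list and node names below are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each observed degree k to P(k), the fraction of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}


def loglog_fit(dist):
    """Least-squares fit of log10 P(k) on log10 k.

    Returns (gamma, r): gamma is the negated slope (degree exponent),
    r is the Pearson correlation coefficient of the log-log points.
    """
    xs = [math.log10(k) for k in dist]
    ys = [math.log10(p) for p in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical toy interaction network: one hub plus a few peripheral links.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
         ("a", "b"), ("c", "e")]

dist = degree_distribution(edges)
gamma, r = loglog_fit(dist)
print(dist, round(gamma, 2), round(r, 2))
```

A toy network of this size gives a poor fit; the point is only to show which quantities the exponent and correlation coefficient refer to.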
4.
J Mov Disord ; 15(2): 132-139, 2022 May.
Article in English | MEDLINE | ID: mdl-35670022

ABSTRACT

OBJECTIVE: The Montreal Cognitive Assessment (MoCA) is recommended for assessing general cognition in Parkinson's disease (PD). Several cutoffs of MoCA scores for diagnosing PD with cognitive impairment (PD-CI) have been proposed, with varying sensitivity and specificity. This study investigated the utility of machine learning algorithms using MoCA cognitive domain scores for improving diagnostic performance for PD-CI. METHODS: In total, 2,069 MoCA results were obtained from 397 patients with PD enrolled in the Parkinson's Progression Markers Initiative database with a diagnosis of cognitive status based on comprehensive neuropsychological assessments. Using the same number of MoCA results randomly sampled from patients with PD with normal cognition or PD-CI, discriminant validity was compared between machine learning (logistic regression, support vector machine, or random forest) with domain scores and a cutoff method. RESULTS: Based on cognitive status classification using a dataset that permitted sampling of MoCA results from the same individual (n = 221 per group), no difference was observed in accuracy between the cutoff value method (0.74 ± 0.03) and machine learning (0.78 ± 0.03). Using a more stringent dataset that excluded MoCA results (n = 101 per group) from the same patients, the accuracy of the cutoff method (0.66 ± 0.05), but not that of machine learning (0.74 ± 0.07), was significantly reduced. Inclusion of cognitive complaints as an additional variable improved the accuracy of classification using the machine learning method (0.87-0.89). CONCLUSION: Machine learning analysis using MoCA domain scores is a valid method for screening cognitive impairment in PD.

5.
J Pers Med ; 12(6)2022 Jun 12.
Article in English | MEDLINE | ID: mdl-35743744

ABSTRACT

Precision medicine has been revolutionized by the advent of high-throughput next-generation sequencing (NGS) technology and development of various bioinformatic analysis tools for large-scale NGS big data. At the population level, biomedical studies have identified human diseases and phenotype-associated genetic variations using NGS technology, such as whole-genome sequencing, exome sequencing, and gene panel sequencing. Furthermore, patients' genetic variations related to a specific phenotype can also be identified by analyzing their genomic information. These breakthroughs paved the way for the clinical diagnosis and precise treatment of patients' diseases. Although many bioinformatics tools have been developed to analyze the genetic variations from the individual patient's NGS data, it is still challenging to develop user-friendly programs for clinical physicians who do not have bioinformatics programming skills to diagnose a patient's disease using the genomic data. In response to this demand, we developed a Phenotype to Genotype Variation program (PhenGenVar), which is a user-friendly interface for monitoring the variations in a gene of interest for molecular diagnosis. This allows for flexible filtering and browsing of variants of the disease and phenotype-associated genes. To test this program, we analyzed the whole-genome sequencing data of an anonymous person from the 1000 human genome project data. As a result, we were able to identify several genomic variations, including single-nucleotide polymorphism, insertions, and deletions in specific gene regions. Therefore, PhenGenVar can be used to diagnose a patient's disease. PhenGenVar is freely accessible and is available at our website.

6.
Neurobiol Aging ; 100: 118.e5-118.e13, 2021 04.
Article in English | MEDLINE | ID: mdl-33423827

ABSTRACT

Increased burdens of rare coding variants in genes related to lysosomal storage disease or mitochondrial pathways were reported to be associated with idiopathic Parkinson's disease. Under a hypothesis that the burden of damaging rare coding variants is increased in causative genes for hereditary parkinsonism, we analyzed the burdens of rare coding variants with a case-control design. Two cohorts of whole-exome sequencing data and a cohort of genome-wide genotyping data of clinically validated idiopathic Parkinson's disease cases and controls, which were open to the public, were used. The sequence kernel association test-optimal was used to analyze the burden of rare variants in the hereditary parkinsonism gene set, which was constructed from the Online Mendelian Inheritance in Man database through manual curation. The hereditary parkinsonism gene set consisted of 17 genes with a locus symbol prefix for familial Parkinson's disease and 75 hereditary atypical parkinsonism genes. We detected a significant association of enriched burdens of predicted damaging rare coding variants in hereditary parkinsonism genes in all three datasets. Meta-analyses of the rare variant burden test in a subgroup of gene sets revealed an association between burdens of rare damaging variants with PD in a hereditary atypical parkinsonism gene set, but not in a subgroup gene set with a locus symbol prefix for familial Parkinson's disease. Our results highlight the roles of rare damaging variants in causative genes for hereditary atypical parkinsonian disorders. We propose that Mendelian genes associated with hereditary disorders accompanying parkinsonism are involved in Parkinson's disease-related genetic networks.


Subject(s)
Genetic Association Studies/methods , Genetic Variation/genetics , Parkinson Disease/genetics , Aged , Case-Control Studies , Cohort Studies , Databases, Genetic , Datasets as Topic , Female , Genotype , Humans , Lysosomal Storage Diseases/genetics , Male , Middle Aged , Mitochondria/genetics , Mitochondria/metabolism , Parkinson Disease Associated Proteins/genetics , Signal Transduction/genetics , Exome Sequencing
7.
Int J Mol Sci ; 21(18)2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32899599

ABSTRACT

RNA decay is an important regulatory mechanism for gene expression at the posttranscriptional level. Although the main pathways and major enzymes that facilitate this process are well defined, global analysis of RNA turnover remains under-investigated. Recent advances in the application of next-generation sequencing technology enable its use in order to examine various RNA decay patterns at the genome-wide scale. In this study, we investigated human RNA decay patterns using parallel analysis of RNA end-sequencing (PARE-seq) data from XRN1-knockdown HeLa cell lines, followed by a comparison of steady state and degraded mRNA levels from RNA-seq and PARE-seq data, respectively. The results revealed 1103 and 1347 transcripts classified as stable and unstable candidates, respectively. Of the unstable candidates, we found that a subset of the replication-dependent histone transcripts was polyadenylated and rapidly degraded. Additionally, we identified 380 endonucleolytically cleaved candidates by analyzing the most abundant PARE sequence on a transcript. Of these, 41.4% of genes were classified as unstable genes, which implied that their endonucleolytic cleavage might affect their mRNA stability. Furthermore, we identified 1877 decapped candidates, including HSP90B1 and SWI5, having the most abundant PARE sequences at the 5'-end positions of the transcripts. These results provide a useful resource for further analysis of RNA decay patterns in human cells.


Subject(s)
Gene Expression Regulation/genetics , RNA Stability/physiology , Base Sequence/genetics , Databases, Genetic , Genome/genetics , HeLa Cells , High-Throughput Nucleotide Sequencing/methods , Histones/metabolism , Humans , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Whole Genome Sequencing/methods
8.
Front Neurosci ; 14: 596105, 2020.
Article in English | MEDLINE | ID: mdl-33390883

ABSTRACT

BACKGROUND: Studies regarding differentially expressed genes (DEGs) in Parkinson's disease (PD) have focused on common upstream regulators or dysregulated pathways or ontologies; however, the relationships between DEGs and disease-related or cell type-enriched genes have not been systematically studied. Meta-analysis of DEGs (meta-DEGs) is expected to overcome limitations of previous studies, such as replication failure and small sample size. PURPOSE: Meta-DEG analyses were performed to investigate dysregulated genes enriched with neurodegenerative disorder causative or risk genes in a phenotype-specific manner. METHODS: Six microarray datasets from PD patients and controls, for which substantia nigra sample transcriptome data were available, were downloaded from the NINDS data repository. Meta-DEG analyses were performed using two methods, combining p-values and combining effect sizes, and common DEGs were used for secondary analyses. Gene sets of cell type-enriched or disease-related genes for PD, Alzheimer's disease (AD), and hereditary progressive ataxia were constructed by curation of public databases and/or published literature. RESULTS: Our meta-analyses revealed 449 downregulated and 137 upregulated genes. Overrepresentation analyses with cell type-enriched genes were significant in neuron-enriched genes but not in astrocyte- or microglia-enriched genes. Meta-DEGs were significantly enriched in causative genes for hereditary disorders accompanying parkinsonism but not in genes associated with AD or hereditary progressive ataxia. Enrichment of PD-related genes was highly significant in downregulated DEGs but insignificant in upregulated genes. CONCLUSION: Downregulated meta-DEGs were associated with PD-related genes, but not with other neurodegenerative disorder genes. These results highlight disease phenotype-specific changes in dysregulated genes in PD.

9.
Neurology ; 93(7): e665-e674, 2019 08 13.
Article in English | MEDLINE | ID: mdl-31289143

ABSTRACT

OBJECTIVE: To investigate the effect of polygenic load on the progression of striatal dopaminergic dysfunction in patients with Parkinson disease (PD). METHODS: Using data from 335 patients with PD in the Parkinson's Progression Markers Initiative (PPMI) database, we investigated the longitudinal association of PD-associated polygenic load with changes in striatal dopaminergic activity as measured by 123I-N-3-fluoropropyl-2-β-carboxymethoxy-3β-(4-iodophenyl) nortropane (123I-FP-CIT) SPECT over 4 years. PD-associated polygenic load was estimated by calculating weighted genetic risk scores (GRS) using 1) all available 27 PD-risk single nucleotide polymorphisms (SNPs) in the PPMI database (GRS1) and 2) 23 SNPs with minor allele frequency >0.05 (GRS2). RESULTS: GRS1 and GRS2 were correlated with younger age at onset in patients with PD (GRS1, Spearman ρ = -0.128, p = 0.019; GRS2, Spearman ρ = -0.109, p = 0.047). Although GRS1 did not show an association with changes in striatal 123I-FP-CIT availability, GRS2 was associated with a slower decline of striatal dopaminergic activity (interactions with disease duration in linear mixed model; caudate nucleus, estimate = 0.399, SE = 0.165, p = 0.028; putamen, estimate = 0.396, SE = 0.137, p = 0.016). CONCLUSIONS: Our results suggest that genetic factors for PD risk may have heterogeneous effects on striatal dopaminergic degeneration, and some factors may be associated with a slower decline of dopaminergic activity. Composition of PD progression-specific GRS may be useful in predicting disease progression in patients.


Subject(s)
Corpus Striatum/metabolism , Parkinson Disease/metabolism , Parkinsonian Disorders/metabolism , Tropanes/metabolism , Adult , Aged , Caudate Nucleus/metabolism , Disease Progression , Dopamine Plasma Membrane Transport Proteins/metabolism , Female , Humans , Male , Middle Aged , Parkinson Disease/complications , Parkinson Disease/genetics , Parkinsonian Disorders/complications , Putamen/metabolism
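
The weighted genetic risk score described in record 9 can be illustrated with a minimal sketch. It assumes the common formulation in which each SNP's risk-allele count (0, 1, or 2) is multiplied by a per-SNP weight, here the natural log of an odds ratio, and the products are summed. The abstract does not state the weights used, so the SNP identifiers, odds ratios, allele frequencies, and genotypes below are all hypothetical.

```python
import math


def filter_by_maf(snps, maf, threshold=0.05):
    """Keep SNPs whose minor allele frequency exceeds the threshold."""
    return [snp for snp in snps if maf[snp] > threshold]


def weighted_grs(allele_counts, odds_ratios, snps=None):
    """Sum of risk-allele counts weighted by log odds ratios.

    allele_counts: SNP id -> number of risk alleles carried (0, 1, or 2)
    odds_ratios:   SNP id -> odds ratio of the risk allele (assumed weights)
    snps:          optional subset of SNP ids to include in the score
    """
    selected = list(allele_counts) if snps is None else snps
    score = 0.0
    for snp in selected:
        count = allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
        score += count * math.log(odds_ratios[snp])
    return score


# Hypothetical SNP panel; none of these values come from the PPMI database.
odds_ratios = {"snp_a": 1.30, "snp_b": 1.10, "snp_c": 1.20}
maf = {"snp_a": 0.20, "snp_b": 0.03, "snp_c": 0.15}
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

grs_all = weighted_grs(patient, odds_ratios)
common = filter_by_maf(list(patient), maf)
grs_common = weighted_grs(patient, odds_ratios, snps=common)
print(round(grs_all, 4), round(grs_common, 4))
```

Scoring once over the full panel and once over the frequency-filtered subset mirrors the GRS1 and GRS2 distinction in the abstract.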
10.
Biomed Res Int ; 2019: 4767354, 2019.
Article in English | MEDLINE | ID: mdl-31346518

ABSTRACT

Genomic analysis begins with de novo assembly of short-read fragments in order to reconstruct full-length base sequences without exploiting a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools have been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their objects of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter setting methods, and input formats; thus, it remains difficult for individual researchers to select, utilize, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.


Subject(s)
Databases, Nucleic Acid , Genome, Helminth , Molecular Sequence Annotation , Toxocara canis/genetics , Whole Genome Sequencing , Animals , Genomics
11.
Biomed Res Int ; 2017: 9631282, 2017.
Article in English | MEDLINE | ID: mdl-28698882

ABSTRACT

Copy number variations (CNVs) are structural variants associated with human diseases. Recent studies verified that disease-related genes are based on the extraction of rare de novo and transmitted CNVs from exome sequencing data. The need for more efficient and accurate methods has increased, which still remains a challenging problem due to coverage biases, as well as the sparse, small-sized, and noncontinuous nature of exome sequencing. In this study, we developed a new CNV detection method, ExCNVSS, based on read coverage depth evaluation and scale-space filtering to resolve these problems. We also developed the method ExCNVSS_noRatio, which is a version of ExCNVSS, for applying to cases with an input of test data only without the need to consider the availability of a matched control. To evaluate the performance of our method, we tested it with 11 different simulated data sets and 10 real HapMap samples' data. The results demonstrated that ExCNVSS outperformed three other state-of-the-art methods and that our method corrected for coverage biases and detected all-sized CNVs even without matched control data.


Subject(s)
DNA Copy Number Variations , Exome , High-Throughput Nucleotide Sequencing , Models, Genetic
12.
Mov Disord ; 32(8): 1211-1220, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28548297

ABSTRACT

BACKGROUND AND OBJECTIVES: Many hereditary movement disorders with complex phenotypes without a locus symbol prefix for familial PD present as parkinsonism; however, the dysregulation of genes associated with these phenotypes in the SNpc of PD patients has not been systematically studied. METHODS: Gene set enrichment analyses were performed using 10 previously published genome-wide expression datasets obtained by laser-captured microdissection of pigmented neurons in the SNpc. A custom-curated gene set for hereditary parkinsonism consisting of causative genes (n = 78) related to disorders with a parkinsonism phenotype, but not necessarily idiopathic or monogenic PD, was constructed from the Online Mendelian Inheritance in Man database. RESULTS: In 9 of the 10 gene expression data sets, gene set enrichment analysis showed that the disease-causing genes for hereditary parkinsonism were downregulated in the SNpc in PD patients compared to controls (nominal P values <0.05 in five studies). Among the 63 leading edge subset genes representing downregulated genes in PD, 79.4% were genes without a locus symbol prefix for familial PD. A meta-gene set enrichment analysis performed with a random-effect model showed an association between the gene set for hereditary parkinsonism and PD with a negative normalized enrichment score value (-1.40; 95% CI: -1.52∼-1.28; P < 6.2E-05). CONCLUSION: Disease-causing genes with a parkinsonism phenotype are downregulated in the SNpc in PD. Our study highlights the importance of genes associated with hereditary movement disorders with parkinsonism in understanding the pathogenesis of PD. © 2017 International Parkinson and Movement Disorder Society.


Subject(s)
Gene Expression Regulation/genetics , Genetic Predisposition to Disease , Mutation/genetics , Parkinson Disease/genetics , Parkinsonian Disorders/genetics , Substantia Nigra/physiopathology , Databases as Topic , Gene Ontology , Genetic Association Studies/methods , Genetic Association Studies/statistics & numerical data , Humans , Parkinson Disease/pathology , Parkinsonian Disorders/pathology , Phenotype , Substantia Nigra/pathology
13.
Korean J Parasitol ; 54(6): 751-758, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28095660

ABSTRACT

This study aimed to construct a draft genome of the adult female worm Toxocara canis using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation using functional genomics tools. Using an NGS machine, we produced DNA read data of T. canis. The de novo assembly of the read data was performed using SOAPdenovo. RNA read data were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. The scaffold DNA was obtained, the N50 was 108,950 bp, and the overall length was 341,776,187 bp. The N50 of the transcriptome was 940 bp, and its length was 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in T. canis. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We were able to find new hypothetical polypeptide sequences unique to T. canis, and the findings of this study can serve as a basis for extending our biological understanding of T. canis.


Subject(s)
Genome, Helminth , Toxocara canis/genetics , Animals , Base Composition , Computational Biology , DNA, Helminth/chemistry , DNA, Helminth/genetics , Female , Genes, Helminth , Helminth Proteins/genetics , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Analysis, DNA , Toxocara canis/isolation & purification
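
The N50 values quoted in record 13 summarize assembly contiguity. The sketch below uses the standard definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The scaffold lengths are hypothetical and are not taken from the T. canis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:   # integer comparison avoids float rounding
            return length


# Hypothetical scaffold lengths in base pairs.
scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds))
```

For these lengths the total is 290 bp, and the two longest scaffolds (80 + 70 = 150 bp) already pass the halfway mark, so the function returns 70.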
14.
Int J Data Min Bioinform ; 9(3): 254-76, 2014.
Article in English | MEDLINE | ID: mdl-25163168

ABSTRACT

This study proposes a novel copy number variation (CNV) detection method, CNV_shape, based on variations in the shape of the read coverage data which are obtained from millions of short reads aligned to a reference sequence. The proposed method carries out two transforms, mean shift transform and mean slope transform, to extract the shape of a CNV more precisely from real human data, which are vulnerable to experimental and biological noises. The mean shift transform is a procedure for gaining a preliminary estimation of the CNVs by statistically evaluating moving averages of given read coverage data. The mean slope transform extracts candidate CNVs by filtering out non-stationary sub-regions from each of the primary CNVs pre-estimated in the mean shift procedure. Each of the candidate CNVs is merged with neighbours depending on the merging score to be finally identified as a putative CNV, where the merging score is estimated by the ratio of the positions with non-zero values of the mean shift transform to the total length of the region including two neighbouring candidate CNVs and the interval between them. The proposed CNV detection method was validated experimentally with simulated data and real human data. The simulated data with coverage in the range of 1x to 10x were generated for various sampling sizes and p-values. Five individual human genomes were used as real human data. The results show that relatively small CNVs (> 1 kbp) can be detected from low coverage (> 1.7x) data. The results also reveal that, in contrast to conventional methods, performance improvement from 8.18 to 87.90% was achieved in CNV_shape. The outcomes suggest that the proposed method is very effective in reducing noises inherent in real data as well as in detecting CNVs of various sizes and types.


Subject(s)
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods , Algorithms , Computational Biology/methods , Computer Simulation , Electronic Data Processing , Genetic Variation , Genome, Human , Humans , Models, Statistical , Reproducibility of Results , Signal Processing, Computer-Assisted
15.
BMC Bioinformatics ; 14: 57, 2013 Feb 18.
Article in English | MEDLINE | ID: mdl-23418726

ABSTRACT

BACKGROUND: As next-generation sequencing technology has made rapid and cost-effective sequencing available, the importance of computational approaches for finding and analyzing copy number variations (CNVs) has grown. Furthermore, most genome projects need to accurately analyze sequences with fairly low-coverage read data. A method that detects the exact types and locations of CNVs from low-coverage read data is urgently needed. RESULTS: Here, we propose a new CNV detection method, CNV_SS, which uses scale-space filtering. Scale-space filtering is performed by applying Gaussian convolution to the read coverage data at various scales according to a given scaling parameter. Next, by differentiating twice and finding zero-crossing points, inflection points of the scale-space filtered read coverage data are calculated per scale. Then, the types and the exact locations of CNVs are obtained by analyzing the fingerprint map, that is, the contours of zero-crossing points across scales. CONCLUSIONS: The performance of CNV_SS showed that FNR and FPR stay in the range of 1.27% to 2.43% and 1.14% to 2.44%, respectively, even at a relatively low coverage (0.5x ≤ C ≤ 2x). CNV_SS also gave much more effective results than the conventional methods in the evaluation of FNR, by 3.82% at the least and 76.97% at the most, even when the coverage level of read data is low. CNV_SS source code is freely available from http://dblab.hallym.ac.kr/CNV SS/.


Subject(s)
DNA Copy Number Variations , Sequence Analysis, DNA/methods , Computational Biology/methods , Genome , HapMap Project , Humans
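
The scale-space steps described in record 15 (Gaussian smoothing at several scales, double differentiation, zero-crossing detection) can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not the CNV_SS implementation: the kernel truncation at three standard deviations, the edge clamping, the noise tolerance `eps`, and the simulated coverage profile are all choices made for this example.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at 3 sigma (an assumed cutoff)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [sum(w * signal[min(max(i + k - radius, 0), last)]
                for k, w in enumerate(kernel))
            for i in range(len(signal))]


def second_difference(values):
    """Discrete second derivative; entry j corresponds to position j + 1."""
    return [values[i - 1] - 2 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


def zero_crossings(values, eps=1e-9):
    """Indices i where values[i] and values[i + 1] have opposite signs."""
    return [i for i in range(len(values) - 1)
            if (values[i] > eps and values[i + 1] < -eps)
            or (values[i] < -eps and values[i + 1] > eps)]


def fingerprint(signal, sigmas):
    """Map each scale to the signal positions of its zero-crossings."""
    return {sigma: [i + 1 for i in
                    zero_crossings(second_difference(smooth(signal, sigma)))]
            for sigma in sigmas}


# Simulated read coverage: a copy-number gain spanning positions 40 to 59.
coverage = [10.0] * 40 + [20.0] * 20 + [10.0] * 40
fp = fingerprint(coverage, [1.0, 2.0, 4.0])
print(fp)
```

On this noise-free profile the crossings fall at the two edges of the simulated gain (between positions 39 and 40, and between 59 and 60) at every scale. With real, noisy coverage the crossings move and merge as the scale grows, which is the behavior a fingerprint map is meant to expose.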
17.
Article in English | MEDLINE | ID: mdl-22255597

ABSTRACT

This study proposes a novel CNV detection algorithm based on scale-space filtering. It uses a Gaussian filter for the convolution with a scale parameter. The range of the scale parameter is adjusted according to the coverage level of the read data. The position of a CNV region is determined through coarse and fine searches over the scales. The results showed that the performance of the proposed method depends less on the coverage level than that of the conventional methods. The results also showed that the proposed method outperforms the conventional methods by 63.29% to 73.57%.


Subject(s)
Algorithms , DNA Copy Number Variations/genetics , DNA Mutational Analysis/methods , Gene Dosage/genetics , Sequence Analysis, DNA/methods , Base Sequence , Molecular Sequence Data