Results 1 - 20 of 35
1.
PLoS One ; 18(10): e0292579, 2023.
Article in English | MEDLINE | ID: mdl-37816033

ABSTRACT

Pancreatic islet failure is a key characteristic of type 2 diabetes besides insulin resistance. To get molecular insights into the pathology of islets in type 2 diabetes, we developed a computational approach to integrating expression profiles of Goto-Kakizaki and Wistar rat islets from a designed experiment with those of the human islets from an observational study. A principal gene-eigenvector in the expression profiles, characterized by up-regulated angiogenesis and down-regulated oxidative phosphorylation, was identified as conserved across the two species. In the case of Goto-Kakizaki versus Wistar islets, such alteration in gene expression can be verified directly by the treatment-control tests over time, and corresponds to the alteration of α/β-cell distribution obtained by quantifying the islet micrographs. Furthermore, the correspondence between the dual sample- and gene-eigenvectors unveils more delicate structures. In the case of rats, the up- and down-trend of insulin mRNA levels before and after week 8 correspond respectively to the top two principal eigenvectors. In the case of humans, the top two principal eigenvectors correspond respectively to the late and early stages of diabetes. According to the aggregated expression signature, a large portion of genes involved in the hypoxia-inducible factor signaling pathway, which activates transcription of angiogenesis, were significantly up-regulated. Furthermore, top-ranked anti-angiogenic genes THBS1 and PEDF indicate the existence of a counteractive mechanism that is in line with thickened and fragmented capillaries found in the deteriorated islets. Overall, the integrative analysis unravels the principal transcriptional alterations underlying the islet deterioration of morphology and insulin secretion along type 2 diabetes progression.


Subject(s)
Diabetes Mellitus, Type 2 , Insulin-Secreting Cells , Islets of Langerhans , Rats , Humans , Animals , Diabetes Mellitus, Type 2/pathology , Rats, Wistar , Islets of Langerhans/metabolism , Insulin-Secreting Cells/metabolism , Insulin Secretion , Insulin/genetics , Insulin/metabolism
2.
BMC Infect Dis ; 23(1): 679, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37821841

ABSTRACT

BACKGROUND: The emergence of new COVID-19 variants over the past three years posed a serious challenge to public health. Cities in China implemented mass daily RT-PCR tests by pooling strategies. However, a random delay exists between an infection and its first positive RT-PCR test. It is valuable for disease control to know the delay pattern and daily infection incidences reconstructed from RT-PCR test observations. METHODS: We formulated the convolution model between daily incidences and positive RT-PCR test counts as a linear inverse problem with positivity restrictions. Consequently, the Richardson-Lucy deconvolution algorithm was used to reconstruct COVID-19 incidences from daily PCR tests. A real-time deconvolution was further developed based on the same mathematical principle. The method was applied to an Omicron epidemic data set of a bar outbreak in Beijing and another in Wuxi in June 2022. We estimated the delay function by maximizing likelihood via an E-M algorithm. RESULTS: The delay function of the bar outbreak in 2022 differs from that reported in 2020. Its mode was shortened by one day to 4 days. A 95% confidence interval of the mean delay is [4.43,5.55] as evaluated by bootstrap. In addition, the deconvolved infection incidences successfully detected two associated infection events after the bar was closed. The application of the real-time deconvolution to the Wuxi data identified all explosive incidence increases. The results revealed the progression of the two COVID-19 outbreaks and provided new insights for prevention and control strategies, especially for the role of mass daily RT-PCR testing. CONCLUSIONS: The proposed deconvolution method is generally applicable to other infectious diseases if the delay model can be assumed to be approximately valid. To ensure a fair reconstruction of daily infection incidences, the delay function should be estimated in a similar context in terms of virus variant and test protocol. Both the delay estimate from the E-M algorithm and the incidences resulting from deconvolution are valuable for epidemic prevention and control. The real-time feedback is particularly useful during the epidemic's acute phase because it can help the local disease control authorities modify the control measures more promptly and precisely.


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2/genetics , Incidence , Reverse Transcriptase Polymerase Chain Reaction , COVID-19 Testing
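
The deconvolution step described in the abstract above can be illustrated with a minimal sketch. It assumes the convolution model d[t] = sum over s of u[s] * p[t - s], where u is the daily incidence, p the delay distribution, and d the daily positive counts. The function names, the delay distribution, and the toy incidence curve are hypothetical, and this is the textbook Richardson-Lucy iteration, not the authors' implementation.

```python
def convolve(incidence, delay_pmf):
    """Forward model: expected positives per day inside the observation window."""
    T = len(incidence)
    expected = [0.0] * T
    for s, u_s in enumerate(incidence):
        for k, p_k in enumerate(delay_pmf):
            if s + k < T:
                expected[s + k] += u_s * p_k
    return expected


def richardson_lucy(observed, delay_pmf, n_iter=200):
    """Multiplicative Richardson-Lucy updates; estimates stay non-negative."""
    T = len(observed)
    # q[s]: probability that an infection on day s tests positive within the window.
    q = [sum(p for k, p in enumerate(delay_pmf) if s + k < T) for s in range(T)]
    start = sum(observed) / T
    u = [start if q[s] > 0 else 0.0 for s in range(T)]
    for _ in range(n_iter):
        expected = convolve(u, delay_pmf)
        ratio = [d / e if e > 0 else 0.0 for d, e in zip(observed, expected)]
        updated = []
        for s in range(T):
            if q[s] == 0:
                updated.append(0.0)
                continue
            # Back-project the observed/expected ratio through the delay distribution.
            back = sum(p * ratio[s + k] for k, p in enumerate(delay_pmf) if s + k < T)
            updated.append(u[s] * back / q[s])
        u = updated
    return u


# Hypothetical toy data (not taken from the paper).
delay_pmf = [0.1, 0.2, 0.4, 0.2, 0.1]
true_incidence = [0, 0, 5, 20, 40, 10, 0, 0, 0, 0, 0, 0]
observed = convolve(true_incidence, delay_pmf)
estimate = richardson_lucy(observed, delay_pmf)
```

The multiplicative update is what enforces the positivity restriction mentioned in the abstract: an estimate that starts non-negative can never become negative. Each iteration also keeps the total expected count equal to the total observed count.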
3.
BMC Bioinformatics ; 24(1): 249, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37312038

ABSTRACT

BACKGROUND: Closing gaps in draft genomes leads to more complete and continuous genome assemblies. The ubiquitous genomic repeats pose challenges to the existing gap-closing methods, based on either the k-mer representation by the de Bruijn graph or the overlap-layout-consensus paradigm. Besides, chimeric reads will cause erroneous k-mers in the former and false overlaps of reads in the latter. RESULTS: We propose a novel local assembly approach to gap closing, called RegCloser. It represents read coordinates and their overlaps respectively by parameters and observations in a linear regression model. The optimal overlap is searched only in the restricted range consistent with insert sizes. Under this linear regression framework, the local DNA assembly becomes a robust parameter estimation problem. We solved the problem by a customized robust regression procedure that resists the influence of false overlaps by optimizing a convex global Huber loss function. The global optimum is obtained by iteratively solving the sparse system of linear equations. On both simulated and real datasets, RegCloser outperformed other popular methods in accurately resolving the copy number of tandem repeats, and achieved superior completeness and contiguity. Applying RegCloser to a plateau zokor draft genome that had been improved by long reads further increased the contig N50 3-fold. We also tested the robust regression approach on layout generation of long reads. CONCLUSIONS: RegCloser is a competitive gap-closing tool. The software is available at https://github.com/csh3/RegCloser . The robust regression approach has the potential to be incorporated into the layout module of long read assemblers.


Subject(s)
Genomics , Software , Consensus , Linear Models , Tandem Repeat Sequences
4.
Mol Cell Proteomics ; 21(8): 100261, 2022 08.
Article in English | MEDLINE | ID: mdl-35738554

ABSTRACT

Brain development and function are governed by precisely regulated protein expression in different regions. To date, multiregional brain proteomes have been systematically analyzed only for adult human and mouse brains. To understand the underpinnings of brain development and function, we generated proteomes from six regions of the postnatal brain at three developmental stages of domestic dogs (Canis familiaris), which are special among animals in terms of their remarkable human-like social cognitive abilities. Quantitative analysis of the spatiotemporal proteomes identified region-enriched synapse types at different developmental stages and differential myelination progression in different brain regions. Through integrative analysis of inter-regional expression patterns of orthologous proteins and genome-wide cis-regulatory element frequencies, we found that proteins related to myelination and hippocampus were highly correlated between dog and human but not between mouse and human, although mouse is phylogenetically closer to human. Moreover, the global expression patterns of neurodegenerative disease and autism spectrum disorder-associated proteins in dog brain resemble those in human brain more closely than those in mouse brain do. The high similarity of myelination and hippocampus-related pathways in dog and human at both proteomic and genetic levels may contribute to their shared social cognitive abilities. The inter-regional expression patterns of disease-associated proteins in the brain of different species provide important information to guide mechanistic and translational study using appropriate animal models.


Subject(s)
Autism Spectrum Disorder , Neurodegenerative Diseases , Adult , Animals , Brain , Dogs , Humans , Mice , Proteome , Proteomics
5.
Bioinformatics ; 38(10): 2675-2682, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561180

ABSTRACT

MOTIVATION: Crucial to the correctness of a genome assembly is the accuracy of the underlying scaffolds that specify the orders and orientations of contigs together with the gap distances between contigs. The current methods construct scaffolds based on the alignments of 'linking' reads against contigs. We found that some 'optimal' alignments are mistaken due to factors such as the contig boundary effect, particularly in the presence of repeats. Occasionally, the incorrect alignments can even overwhelm the correct ones. Detecting the incorrect linking information is challenging for existing methods. RESULTS: In this study, we present a novel scaffolding method, RegScaf. It first examines the distribution of distances between contigs from read alignment by the kernel density. When multiple modes are shown in a density, orientation-supported links are grouped into clusters, each of which defines a linking distance corresponding to a mode. The linear model parameterizes contigs by their positions on the genome; then each linking distance between a pair of contigs is taken as an observation on the difference of their positions. The parameters are estimated by minimizing a global loss function, which is a version of the trimmed sum of squares. The least trimmed squares estimate has such a high breakdown value that it can automatically remove the mistaken linking distances. The results on both synthetic and real datasets demonstrate that RegScaf outperforms some popular scaffolders, especially in the accuracy of gap estimates by substantially reducing extremely abnormal errors. Its strength in resolving repeat regions is exemplified by a real case. Its adaptability to large genomes and TGS long reads is validated as well. AVAILABILITY AND IMPLEMENTATION: RegScaf is publicly available at https://github.com/lemontealala/RegScaf.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Contig Mapping/methods , Genome , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
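
To illustrate why a trimmed loss can discard mistaken links, the sketch below applies least trimmed squares to the simplest possible case: a single gap estimated from several linking distances. RegScaf as described fits a global linear model over all contigs, so this one-parameter version, the function name, and the numbers are hypothetical simplifications. It relies on the known property that, for a single location parameter, the best h-subset is a contiguous window of the sorted values.

```python
def lts_location(values, h=None):
    """Least trimmed squares estimate of one location parameter.

    Keeps the h observations with the smallest sum of squared residuals
    and returns their mean. Default h is just over half of the data.
    """
    x = sorted(values)
    n = len(x)
    if h is None:
        h = n // 2 + 1
    best_ss, best_mean = None, None
    # For a single location parameter the optimal h-subset is contiguous
    # in sorted order, so a sliding window finds the exact optimum.
    for i in range(n - h + 1):
        window = x[i:i + h]
        mean = sum(window) / h
        ss = sum((v - mean) ** 2 for v in window)
        if best_ss is None or ss < best_ss:
            best_ss, best_mean = ss, mean
    return best_mean


# Hypothetical linking distances (bp) between two contigs; the last two
# mimic mistaken alignments caused by a repeat.
links = [500, 510, 495, 505, 498, 3000, 2950]
gap = lts_location(links)             # trimmed estimate
naive = sum(links) / len(links)       # ordinary mean, pulled by the outliers
```

With these toy numbers the trimmed estimate stays near 500 bp, while the ordinary mean is dragged above 1000 bp by the two mistaken links.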
6.
BMC Bioinformatics ; 22(1): 386, 2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34320923

ABSTRACT

BACKGROUND: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. RESULTS: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The evaluation of normalization goodness emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code under license GPL-3 is available on the GitHub platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. CONCLUSIONS: MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.


Subject(s)
Gene Expression Profiling , Genes, Essential , RNA-Seq , Sequence Analysis, RNA , Exome Sequencing
7.
Mol Biol Evol ; 37(6): 1679-1693, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32068872

ABSTRACT

To understand the genomic basis accounting for the phenotypic differences between human and apes, we compare the matrices consisting of the cis-element frequencies in the proximal regulatory regions of their genomes. One such frequency matrix is represented by a robust singular value decomposition. For each singular value, the negative and positive ends of the sorted motif eigenvector correspond to the dual ends of the sorted gene eigenvector, respectively, comprising a dual eigen-module defined by cis-regulatory element frequencies (CREF). The CREF eigen-modules at levels 1, 2, 3, and 6 are highly conserved across humans, chimpanzees, and orangutans. The key biological processes embedded in the top three CREF eigen-modules are reproduction versus embryogenesis, fetal maturation versus immune system, and stress responses versus mitosis. Although the divergence at the nucleotide level between the chimpanzee and human genome was small, their cis-element frequency matrices crossed a singularity point, at which the fourth and fifth singular values were identical. The CREF eigen-modules corresponding to the fourth and fifth singular values were reorganized along the evolution from apes to human. Interestingly, the fourth sorted gene eigenvector encodes the phenotypes unique to human such as long-term memory, language development, and social behavior. The number of motifs present on Alu elements increases substantially at the fourth level. The motif analysis together with the cases of human-specific Alu insertions suggests that mutations related to Alu elements play a critical role in the evolution of the human-phenotypic gene eigenvector.


Subject(s)
Alu Elements , Biological Evolution , Genome, Human , Hominidae/genetics , Regulatory Elements, Transcriptional , Animals , Cell Cycle Proteins/genetics , Cognition , Embryonic Development/genetics , Humans , Language Development , Memory, Long-Term , Phenotype , Social Behavior
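
The pairing of a motif eigenvector with a gene eigenvector for each singular value can be sketched with a plain power iteration. This is an ordinary (non-robust) decomposition of a tiny made-up matrix, not the robust singular value decomposition used in the study. The function name and the convention that rows are genes and columns are motifs are assumptions for illustration.

```python
import math


def leading_singular_triplet(matrix, n_iter=100):
    """Power iteration for the largest singular value and its two vectors.

    Assumes a non-zero matrix whose leading singular vector is not
    orthogonal to the all-ones starting vector.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    sigma, u = 0.0, [0.0] * n_rows
    for _ in range(n_iter):
        # Left vector (one entry per row, here: per gene).
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # Right vector (one entry per column, here: per motif).
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in w))
        v = [x / sigma for x in w]
    return sigma, u, v


# Hypothetical 3-gene by 2-motif frequency matrix (rank one by construction).
frequencies = [[3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
sigma, gene_vec, motif_vec = leading_singular_triplet(frequencies)

# Sorting each vector exposes its two ends, which the abstract pairs up.
genes_sorted = sorted(range(len(gene_vec)), key=lambda i: gene_vec[i])
motifs_sorted = sorted(range(len(motif_vec)), key=lambda j: motif_vec[j])
```

For a real frequency matrix, further levels would be obtained by deflating the matrix and repeating, or by calling a library SVD routine.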
8.
BMC Bioinformatics ; 20(Suppl 7): 201, 2019 May 01.
Article in English | MEDLINE | ID: mdl-31074378

ABSTRACT

BACKGROUND: A key problem in systems biology is the determination of the regulatory mechanism corresponding to a phenotype. An empirical approach in this regard is to compare the expression profiles of cells under two conditions or tissues from two phenotypes and to unravel the underlying transcriptional regulation. We have proposed the method BASE to statistically infer the effective regulatory factors that are responsible for the gene expression differentiation with the help from the binding data between factors and genes. Usually the protein-DNA binding data are obtained by ChIP-seq experiments, which could be costly and are condition-specific. RESULTS: Here we report a definition of binding strength based on a probability model. Using this condition-free definition, the BASE method needs only the frequencies of cis-motifs in regulatory regions, thereby the inferences can be carried out in silico. The directional regulation can be inferred by considering down- and up-regulation separately. We showed the effectiveness of the approach by one case study. In the study of the effects of polyunsaturated fatty acids (PUFA), namely, docosahexaenoic (DHA) and eicosapentaenoic (EPA) diets on mouse small intestine cells, the inferences of regulations are consistent with those reported in the literature, including PPARα and NFκB, respectively corresponding to enhanced adipogenesis and reduced inflammation. Moreover, we discovered enhanced RORA regulation of circadian rhythm, and reduced ETS1 regulation of angiogenesis. CONCLUSIONS: With the probabilistic definition of cis-trans binding affinity, the BASE method could obtain the significances of TF regulation changes corresponding to a gene expression differentiation profile between treatment and control samples. The landscape of the inferred cis-trans regulations is helpful for revealing the underlying molecular mechanisms. Particularly we reported a more comprehensive regulation induced by EPA&DHA diet.


Subject(s)
Angiogenesis Inducing Agents/administration & dosage , Docosahexaenoic Acids/administration & dosage , Eicosapentaenoic Acid/administration & dosage , Gene Expression Regulation , Hyperlipidemias/genetics , Nucleotide Motifs , Transcription, Genetic , Adipogenesis/drug effects , Animals , Hyperlipidemias/drug therapy , Intestine, Small/metabolism , Mice , Promoter Regions, Genetic
9.
Bioinformatics ; 34(12): 2019-2028, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29346504

ABSTRACT

Motivation: It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers adopt the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Results: Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a closely related species or (ii) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. Availability and implementation: http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Software , Genomics/methods , Uncertainty
10.
BMC Bioinformatics ; 18(1): 335, 2017 Jul 11.
Article in English | MEDLINE | ID: mdl-28697757

ABSTRACT

BACKGROUND: Phred quality scores are essential for downstream DNA analysis such as SNP detection and DNA assembly. Thus a valid model to define them is indispensable for any base-calling software. Recently, we developed the base-caller 3Dec for Illumina sequencing platforms, which reduces base-calling errors by 44-69% compared to the existing ones. However, the model to predict its quality scores has not been fully investigated yet. RESULTS: In this study, we used logistic regression models to evaluate quality scores from predictive features, which include different aspects of the sequencing signals as well as local DNA contents. Sparse models were further obtained by three methods: the backward deletion with either AIC or BIC and the L1 regularization learning method. The L1-regularized one was then compared with the Illumina scoring method. CONCLUSIONS: The L1-regularized logistic regression improves the empirical discrimination power by as much as 14% and 25% respectively for two kinds of preprocessed sequencing signals, compared to the Illumina scoring method. Namely, the L1 method identifies more base calls of high fidelity. Computationally, the L1 method can handle large datasets and is efficient enough for daily sequencing. Meanwhile, the logistic model resulting from BIC is more interpretable. The modeling suggested that the most prominent quenching pattern in the current chemistry of Illumina occurred at the dinucleotide "GT". Besides, nucleotides were more likely to be miscalled as the previous bases if the preceding ones were not "G". It suggested that the phasing effect of bases after "G" was somewhat different from those after other nucleotide types.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Logistic Models
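
The link between a logistic model and a Phred score is the standard definition Q = -10 * log10(P), where P is the probability that a base call is wrong. The sketch below shows that conversion together with a generic logistic predictor. The feature values, weights, and intercept are invented for illustration and are not the fitted model from the paper.

```python
import math


def phred_from_error_prob(p_error):
    """Phred quality score: Q = -10 * log10(P_error)."""
    return -10.0 * math.log10(p_error)


def error_prob_from_phred(q):
    """Inverse mapping: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def logistic_error_prob(features, weights, intercept):
    """Error probability predicted by a logistic regression model."""
    z = intercept + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical features of one base call (for example a signal-purity value
# and a flag for a preceding "G") with made-up coefficients.
features = [0.9, 1.0]
weights = [-6.0, 0.5]
intercept = -1.0
p_err = logistic_error_prob(features, weights, intercept)
quality = phred_from_error_prob(p_err)
```

An L1 penalty, as used in the study, would drive many entries of `weights` to exactly zero during fitting; the scoring step shown here is unchanged by how the weights were obtained.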
11.
Sci Rep ; 7(1): 5044, 2017 07 11.
Article in English | MEDLINE | ID: mdl-28698587

ABSTRACT

Type 2 diabetes (T2D) is a complex and polygenic disease yet in need of a complete picture of its development mechanisms. To better understand the mechanisms, we examined gene expression profiles of multi-tissues from outbred mice fed with a high-fat diet (HFD) or regular chow at weeks 1, 9, and 18. To analyze such complex data, we proposed a novel dual eigen-analysis, in which the sample- and gene-eigenvectors correspond respectively to the macro- and micro-biology information. The dual eigen-analysis identified the HFD eigenvectors as well as the endogenous eigenvectors for each tissue. The results imply that HFD influences the hepatic function or the pancreatic development as an exogenous factor, while in adipose HFD's impact roughly coincides with the endogenous eigenvector driven by aging. The enrichment analysis of the eigenvectors revealed diverse HFD impact on the three tissues over time. The diversity includes: inflammation, degradation of branched chain amino acids (BCAA), and regulation of peroxisome proliferator activated receptor gamma (PPARγ). We reported that in the pancreas remarkable up-regulation of angiogenesis as downstream of the HIF signaling pathway precedes hyperinsulinemia. The dual eigen-analysis and discoveries provide new evaluations/guidance in T2D prevention and therapy, and will also promote new thinking in biology and medicine.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Organ Specificity/genetics , Adiponectin/metabolism , Adipose Tissue/metabolism , Amino Acids, Branched-Chain/metabolism , Animals , Cholesterol/biosynthesis , Diet, High-Fat , Down-Regulation/genetics , Insulin/metabolism , Liver/metabolism , Mice , PPAR gamma/metabolism , Pancreas/metabolism , Signal Transduction , Up-Regulation/genetics
12.
Sci Rep ; 7: 41348, 2017 02 20.
Article in English | MEDLINE | ID: mdl-28216647

ABSTRACT

Base-calling accuracy is crucial for high-throughput DNA sequencing and downstream analysis such as read mapping and genome assembly. Accordingly, we made an endeavor to reduce DNA sequencing errors of Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We discovered that signal crosstalk between adjacent clusters accounts for a large portion of sequencing errors in Illumina systems, even after correcting color crosstalk caused by the overlap of dye emission spectra and phasing/pre-phasing caused by out-of-step nucleotide synthesis. Interestingly and importantly, spatial crosstalk between adjacent clusters is cluster-specific and often asymmetric, which cannot be corrected by existing deconvolution methods. Therefore, we introduce a novel mathematical method able to estimate and remove spatial crosstalk, thereby reducing base-calling errors by 44-69% at a given mapping rate from Illumina systems. Furthermore, the resolution gained from this work provides new room for higher throughput of DNA sequencing and of general measurement systems using fluorescence-based imaging technology. The resulting base-caller 3Dec is available for academic users at http://github.com/flishwnag/3dec. Not only does it reduce 62.1% errors compared to the standard pipeline, but also its implementation is fast enough for daily sequencing.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Databases, Protein
13.
Methods ; 67(3): 394-406, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24440483

ABSTRACT

The nanoparticle gadolinium endohedral metallofullerenol [Gd@C82(OH)22]n is a new candidate for cancer treatment with low toxicity. However, its anti-cancer mechanisms remain mostly unknown. In this study, we took a systems biology view of the gene expression profiles of human breast cancer cells (MCF-7) and human umbilical vein endothelial cells (ECV304) treated with and without [Gd@C82(OH)22]n, respectively, measured by the Agilent Gene Chip G4112F. To properly analyze these data, we modified a suite of statistical methods we developed. For the first time we applied the sub-sub normalization to Agilent two-color microarrays. Instead of a simple linear regression, we proposed to use a one-knot SPLINE model in the sub-sub normalization to account for nonlinear spatial effects. The parameters estimated by least trimmed squares- and S-estimators show similar normalization results. We made several kinds of inferences by integrating the expression profiles with the bioinformatic knowledge in KEGG pathways, Gene Ontology, JASPAR, and TRANSFAC. In the transcriptional inference, we proposed the BASE2.0 method to infer a transcription factor's up-regulation and down-regulation activities separately. Overall, [Gd@C82(OH)22]n induces more differentiation in MCF-7 cells than in ECV304 cells, particularly in the reduction of protein processing such as protein glucosylation, folding, targeting, exporting, and transporting. Among the KEGG pathways, the ErbB signaling pathway is up-regulated, whereas protein processing in endoplasmic reticulum (ER) is down-regulated. CHOP, a key pro-apoptotic gene downstream of the ER stress pathway, increases nine-fold in MCF-7 cells after treatment. These findings indicate that ER stress may be one important factor that induces apoptosis in MCF-7 cells after [Gd@C82(OH)22]n treatment. The expression profiles of genes associated with ER stress and apoptosis are statistically consistent with other profiles reported in the literature, such as those of HEK293T and MCF-7 cells induced by the miR-23a∼27a∼24-2 cluster. Furthermore, one of the inferred regulatory mechanisms comprises the apoptosis network centered around TP53, whose effective regulation of apoptosis is somehow reestablished after [Gd@C82(OH)22]n treatment. These results elucidate the application and development of [Gd@C82(OH)22]n and other fullerene derivatives.


Subject(s)
Apoptosis/drug effects , Endoplasmic Reticulum/drug effects , Systems Biology/methods , Cell Proliferation/drug effects , Fullerenes/chemistry , Fullerenes/therapeutic use , Gadolinium/chemistry , Gadolinium/therapeutic use , Gene Regulatory Networks , Humans , MCF-7 Cells , Nanoparticles/chemistry , Nanoparticles/therapeutic use , Oligonucleotide Array Sequence Analysis , Stress, Physiological , Transcriptome , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Tumor Suppressor Protein p53/physiology
14.
J Comput Biol ; 20(11): 847-60, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24195707

ABSTRACT

Mapping reads to a reference genome is a routine yet computationally intensive task in research based on high-throughput sequencing. In recent years, the sequencing reads of the Illumina platform have become longer and their quality scores higher. According to our calculation, this allows a perfect k-mer seed match for almost all reads when a close reference genome is available, subject to reasonable specificity. Our other observation is that the majority of reads contain at most one short INDEL polymorphism. Based on these observations, we propose a fast-mapping approach, referred to as "SEME," which has two core steps: First it scans a read sequentially in a specific order for a k-mer exact match seed; next it extends the alignment on both sides allowing, at most, one short INDEL each using a novel method called "auto-match function." We decompose the evaluation of the sensitivity and specificity into two parts corresponding to the seed and extension step, and the composite result provides an approximate overall reliability estimate of each mapping. We compare SEME with some existing mapping methods on several datasets, and SEME shows better performance in terms of both running time and mapping rates.


Subject(s)
Chromosome Mapping , Algorithms , Animals , Base Sequence , Computer Simulation , Data Interpretation, Statistical , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Models, Genetic , Sensitivity and Specificity , Sequence Analysis, DNA
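
The two core steps named in the abstract, an exact k-mer seed followed by extension, can be sketched generically as follows. This is a simplified illustration with hypothetical function names and a toy reference. The extension step here only counts mismatches and does not implement SEME's "auto-match function" or its handling of one short INDEL per side.

```python
def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Return (reference_start, mismatches) for the first acceptable hit, else None."""
    # Step 1: scan the read left to right for a k-mer that matches exactly.
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # Step 2: extend to the full read length and count mismatches.
            target = reference[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, target) if a != b)
            if mismatches <= max_mismatches:
                return start, mismatches
    return None


# Hypothetical toy reference and a read copied from position 10 with one
# substituted base, so the first few k-mers of the read cannot seed.
reference = "ACGTTGCAAGCTTAGGCTAACCGTAGGCTTAACG"
read = reference[10:12] + "G" + reference[13:26]
index = build_kmer_index(reference, 6)
hit = map_read(read, reference, index, 6)
```

The abstract says SEME scans "in a specific order"; the left-to-right order used here is only one possible choice.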
15.
Nucleic Acids Res ; 39(Web Server issue): W557-61, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21576217

ABSTRACT

The massively parallel sequencing technologies have recently flourished and dramatically cut the cost to sequence personal human genomes. Haplotype assembly from personal genomes sequenced using the massively parallel sequencing technologies is becoming a cost-effective and promising tool for human disease study. Computational assembly of haplotypes has been proved to be very accurate, but obviously contains errors. Here we present a tool, HapEdit, to assess the accuracy of assembled haplotypes and edit them manually. Using this tool, a user can break erroneous haplotype segments into smaller segments, or concatenate haplotype segments if the concatenated haplotype segments are sufficiently supported. A user can also edit bases with low-quality scores. HapEdit displays haplotype assemblies so that a user can easily navigate and pinpoint a region of interest. As inputs, HapEdit currently takes reads from the Polonator, Illumina, SOLiD, 454 and Sanger sequencing technologies.


Subject(s)
Haplotypes , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Internet
16.
Nucleic Acids Res ; 38(1): 143-58, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19880387

ABSTRACT

In an attempt to elucidate the underlying longevity-promoting mechanisms of mutants lacking SCH9, which live three times as long as wild type chronologically, we measured their time-course gene expression profiles. We interpreted their expression time differences by statistical inferences based on prior biological knowledge, and identified the following significant changes: (i) between 12 and 24 h, stress response genes were up-regulated by larger fold changes and ribosomal RNA (rRNA) processing genes were down-regulated more dramatically; (ii) mitochondrial ribosomal protein genes were not up-regulated between 12 and 60 h as they were in wild type; (iii) electron transport, oxidative phosphorylation and TCA genes were down-regulated early; (iv) the up-regulation of TCA and electron transport was accompanied by deep down-regulation of rRNA processing over time; and (v) rRNA processing genes were more volatile over time, and three associated cis-regulatory elements [rRNA processing element (rRPE), polymerase A and C (PAC) and glucose response element (GRE)] were identified. Deletion of AZF1, which encodes the transcriptional factor that binds to the GRE element, reversed the lifespan extension of sch9Δ. The significant alterations in these time-dependent expression profiles imply that the lack of SCH9 turns on the longevity programme that extends the lifespan through changes in metabolic pathways and protection mechanisms, particularly, the regulation of aerobic respiration and rRNA processing.


Subject(s)
Gene Expression Regulation, Fungal , Protein Serine-Threonine Kinases/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Citric Acid Cycle/genetics , Electron Transport/genetics , Gene Expression Profiling , Kinetics , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Mutation , Oligonucleotide Array Sequence Analysis , Oxidative Phosphorylation , Promoter Regions, Genetic , RNA Processing, Post-Transcriptional , RNA, Ribosomal/metabolism , Response Elements , Ribosomal Proteins/genetics , Ribosomal Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism , Stress, Physiological/genetics , Transcription Factors/metabolism
17.
Bioinformatics ; 25(18): 2430-1, 2009 Sep 15.
Article in English | MEDLINE | ID: mdl-19561337

ABSTRACT

SUMMARY: Haplotype assembly is becoming a very important tool in genome sequencing of humans and other organisms. Although haplotypes were previously inferred from genome assemblies, there has never been a comparative haplotype browser that depicts a global picture of whole-genome alignments among haplotypes of different organisms. We introduce a whole-genome HAPLotype brOWSER (HAPLOWSER), providing evolutionary perspectives from multiple aligned haplotypes and functional annotations. Haplowser enables the comparison of haplotypes from metagenomes, and associates conserved regions or the bases at the conserved regions with functional annotations and custom tracks. The associations are quantified for further analysis and presented as pie charts. Functional annotations and custom tracks that are projected onto haplotypes are saved as multiple files in FASTA format. Haplowser provides a user-friendly interface, and can display alignments of haplotypes with functional annotations at any resolution. AVAILABILITY: Haplowser, written in Java, supports multiple platforms including Windows and Linux. Haplowser is publicly available at http://embio.yonsei.ac.kr/haplowser.


Subject(s)
Computational Biology/methods , Genome , Haplotypes , Metagenome , Software , Databases, Genetic , Genomics , Internet
18.
PLoS Genet ; 5(5): e1000467, 2009 May.
Article in English | MEDLINE | ID: mdl-19424415

ABSTRACT

The effect of calorie restriction (CR) on life span extension, demonstrated in organisms ranging from yeast to mice, may involve the down-regulation of pathways, including Tor, Akt, and Ras. Here, we present data suggesting that yeast Tor1 and Sch9 (a homolog of the mammalian kinases Akt and S6K) are central components of a network that controls a common set of genes implicated in a metabolic switch from the TCA cycle and respiration to glycolysis and glycerol biosynthesis. During chronological survival, mutants lacking SCH9 depleted extracellular ethanol and reduced stored lipids, but synthesized and released glycerol. Deletion of the glycerol biosynthesis genes GPD1, GPD2, or RHR2, among the most up-regulated in long-lived sch9Δ, tor1Δ, and ras2Δ mutants, was sufficient to reverse chronological life span extension in sch9Δ mutants, suggesting that glycerol production, in addition to the regulation of stress resistance systems, optimizes life span extension. Glycerol, unlike glucose or ethanol, did not adversely affect the life span extension induced by calorie restriction or starvation, suggesting that carbon source substitution may represent an alternative to calorie restriction as a strategy to delay aging.


Subject(s)
Phosphatidylinositol 3-Kinases/genetics , Phosphatidylinositol 3-Kinases/metabolism , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Animals , Caloric Restriction , Carbon/metabolism , Cell Respiration , Citric Acid Cycle , Culture Media , Ethanol/metabolism , Gene Expression Profiling , Genes, Fungal , Glycerol/metabolism , Glycolysis , Longevity , Models, Biological , Mutation , Signal Transduction , ras Proteins/genetics , ras Proteins/metabolism
19.
BMC Genomics ; 10: 225, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19442316

ABSTRACT

BACKGROUND: Aberrant activation or expression of transcription factors has been implicated in the tumorigenesis of various types of cancer. Although microarray experiments are widely used to profile gene expression in cancer samples, they provide limited information regarding the activities of transcription factors. However, the association between transcription factors and cancers depends largely on transcription regulatory activities rather than mRNA expression levels. RESULTS: In this paper, we propose a computational approach that integrates microarray expression data with transcription factor binding site information to systematically identify transcription factors associated with patient survival given a specific cancer type. This approach was applied to two gene expression data sets for breast cancer and acute myeloid leukemia. We found that two transcription factor families, the steroid nuclear receptor family and the ATF/CREB family, are significantly correlated with the survival of patients with breast cancer; and that a transcription factor named T-cell acute lymphocytic leukemia 1 is significantly correlated with acute myeloid leukemia patient survival. CONCLUSION: Our analysis identifies transcription factors associated with patient survival and provides insight into the regulatory mechanisms underlying breast cancer and leukemia. The transcription factors identified by our method are biologically meaningful and consistent with prior knowledge. As an insightful tool, this approach can also be applied to other microarray cancer data sets to help researchers better understand the intricate relationship between transcription factors and diseases.


Subject(s)
Breast Neoplasms/genetics , Gene Expression Profiling/methods , Leukemia, Myeloid, Acute/genetics , Oligonucleotide Array Sequence Analysis/methods , Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics , Humans , Logistic Models , Proportional Hazards Models , Proto-Oncogene Proteins/genetics , Receptors, Steroid/genetics , Survival Rate , T-Cell Acute Lymphocytic Leukemia Protein 1
20.
BMC Bioinformatics ; 9: 194, 2008 Apr 14.
Article in English | MEDLINE | ID: mdl-18410691

ABSTRACT

BACKGROUND: Microarray pre-processing usually consists of normalization and summarization. Normalization aims to remove non-biological variations across different arrays. Normalization algorithms generally require the specification of reference and target arrays. The issue of reference selection has not been fully addressed. Summarization aims to estimate the transcript abundance from normalized intensities. In this paper, we consider normalization and summarization jointly by a new strategy of reference selection. RESULTS: We propose a Probe-Treatment-Reference (PTR) model to streamline normalization and summarization by allowing multiple references. We estimate parameters in the model by the Least Absolute Deviations (LAD) approach and implement the computation by median polishing. We show that the LAD estimator is robust in the sense that it has bounded influence in the three-factor PTR model. This model fitting implicitly defines an "optimal reference" for each probe-set. We evaluate the effectiveness of the PTR method on two Affymetrix spike-in data sets. Our method reduces the variations of non-differentially expressed genes and thereby increases the detection power of differentially expressed genes. CONCLUSION: Our results indicate that the reference effect is important and should be considered in microarray pre-processing. The proposed PTR method is a general framework to deal with the issue of reference selection and can readily be applied to existing normalization algorithms such as the invariant-set, sub-array and quantile methods.
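
The median polishing mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' PTR implementation: it assumes a simplified two-factor additive layout (for example probes by arrays) instead of the three-factor probe-treatment-reference model, and the function name `median_polish` and its parameters are hypothetical. Median polish is a standard iterative heuristic for least-absolute-deviations style fits of additive tables; it alternately sweeps row and column medians out of the residuals.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-8):
    """Fit table[i, j] ~ overall + row[i] + col[j] by Tukey's median polish.

    Illustrative two-factor sketch only; the PTR model in the abstract
    has three factors and is not reproduced here.
    """
    resid = np.asarray(table, dtype=float).copy()
    n_rows, n_cols = resid.shape
    overall = 0.0
    row = np.zeros(n_rows)
    col = np.zeros(n_cols)
    for _ in range(max_iter):
        # Sweep the median of each row out of the residuals.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        # Re-centre the column effects, moving their median into the overall term.
        shift = np.median(col)
        col -= shift
        overall += shift
        # Sweep the median of each column out of the residuals.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        # Re-centre the row effects in the same way.
        shift = np.median(row)
        row -= shift
        overall += shift
        # Stop once a full sweep no longer changes the residuals.
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break
    return overall, row, col, resid
```

Because medians rather than means are swept out, a single aberrant intensity tends to stay in the residuals instead of distorting the fitted effects, which is the kind of robustness the abstract attributes to the LAD estimator.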


Subject(s)
DNA Probes/genetics , Gene Expression Profiling/methods , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Computer Simulation , DNA Probes/standards , Gene Expression Profiling/standards , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/standards , Reference Values