Results 1 - 20 of 30
1.
iScience ; 25(11): 105273, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36304115

ABSTRACT

De novo genome assembly is a fundamental problem in computational molecular biology that aims to reconstruct an unknown genome sequence from a set of short DNA sequences (or reads) obtained from the genome. The relative ordering of the reads along the target genome is not known a priori, which is one of the main contributors to the increased complexity of the assembly process. In this article, with the dual objective of improving assembly quality and exposing a high degree of parallelism, we present a partitioning-based approach. Our framework, BOA (bucket-order-assemble), uses bucketing alongside graph- and hypergraph-based partitioning techniques to produce a partial ordering of the reads. This partial ordering enables us to divide the read set into disjoint blocks that can be independently assembled in parallel using any state-of-the-art serial assembler of choice. Experimental results show that BOA improves both the overall assembly quality and performance.
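To make the bucket-then-partition idea concrete, the following is a minimal sketch, not BOA's actual bucketing or partitioner. It assumes reads are bucketed by shared k-mers and uses union-find connected components as a stand-in for BOA's graph/hypergraph partitioning, which also balances block sizes. The function names and the choice of k are illustrative only.

```python
from collections import defaultdict


def bucket_reads(reads, k):
    """Map each k-mer to the set of read indices that contain it."""
    buckets = defaultdict(set)
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            buckets[read[i:i + k]].add(idx)
    return buckets


def partition_reads(reads, k):
    """Group reads into disjoint blocks: reads sharing any k-mer land together.

    Connected components are a simplification; a real partitioner would also
    balance block sizes and cut weakly connected regions.
    """
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for members in bucket_reads(reads, k).values():
        ordered = sorted(members)
        for other in ordered[1:]:
            union(ordered[0], other)

    blocks = defaultdict(list)
    for idx in range(len(reads)):
        blocks[find(idx)].append(idx)
    return sorted(blocks.values())


reads = ["ACGTAC", "GTACGG", "TTTTCC", "TTCCAA"]
print(partition_reads(reads, k=4))  # [[0, 1], [2, 3]]
```

Each resulting block could then be handed to an independent serial assembler, which is where the parallelism comes from.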

2.
Bioinformatics ; 36(3): 945-947, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31418766

ABSTRACT

SUMMARY: In exploring the epidemiology of infectious diseases, networks have been used to reconstruct contacts among individuals and/or populations. Summarizing networks using pathogen metadata (e.g. host species and place of isolation) and a phylogenetic tree is a nascent, alternative approach. In this paper, we introduce a tool for reconstructing transmission networks in arbitrary space from phylogenetic information and metadata. Our goals are to provide a means of deriving new insights and infection control strategies based on the dynamics of the pathogen lineages derived from networks and centrality metrics. We created a web-based application, called StrainHub, in which a user can input a phylogenetic tree based on genetic or other data along with characters derived from metadata using their preferred tree search method. StrainHub generates a transmission network based on character state changes in metadata, such as place or source of isolation, mapped on the phylogenetic tree. The user has the option to calculate centrality metrics on the nodes including betweenness, closeness, degree and a new metric, the source/hub ratio. The outputs include the network with values for metrics on its nodes and the tree with characters reconstructed. All of these results can be exported for further analysis. AVAILABILITY AND IMPLEMENTATION: strainhub.io and https://github.com/abschneider/StrainHub.


Subject(s)
Metadata , Humans , Phylogeny
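The core of the StrainHub idea, turning character state changes along a tree into a directed network, can be sketched as follows. This sketch assumes the metadata states have already been reconstructed at internal nodes, which StrainHub does itself, so here they are simply taken as input. The tree, the place labels, and the function names are hypothetical. Only weighted degree is computed, because the abstract does not define the source/hub ratio.

```python
from collections import Counter


def transmission_network(parent, state):
    """Count directed state changes (parent state -> child state) along tree edges.

    parent: dict mapping each non-root node to its parent node.
    state:  dict mapping every node to its (already reconstructed) character state.
    """
    edges = Counter()
    for child, par in parent.items():
        src, dst = state[par], state[child]
        if src != dst:
            edges[(src, dst)] += 1
    return edges


def degree(edges):
    """Weighted in-degree plus out-degree of each state in the network."""
    deg = Counter()
    for (src, dst), weight in edges.items():
        deg[src] += weight
        deg[dst] += weight
    return deg


# Hypothetical tree: root r with internal node n1; leaves a, b, c; places A, B, C.
parent = {"n1": "r", "a": "n1", "b": "n1", "c": "r"}
state = {"r": "A", "n1": "A", "a": "B", "b": "C", "c": "A"}
net = transmission_network(parent, state)
print(dict(net))  # {('A', 'B'): 1, ('A', 'C'): 1}
```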
3.
BMC Genomics ; 20(1): 1008, 2019 Dec 21.
Article in English | MEDLINE | ID: mdl-31864285

ABSTRACT

BACKGROUND: Rumen ciliates play important roles in rumen function by digesting and fermenting feed and shaping the rumen microbiome. However, they remain poorly understood due to the lack of definitive direct evidence without influence by prokaryotes (including symbionts) in co-cultures or the rumen. In this study, we used RNA-Seq to characterize the transcriptome of Entodinium caudatum, the predominant and most representative rumen ciliate species. RESULTS: Of a large number of transcripts, > 12,000 were annotated to the curated genes in the NR, UniProt, and GO databases. Numerous CAZymes (including lysozyme and chitinase) and peptidases were represented in the transcriptome. This study revealed the ability of E. caudatum to depolymerize starch, hemicellulose, pectin, and the polysaccharides of the bacterial and fungal cell wall, and to degrade proteins. Many signaling pathways, including the ones that have been shown to function in E. caudatum, were represented by many transcripts. The transcriptome also revealed the expression of the genes involved in symbiosis, detoxification of reactive oxygen species, and the electron-transport chain. Overall, the transcriptomic evidence is consistent with some of the previous premises about E. caudatum. However, the identification of specific genes, such as those encoding lysozyme, peptidases, and other enzymes unique to rumen ciliates, might be targeted to develop specific and effective inhibitors to improve nitrogen utilization efficiency by controlling the activity and growth of rumen ciliates. The transcriptomic data will also help the assembly and annotation in future genomic sequencing of E. caudatum. CONCLUSION: As the first transcriptome of a single species of rumen ciliates ever sequenced, it provides direct evidence for the substrate spectrum, fermentation pathways, ability to respond to various biotic and abiotic stimuli, and other physiological and ecological features of E. caudatum. 
The presence and expression of the genes involved in the lysis and degradation of microbial cells highlight the dependence of E. caudatum on engulfment of other rumen microbes for its survival and growth. These genes may be explored in future research to develop targeted control of Entodinium species in the rumen. The transcriptome can also facilitate future genomic studies of E. caudatum and other related rumen ciliates.


Subject(s)
Alveolata/genetics , Alveolata/metabolism , Gene Expression Profiling , Alveolata/cytology , Alveolata/physiology , Animals , Carbohydrate Metabolism/genetics , Intracellular Space/metabolism , Phagocytosis/genetics , RNA, Messenger/genetics , RNA-Seq , Signal Transduction/genetics , Symbiosis/genetics
4.
Methods Mol Biol ; 1375: 55-74, 2016.
Article in English | MEDLINE | ID: mdl-26626937

ABSTRACT

Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is query-based search, since gene co-expression may indicate a shared role in a biological process. Although there exist promising query-driven search methods that adapt clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or because the pathways might be regulated under only a subset of samples. On the other hand, biclustering algorithms, a class of clustering algorithms that simultaneously cluster both the items and their features, are useful for analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.


Subject(s)
Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Algorithms , Databases, Genetic , Gene Expression Regulation , Gene Expression Regulation, Neoplastic , Genes, BRCA1 , Genes, BRCA2 , Genes, p53 , Humans
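The idea that genes can be coherent on only a subset of samples can be made concrete with a standard bicluster score. The sketch below uses Cheng and Church's mean squared residue, a well-known coherence measure for biclusters. It is not the novel approach presented in this chapter, and the small expression matrix is invented for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols].

    A score near 0 means the selected genes (rows) follow a coherent additive
    pattern across the selected samples (cols), even if they disagree elsewhere.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_r * n_c)


# Genes 0 and 1 are shifted copies of each other on samples 0-2 but not on sample 3.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 1.0, 7.0, 2.0],
]
print(mean_squared_residue(expr, rows=[0, 1], cols=[0, 1, 2]))  # 0.0
print(mean_squared_residue(expr, rows=[0, 1], cols=[0, 1, 2, 3]) > 0)  # True
```

A whole-matrix clustering would see genes 0 and 1 as dissimilar because of sample 3. The bicluster restricted to samples 0-2 shows they are perfectly coherent there.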
5.
J Infect Dis ; 213(4): 502-8, 2016 Feb 15.
Article in English | MEDLINE | ID: mdl-25995194

ABSTRACT

BACKGROUND: Using a novel combination of whole-genome sequencing (WGS) analysis and geographic metadata, we traced the origins of Salmonella Bareilly isolates collected in 2012 during a widespread food-borne outbreak in the United States associated with scraped tuna imported from India. METHODS: Using next-generation sequencing, we sequenced the complete genome of 100 Salmonella Bareilly isolates obtained from patients who consumed contaminated product, from natural sources, and from unrelated historically and geographically disparate foods. Pathogen genomes were linked to geography by projecting the phylogeny onto a virtual globe, and a transmission network was produced. RESULTS: Phylogenetic analysis of WGS data revealed a common origin for outbreak strains, indicating that patients in Maryland and New York were infected from sources originating at a facility in India. CONCLUSIONS: These data represent the first report fully integrating WGS analysis with geographic mapping and a novel use of transmission networks. Results showed that WGS vastly improves our ability to delimit the scope and source of bacterial food-borne contamination events. Furthermore, these findings reinforce the extraordinary utility that WGS brings to global outbreak investigation as a greatly enhanced approach to protecting the human food supply chain as well as public health in general.


Subject(s)
Disease Outbreaks , Foodborne Diseases/epidemiology , Salmonella Infections/epidemiology , Salmonella enterica/classification , Salmonella enterica/isolation & purification , Animals , Foodborne Diseases/microbiology , Genome, Bacterial , Genotype , Humans , India , Molecular Epidemiology , Molecular Typing , Phylogeography , Salmonella Infections/microbiology , Salmonella enterica/genetics , Sequence Analysis, DNA , Tuna/microbiology , United States/epidemiology
6.
Int J Data Min Bioinform ; 13(1): 31-49, 2015.
Article in English | MEDLINE | ID: mdl-26529906

ABSTRACT

Two-colour microarrays are used to study differential gene expression on a large scale. Experimental planning can help reduce the chances of wrong inferences about whether genes are differentially expressed. Previous research on this problem has focused on minimising estimation errors (according to variance-based criteria such as A-optimality) on the basis of optimistic assumptions about the system studied. In this paper, we propose a novel planning criterion to evaluate existing plans for microarray experiments. The proposed criterion, 'Generalised-A Optimality', is based on realistic assumptions that include bias errors. Using Generalised-A Optimality, the reference-design approach is likely to yield greater estimation accuracy in specific situations in which loop designs had previously seemed superior. However, hybrid designs are likely to offer higher estimation accuracy than reference, loop and interwoven designs having the same number of samples and slides. These findings are supported by data from both simulated and real microarray experiments.


Subject(s)
Data Mining/methods , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation , Models, Theoretical , Oligonucleotide Array Sequence Analysis
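For reference, the classical variance-based A-optimality criterion that this work generalises can be written as below. This assumes a linear model $y = X\beta + \varepsilon$ with independent errors of common variance $\sigma^2$, where $X$ encodes the design (which samples are hybridised against which on each slide). The bias terms that distinguish Generalised-A Optimality are not specified in the abstract and are not shown.

```latex
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\operatorname{Var}(\hat{\beta}) = \sigma^{2}(X^{\top}X)^{-1},
\qquad
X^{*}_{A} = \arg\min_{X}\ \operatorname{tr}\!\left[(X^{\top}X)^{-1}\right]
```

In words, an A-optimal design minimises the sum (equivalently, the average) of the variances of the estimated effects. It does this under the assumption that the model is unbiased, which is the "optimistic assumption" the abstract refers to.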
7.
Stud Health Technol Inform ; 216: 766-70, 2015.
Article in English | MEDLINE | ID: mdl-26262155

ABSTRACT

Traditionally, epidemiologists have counted cases and groups of symptoms. Modeling of these data consists of predicting expansion or contraction in the number of cases over time in epidemic curves or compartment models. Geography is considered a variable when these data are presented in choropleth maps. These approaches have significant drawbacks if the cases counted are not accurately diagnosed. For example, most regional public health authorities count influenza-like illnesses (ILI). Cases of these diseases are designated as ILI if the patient exhibits fever, respiratory symptoms, and perhaps gastrointestinal symptoms. Several molecular epidemiological studies have shown that there are many pathogens that cause these symptoms and the relative proportions of these pathogens change over time and space. One way to bridge the gap between syndromic and genetic surveillance of infectious diseases is to compare signals of symptoms to pathogens recorded in molecular databases. We present a web-based workflow application that uses chief complaints found in the public Twitter feed as a syndromic surveillance tool and connects outbreak signals in these data to pathogens historically known to circulate in the same area. For the pathogen(s) of interest, we provide GenBank links to metadata and sequences in a workflow for phylogeographic analysis and visualization. The visualizations provide information on the geographic traffic of the spread of the pathogens and places that are hubs for their transport.


Subject(s)
Communicable Diseases/epidemiology , Communicable Diseases/genetics , Molecular Epidemiology/methods , Phylogeography/methods , Social Media/statistics & numerical data , Workflow , Humans , Natural Language Processing , Population Surveillance/methods , Prevalence , Symptom Assessment/methods , Symptom Assessment/statistics & numerical data
8.
Int J Cancer ; 137(10): 2323-31, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-25973956

ABSTRACT

Colorectal cancer (CRC) can be classified into different types. Chromosomally unstable (CIN) colon cancers are thought to be the most common type of colon cancer. The risk of developing a CIN-related CRC is due in part to inherited risk factors. Genome-wide association studies have yielded over 40 single nucleotide polymorphisms (SNPs) associated with CRC risk, but these only account for a subset of risk alleles. Some of this missing heritability may be due to gene-gene interactions. We developed a strategy to identify interacting candidate genes/loci for CRC risk that utilizes both linkage and RNA-seq data from mouse models in combination with allele-specific imbalance (ASI) studies in human tumors. We applied our strategy to three previously identified CRC susceptibility loci in the mouse that show evidence of genetic interaction: Scc4, Scc5 and Scc13. A total of 525 SNPs from genes showing differential expression in the mouse and/or a previous role in cancer from the literature were evaluated for allele-specific imbalance in 194 paired human normal/tumor DNAs from CIN-related CRCs. One hundred three SNPs showing suggestive evidence of ASI (31 variants with uncorrected p values < 0.05) were genotyped in a validation set of 296 paired DNAs. Two variants in SNX10 (SCC13) showed significant evidence of allelic selection after correction for multiple comparisons. Future studies will evaluate the role of these variants in combination with interacting genetic partners in colon cancer risk in mouse and humans.


Subject(s)
Allelic Imbalance , Colonic Neoplasms/genetics , Genetic Predisposition to Disease/genetics , Neoplasms, Experimental/genetics , Alleles , Animals , Chromosomal Instability/genetics , Comparative Genomic Hybridization , Female , Genotype , Humans , Linkage Disequilibrium , Mice , Polymorphism, Single Nucleotide , Sequence Analysis, RNA/methods
9.
AMIA Annu Symp Proc ; 2015: 861-9, 2015.
Article in English | MEDLINE | ID: mdl-26958222

ABSTRACT

Multiple choice questions play an important role in training and evaluating biomedical science students. However, the resource-intensive nature of question generation limits their open availability, largely restricting their use to evaluation. Although applied-knowledge questions require a complex formulation process, the creation of concrete-knowledge questions (i.e., definitions, associations) could be assisted by the use of informatics methods. We envisioned a novel and simple algorithm that exploits validated knowledge repositories and generates concrete-knowledge questions by leveraging the relationships between concepts. In this manuscript we present the development and validation of a prototype that successfully produced meaningful concrete-knowledge questions, opening new applications for existing knowledge repositories and potentially benefiting students of all biomedical science disciplines.


Subject(s)
Algorithms , Biological Science Disciplines/education , Education, Medical , Educational Measurement/methods , Vocabulary, Controlled , Choice Behavior , Humans , Medical Subject Headings
10.
Cladistics ; 31(6): 679-691, 2015 Dec.
Article in English | MEDLINE | ID: mdl-34753271

ABSTRACT

Viruses of influenza A subtype H7 can be highly pathogenic and periodically infect humans. For example, there have been numerous outbreaks of H7 in the Americas and Europe since 1996. More recently, a reassortant H7N9 has emerged among humans and birds during 2013-2014 in China, Taiwan and Hong Kong. This H7N9 genome consists of genetic segments that assort with H7 and H9 viruses previously circulating in chickens and wild birds in China and ducks in Korea. Epidemic risk modellers have used agricultural, climatic and demographic data to predict that the virus will spread to northern Vietnam via poultry. To shed light on the traffic of H7 viruses in general, we examine genetic segments of influenza that have assorted with many strains of H7 viruses dating back to 1902. We focus on use cases from the United States, Italy and China. We apply a novel metric, betweenness, an associated phylogenetic visualization technique, transmission networks, and compare these with another technique, route mapping. In contrast to traditional views, our results illustrate that segments that assort with H7 viruses are spread frequently between the Americas and Eurasia. In summary, genetic segments that historically assort with H7 influenza viruses have been spread from China to: Australia, Czech Republic, Denmark, Egypt, Germany, Hong Kong, Italy, Japan, Mongolia, the Netherlands, New Zealand, Pakistan, South Africa, South Korea, Spain, Sweden, the UK, the US, and Vietnam.
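Betweenness, as used here, is the standard graph-theoretic centrality. It scores a node by how often it lies on shortest paths between other nodes, which is why it highlights hubs of viral traffic. The sketch below uses Brandes' algorithm for an unweighted directed network. It is not the authors' pipeline, and the toy network of places is hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for (unnormalized) betweenness centrality.

    graph: dict mapping each node to a list of its out-neighbours
    (unweighted, directed, no duplicate edges). For each node v, returns the sum
    over pairs (s, t) with s != v != t of the fraction of shortest s-t paths through v.
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical network: places A and B send lineages through hub H to C and D.
network = {"A": ["H"], "B": ["H"], "H": ["C", "D"]}
print(betweenness(network)["H"])  # 4.0
```

Node H scores 4.0 because it lies on the only shortest path for each of the four pairs (A,C), (A,D), (B,C) and (B,D).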

11.
BMC Bioinformatics ; 15: 73, 2014 Mar 15.
Article in English | MEDLINE | ID: mdl-24629096

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are short (19-23 nucleotides) non-coding RNAs that bind to sites in the 3' untranslated regions (3'UTR) of a targeted messenger RNA (mRNA). Binding leads to degradation of the transcript or blocked translation, resulting in decreased expression of the targeted gene. Single nucleotide polymorphisms (SNPs) have been found in 3'UTRs that disrupt normal miRNA binding or introduce new binding sites, and some of these have been associated with disease pathogenesis. This raises the importance of detecting miRNA targets and predicting the possible effects of SNPs on binding sites. In the last decade a number of studies have been conducted to predict the location of miRNA binding sites. However, there have been fewer algorithms published to analyze the effects of SNPs on miRNA binding. Moreover, the existing software has some shortcomings, including the requirement for significant manual labor when working with huge lists of SNPs and the fact that algorithms work only for SNPs present in databases such as dbSNP. These limitations become problematic as next-generation sequencing is leading to large numbers of novel variants in 3'UTRs. RESULT: In order to overcome these issues, we developed a web-server named mrSNP which predicts the impact of a SNP in a 3'UTR on miRNA binding. The proposed tool reduces the manual labor requirements and allows users to input any SNP that has been identified by any SNP-calling program. In testing the performance of mrSNP on SNPs experimentally validated to affect miRNA binding, mrSNP correctly identified 69% (11/16) of the SNPs disrupting binding. CONCLUSIONS: mrSNP is a highly adaptable, well-performing tool for predicting the effect a 3'UTR SNP will have on miRNA binding. This tool has advantages over existing algorithms because it can assess the effect of novel SNPs on miRNA binding without requiring significant hands-on time.


Subject(s)
MicroRNAs/genetics , Sequence Analysis, RNA/methods , Software , 3' Untranslated Regions , Algorithms , Binding Sites/genetics , Humans , MicroRNAs/metabolism , Polymorphism, Single Nucleotide , RNA, Messenger/genetics , RNA, Messenger/metabolism
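As a simplified illustration of how a 3'UTR SNP can create or destroy a miRNA binding site, the sketch below checks only exact matches to the canonical seed site. The seed site is the reverse complement of miRNA nucleotides 2-8, the 7mer-m8 site. This is an assumption for illustration: mrSNP's actual scoring is not described in the abstract and likely considers more than seed complementarity. The 3'UTR fragment and the function names are hypothetical. The miRNA is human let-7a-5p.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Reverse complement of the miRNA seed (nucleotides 2-8): the 7mer-m8 target site."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


def site_positions(utr, site):
    """0-based start positions of every (possibly overlapping) site match in the 3'UTR."""
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]


def snp_effect(utr, pos, alt, mirna):
    """Compare exact seed-site matches in the reference and alternate 3'UTR for one SNP."""
    site = seed_site(mirna)
    alt_utr = utr[:pos] + alt + utr[pos + 1:]
    ref_hits = set(site_positions(utr, site))
    alt_hits = set(site_positions(alt_utr, site))
    return {"lost": sorted(ref_hits - alt_hits), "gained": sorted(alt_hits - ref_hits)}


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p
utr = "AAGCUACCUCAAA"             # hypothetical 3'UTR fragment
print(seed_site(let7a))  # CUACCUC
print(snp_effect(utr, pos=5, alt="G", mirna=let7a))  # {'lost': [3], 'gained': []}
```

Because the function only needs the sequence, position and alternate allele, it works for any novel SNP, not just those already catalogued in a database. That is the property the abstract emphasises.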
12.
Cancer Biol Ther ; 15(5): 533-43, 2014 May.
Article in English | MEDLINE | ID: mdl-24521615

ABSTRACT

NUSAP1 has been reported to function in mitotic spindle assembly, chromosome segregation, and regulation of cytokinesis. In this study, we find that NUSAP1 has hitherto unknown functions in the key BRCA1-regulated pathways of double strand DNA break repair and centrosome duplication. Both these pathways are important for maintenance of genomic stability, and any defects in these pathways can cause tumorigenesis. Depletion of NUSAP1 from cells led to the suppression of double strand DNA break repair via the homologous recombination and single-strand annealing pathways. The presence of NUSAP1 was also found to be important for the control of centrosome numbers. We have found evidence that NUSAP1 plays a role in these processes through regulation of BRCA1 protein levels, and BRCA1 overexpression from a plasmid mitigates the defective phenotypes seen upon NUSAP1 depletion. We found that after NUSAP1 depletion there is a decrease in BRCA1 recruitment to ionizing radiation-induced foci. Results from this study reveal a novel association between BRCA1 and NUSAP1 and suggest a mechanism whereby NUSAP1 is involved in carcinogenesis.


Subject(s)
BRCA1 Protein/metabolism , DNA Damage , DNA Repair , Microtubule-Associated Proteins/metabolism , BRCA1 Protein/genetics , Cell Line, Tumor , Centrosome/metabolism , DNA Damage/radiation effects , DNA, Single-Stranded/metabolism , G2 Phase Cell Cycle Checkpoints , Homologous Recombination , Humans , Microtubule-Associated Proteins/genetics , S Phase Cell Cycle Checkpoints
13.
Concurr Comput ; 26(18): 2836-2855, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25598745

ABSTRACT

Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum-energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of the Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64-bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 32000² pixels in size, which are well beyond the largest previously reported in the literature.
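For readers unfamiliar with preflow-push, the following is a minimal serial sketch of the generic Goldberg-Tarjan push-relabel method, using a FIFO queue of active vertices. It does not reproduce the multithreaded XMT-2 implementation. It also assumes the segmentation graph (pixel vertices plus source and sink terminal edges) has already been built and is passed in as an edge list, which is hypothetical here.

```python
from collections import deque


def max_flow_push_relabel(n, edges, source, sink):
    """Serial FIFO variant of Goldberg-Tarjan preflow-push (push-relabel).

    n: number of vertices labelled 0..n-1; edges: list of (u, v, capacity).
    Returns the value of a maximum source-sink flow.
    """
    cap = [[0] * n for _ in range(n)]  # residual capacities (dense; fine for small n)
    for u, v, c in edges:
        cap[u][v] += c
    height = [0] * n
    excess = [0] * n
    height[source] = n
    # Saturate every edge leaving the source to create the initial preflow.
    for v in range(n):
        flow = cap[source][v]
        if flow > 0:
            cap[source][v] -= flow
            cap[v][source] += flow
            excess[v] += flow
            excess[source] -= flow
    active = deque(v for v in range(n) if v not in (source, sink) and excess[v] > 0)
    while active:
        u = active.popleft()
        while excess[u] > 0:  # discharge u completely
            pushed = False
            for v in range(n):
                if cap[u][v] > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], cap[u][v])
                    cap[u][v] -= delta
                    cap[v][u] += delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (source, sink) and excess[v] == delta:
                        active.append(v)  # v has just become active
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n) if cap[u][v] > 0)
    return excess[sink]


edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
print(max_flow_push_relabel(4, edges, source=0, sink=3))  # 5
```

Pushes and relabels only touch a vertex and its neighbours. This locality is what makes the algorithm attractive for fine-grained multithreading, although the parallel version needs the synchronization the abstract attributes to the XMT-2 hardware.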

14.
PLoS One ; 8(11): e78507, 2013.
Article in English | MEDLINE | ID: mdl-24223817

ABSTRACT

The ruminal microbial community is a unique source of enzymes that underpin the conversion of cellulosic biomass. In this study, the microbial consortia adherent to solid digesta in the rumen of Jersey cattle were subjected to an activity-based metagenomic study to explore the genetic diversity of carbohydrolytic enzymes in Jersey cows, with a particular focus on cellulases and xylanases. Pyrosequencing and bioinformatic analyses of 120 carbohydrate-active fosmids identified genes encoding 575 putative Carbohydrate-Active Enzymes (CAZymes) and proteins putatively related to transcriptional regulation, transporters, and signal transduction coupled with polysaccharide degradation and metabolism. Most of these genes shared little similarity to sequences archived in databases. Genes that were predicted to encode glycoside hydrolases (GH) involved in xylan and cellulose hydrolysis (e.g., GH3, 5, 9, 10, 39 and 43) were well represented. A new subfamily (S-8) of GH5 was identified from contigs assigned to Firmicutes. These subfamilies of GH5 proteins also showed significant phylum-dependent distribution. A number of polysaccharide utilization loci (PULs) were found, and two of them contained genes encoding Sus-like proteins and cellulases that have not been reported in previous metagenomic studies of samples from the rumens of cows or other herbivores. Comparison with the large metagenomic datasets previously reported for other ruminant species (or cattle breeds) and wallabies showed that the rumen microbiome of Jersey cows might contain differing CAZymes. Future studies are needed to further explore how host genetics and diets affect the diversity and distribution of CAZymes and utilization of plant cell wall materials.


Subject(s)
Bacterial Proteins/genetics , Cellulases/genetics , Cellulose/metabolism , Endo-1,4-beta Xylanases/genetics , Glycoside Hydrolases/genetics , Metagenome , Xylans/metabolism , Animals , Bacterial Proteins/classification , Bacterial Proteins/metabolism , Cattle , Cellulases/classification , Cellulases/metabolism , Digestion/physiology , Endo-1,4-beta Xylanases/classification , Endo-1,4-beta Xylanases/metabolism , Glycoside Hydrolases/classification , Glycoside Hydrolases/metabolism , Microbial Consortia/genetics , Molecular Sequence Annotation , Phylogeny , Rumen/enzymology , Rumen/microbiology , Ruminants/microbiology , Ruminants/physiology
15.
BMC Bioinformatics ; 14: 184, 2013 Jun 07.
Article in English | MEDLINE | ID: mdl-23758764

ABSTRACT

BACKGROUND: The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. RESULTS: We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. CONCLUSION: The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify their needs in order to choose the tool that provides the best results.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Benchmarking , Genome
16.
Brief Bioinform ; 14(3): 279-92, 2013 May.
Article in English | MEDLINE | ID: mdl-22772837

ABSTRACT

The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.


Subject(s)
Algorithms , Gene Expression , Animals , Cluster Analysis , Factor Analysis, Statistical , Humans , Models, Theoretical , Oligonucleotide Array Sequence Analysis
17.
Hum Hered ; 72(4): 276-88, 2011.
Article in English | MEDLINE | ID: mdl-22189470

ABSTRACT

This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or more generally, diploid) genetic studies. In terms of scope, KELVIN supports two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association analysis, in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or conditional LD given linkage; dichotomous trait, quantitative trait and quantitative trait threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself utilizes sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.


Subject(s)
Genetic Linkage , Models, Statistical , Software , Chromosome Mapping , Epistasis, Genetic , Genomic Imprinting , Humans , Linkage Disequilibrium , Models, Genetic , Pedigree , Quantitative Trait Loci
18.
Cladistics ; 27(1): 61-66, 2011 Feb.
Article in English | MEDLINE | ID: mdl-32313364

ABSTRACT

Novel pathogens have the potential to become critical issues of national security, public health and economic welfare. As demonstrated by the response to Severe Acute Respiratory Syndrome (SARS) and influenza, genomic sequencing has become an important method for diagnosing agents of infectious disease. Despite the value of genomic sequences in characterizing novel pathogens, raw data on their own do not provide the information needed by public health officials and researchers. One must integrate knowledge of the genomes of pathogens with host biology and geography to understand the etiology of epidemics. To these ends, we have created an application called Supramap (http://supramap.osu.edu) to put information on the spread of pathogens and key mutations across time, space and various hosts into a geographic information system (GIS). To build this application, we created a web service for integrated sequence alignment and phylogenetic analysis as well as methods to describe the tree, mutations, and host shifts in Keyhole Markup Language (KML). We apply the application to 239 sequences of the polymerase basic 2 (PB2) gene of recent isolates of avian influenza (H5N1). We map a mutation, glutamic acid to lysine at position 627 in the PB2 protein (E627K), in H5N1 influenza that allows for increased replication of the virus in mammals. We use a statistical test to support the hypothesis of a correlation of E627K mutations with avian-mammalian host shifts but reject the hypothesis that lineages with E627K are moving westward. Data, instructions for use, and visualizations are included as supplemental materials at: http://supramap.osu.edu/sm/supramap/publications. © The Willi Hennig Society 2010.

19.
Article in English | MEDLINE | ID: mdl-30009263

ABSTRACT

The segmentation of tissue regions in high-resolution microscopy is a challenging problem due to both the size and appearance of digitized pathology sections. The two-point correlation function (TPCF) has proved to be an effective feature to address the textural appearance of tissues. However, calculating the TPCF is computationally burdensome and often intractable for the gigapixel images produced by slide scanning devices for pathology applications. In this paper we present several approaches for accelerating deterministic calculation of point correlation functions using theory to reduce computation, parallelization on distributed systems, and parallelization on graphics processors. We previously showed that the correlation updating method of calculation offers an 8-35× speedup over frequency domain methods and decouples efficient computation from the select scales of Fourier methods. In this paper, distributed computation on 64 compute nodes provides a further 42× speedup. Finally, parallelization on graphics processors (GPU) results in an additional 11-16× speedup using an implementation capable of running on a single desktop machine.
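As a baseline for what is being accelerated, the sketch below evaluates a two-point correlation directly in the spatial domain for a binary (single-phase) image. For a displacement (dx, dy), it gives the fraction of in-image pixel pairs in which both pixels belong to the phase. This is the naive deterministic computation, not the paper's correlation-updating, distributed, or GPU methods. Ignoring wrap-around at the image border is an assumption of this sketch.

```python
def two_point_correlation(image, dx, dy):
    """Direct (spatial-domain) two-point correlation S2 for a binary image.

    Returns the fraction of pixel pairs separated by the displacement (dx, dy)
    in which both pixels are 1, counting only pairs that fall inside the image
    (no periodic wrap-around). Cost is O(pixels) for each displacement.
    """
    rows, cols = len(image), len(image[0])
    both = pairs = 0
    for y in range(rows):
        y2 = y + dy
        if not 0 <= y2 < rows:
            continue
        for x in range(cols):
            x2 = x + dx
            if not 0 <= x2 < cols:
                continue
            pairs += 1
            both += image[y][x] & image[y2][x2]
    return both / pairs if pairs else 0.0


img = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(two_point_correlation(img, 0, 0))  # 0.5 (the phase's area fraction)
print(two_point_correlation(img, 2, 0))  # 0.0
```

A texture feature typically needs S2 over many displacements. Evaluating each one this way costs a full pass over a gigapixel image, which is why the abstract's algorithmic and parallel speedups matter.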

20.
Article in English | MEDLINE | ID: mdl-19963746

ABSTRACT

Histopathological examination is one of the most important steps in evaluating the prognosis of patients with neuroblastoma (NB). NB is a pediatric tumor of the sympathetic nervous system, and current evaluation of NB tumor histology is done according to the International Neuroblastoma Pathology Classification. The number of cells undergoing either mitosis or karyorrhexis (MK) plays an important role in this classification system. However, manual counting of such cells is tedious and subject to considerable inter- and intra-reader variations. A computer-assisted system may allow more precise results, leading to more accurate prognosis in clinical practice. In this study, we propose an image analysis approach that operates on digitized NB histology samples. Based on the likelihood functions estimated from the samples of manually marked regions, we compute a probability map that indicates how likely it is that each pixel belongs to an MK cell. Component-wise 2-step thresholding of the generated probability map provides promising results in detecting MK cells, with an average sensitivity of 81.1% and 12.2 false positive detections on average.


Subject(s)
Diagnosis, Computer-Assisted/methods , Neuroblastoma/pathology , Cell Death , Child , Diagnosis, Computer-Assisted/statistics & numerical data , Humans , Image Processing, Computer-Assisted , Likelihood Functions , Mitosis , Prognosis , Signal Processing, Computer-Assisted
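The abstract does not spell out its component-wise 2-step thresholding. As an assumption, the sketch below uses hysteresis-style thresholding, one common two-threshold scheme. A low threshold defines candidate connected components in the probability map, and a high threshold keeps only components that contain at least one confident pixel. The thresholds, the 4-connectivity, and the toy probability map are hypothetical.

```python
def two_step_threshold(prob, low, high):
    """Keep 4-connected components of pixels with prob >= low that contain
    at least one pixel with prob >= high (hysteresis-style thresholding).

    Returns a list of kept components, each a sorted list of (row, col) pixels.
    """
    rows, cols = len(prob), len(prob[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob[r][c] < low:
                continue
            # Flood-fill one candidate component with an explicit stack.
            stack, comp = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and prob[ny][nx] >= low):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if max(prob[y][x] for y, x in comp) >= high:
                kept.append(sorted(comp))
    return kept


prob = [
    [0.9, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.5],
]
print(two_step_threshold(prob, low=0.5, high=0.8))  # [[(0, 0), (0, 1)]]
```

Each kept component would correspond to one candidate MK cell. The right-hand component is discarded because none of its pixels reaches the high threshold.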