Unsupervised discovery of phenotype-specific multi-omics networks.

Shi, W Jenny; Zhuang, Yonghua; Russell, Pamela H; Hobbs, Brian D; Parker, Margaret M; Castaldi, Peter J; Rudra, Pratyaydipta; Vestal, Brian; Hersh, Craig P; Saba, Laura M; Kechris, Katerina.

Bioinformatics ; 35(21): 4336-4343, 2019 11 01.

Article in English | MEDLINE | ID: mdl-30957844

ABSTRACT

MOTIVATION: Complex diseases often involve a wide spectrum of phenotypic traits. Better understanding of the biological mechanisms relevant to each trait promotes understanding of the etiology of the disease and the potential for targeted and effective treatment plans. There have been many efforts towards omics data integration and network reconstruction, but limited work has examined the incorporation of relevant (quantitative) phenotypic traits. RESULTS: We propose a novel technique, sparse multiple canonical correlation network analysis (SmCCNet), for integrating multiple omics data types along with a quantitative phenotype of interest, and for constructing multi-omics networks that are specific to the phenotype. As a case study, we focus on miRNA-mRNA networks. Through simulations, we demonstrate that SmCCNet has better overall prediction performance compared to popular gene expression network construction and integration approaches under realistic settings. Applying SmCCNet to studies on chronic obstructive pulmonary disease (COPD) and breast cancer, we found enrichment of known relevant pathways (e.g. the Cadherin pathway for COPD and the interferon-gamma signaling pathway for breast cancer) as well as less known omics features that may be important to the diseases. Although those applications focus on miRNA-mRNA co-expression networks, SmCCNet is applicable to a variety of omics and other data types. It can also be easily generalized to incorporate multiple quantitative phenotypes simultaneously. The versatility of SmCCNet suggests great potential of the approach in many areas. AVAILABILITY AND IMPLEMENTATION: The SmCCNet algorithm is written in R, and is freely available on the web at https://cran.r-project.org/web/packages/SmCCNet/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Regulatory Networks , Algorithms , Breast Neoplasms , Humans , Phenotype , Signal Transduction

A large-scale analysis of bioinformatics code on GitHub.

Russell, Pamela H; Johnson, Rachel L; Ananthan, Shreyas; Harnke, Benjamin; Carlson, Nichole E.

PLoS One ; 13(10): e0205898, 2018.

Article in English | MEDLINE | ID: mdl-30379882

ABSTRACT

In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of software. However, the actual state of the body of bioinformatics software remains largely unknown. The purpose of this paper is to investigate the state of source code in the bioinformatics community, specifically looking at relationships between code properties, development activity, developer communities, and software impact. To investigate these issues, we curated a list of 1,720 bioinformatics repositories on GitHub through their mention in peer-reviewed bioinformatics articles. Additionally, we included 23 high-profile repositories identified by their popularity in an online bioinformatics forum. We analyzed repository metadata, source code, development activity, and team dynamics using data made available publicly through the GitHub API, as well as article metadata. We found key relationships within our dataset, including: certain scientific topics are associated with more active code development and higher community interest in the repository; most of the code in the main dataset is written in dynamically typed languages, while most of the code in the high-profile set is statically typed; developer team size is associated with community engagement and high-profile repositories have larger teams; the proportion of female contributors decreases for high-profile repositories and with seniority level in author lists; and, multiple measures of project impact are associated with the simple variable of whether the code was modified at all after paper publication. In addition to providing the first large-scale analysis of bioinformatics code to our knowledge, our work will enable future analysis through publicly available data, code, and methods. Code to generate the dataset and reproduce the analysis is provided under the MIT license at https://github.com/pamelarussell/github-bioinformatics. 
Data are available at https://doi.org/10.17605/OSF.IO/UWHX8.

Subject(s)

Computational Biology/trends , Genome , Metadata/statistics & numerical data , Software/statistics & numerical data , Bibliometrics , Datasets as Topic , Humans , Internet , Software/supply & distribution

miR-MaGiC improves quantification accuracy for small RNA-seq.

Russell, Pamela H; Vestal, Brian; Shi, Wen; Rudra, Pratyaydipta D; Dowell, Robin; Radcliffe, Richard; Saba, Laura; Kechris, Katerina.

BMC Res Notes ; 11(1): 296, 2018 May 15.

Article in English | MEDLINE | ID: mdl-29764489

ABSTRACT

OBJECTIVE: Many tools have been developed to profile microRNA (miRNA) expression from small RNA-seq data. These tools must contend with several issues: the small size of miRNAs, the small number of unique miRNAs, the fact that similar miRNAs can be transcribed from multiple loci, and the presence of miRNA isoforms known as isomiRs. Methods failing to address these issues can return misleading information. We propose a novel quantification method designed to address these concerns. RESULTS: We present miR-MaGiC, a novel miRNA quantification method, implemented as a cross-platform tool in Java. miR-MaGiC performs stringent mapping to a core region of each miRNA and defines a meaningful set of target miRNA sequences by collapsing the miRNA space to "functional groups". We hypothesize that these two features, mapping stringency and collapsing, provide better quantification of a more meaningful unit (i.e., miRNA family). We test miR-MaGiC and several published methods on 210 small RNA-seq libraries, evaluating each method's ability to accurately reflect global miRNA expression profiles. We define accuracy as total counts close to the total number of input reads originating from miRNAs. We find that miR-MaGiC, which incorporates both stringency and collapsing, provides the most accurate counts.
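
The accuracy criterion stated in the abstract (total counts close to the number of input reads originating from miRNAs) can be illustrated with a minimal Python sketch. This is not code from miR-MaGiC, and the abstract does not give the exact metric used in the paper; the function name `count_recovery_ratio`, the family labels, and the numbers are hypothetical and chosen only for illustration.

```python
def count_recovery_ratio(counts, n_mirna_reads):
    """Ratio of total reported counts to the number of miRNA-derived input reads.

    Hypothetical helper for illustration only. A value near 1.0 means the
    quantifier neither dropped nor multiply counted miRNA reads.
    """
    if n_mirna_reads <= 0:
        raise ValueError("n_mirna_reads must be positive")
    return sum(counts.values()) / n_mirna_reads


# Made-up example: counts per collapsed miRNA group for one library.
example_counts = {"group_a": 60, "group_b": 30}
ratio = count_recovery_ratio(example_counts, n_mirna_reads=100)
print(f"recovered fraction of miRNA reads: {ratio:.2f}")
```

Under this reading, a ratio well above 1 would suggest reads counted more than once (for example, across similar miRNAs from multiple loci), and a ratio well below 1 would suggest miRNA reads that were never assigned.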

Subject(s)

High-Throughput Nucleotide Sequencing/methods , MicroRNAs/metabolism , Sequence Analysis, RNA/methods , Animals , Mice

Radtools: R utilities for convenient extraction of medical image metadata.

Russell, Pamela H; Ghosh, Debashis.

F1000Res ; 7, 2018.

Article in English | MEDLINE | ID: mdl-31131079

ABSTRACT

The radiology community has adopted several widely used standards for medical image files, including the popular DICOM (Digital Imaging and Communication in Medicine) and NIfTI (Neuroimaging Informatics Technology Initiative) standards. These file formats include image intensities as well as potentially extensive metadata. The NIfTI standard specifies a particular set of header fields describing the image and minimal information about the scan. DICOM headers can include any of >4,000 available metadata attributes spanning a variety of topics. NIfTI files contain all slices for an image series, while DICOM files capture single slices and image series are typically organized into a directory. Each DICOM file contains metadata for the image series as well as the individual image slice. The programming environment R is popular for data analysis due to its free and open code, active ecosystem of tools and users, and excellent system of contributed packages. Currently, many published radiological image analyses are performed with proprietary software or custom unpublished scripts. However, R is increasing in popularity in this area due to several packages for processing and analysis of image files. While these R packages handle image import and processing, no existing package makes image metadata conveniently accessible. Extracting image metadata, combining across slices, and converting to useful formats can be prohibitively cumbersome, especially for DICOM files. We present radtools, an R package for smooth navigation of medical image data. Radtools makes the problem of extracting image metadata trivially simple, providing simple functions to explore and return information in familiar R data structures. Radtools also facilitates extraction of image data and viewing of image slices. 
The package is freely available under the MIT license at https://github.com/pamelarussell/radtools and is easily installable from the Comprehensive R Archive Network (https://cran.r-project.org/package=radtools).

Subject(s)

Diagnostic Imaging , Image Processing, Computer-Assisted/methods , Metadata , Software , Radiology Information Systems

Model based heritability scores for high-throughput sequencing data.

Rudra, Pratyaydipta; Shi, W Jenny; Vestal, Brian; Russell, Pamela H; Odell, Aaron; Dowell, Robin D; Radcliffe, Richard A; Saba, Laura M; Kechris, Katerina.

BMC Bioinformatics ; 18(1): 143, 2017 Mar 02.

Article in English | MEDLINE | ID: mdl-28253840

ABSTRACT

BACKGROUND: Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS: We propose several statistical models and different methods to compute and test a heritability measure for such data based on linear and generalized linear mixed effects models. We also provide methodology for hypothesis testing and interval estimation. Our analyses show that, among the methods, the negative binomial mixed model (NB-fit), compound Poisson mixed model (CP-fit), and the variance stabilizing transformed linear mixed model (VST) outperform the voom-transformed linear mixed model (voom). NB-fit and VST appear to be more robust than CP-fit for estimating and testing the heritability scores, while NB-fit is the most computationally expensive. CP-fit performed best in terms of the coverage of the confidence intervals. In addition, we applied the methods to both microRNA (miRNA) and messenger RNA (mRNA) sequencing datasets from a recombinant inbred mouse panel. We show that miRNA and mRNA expression can be a highly heritable molecular trait in mouse, and that some top heritable features coincide with expression quantitative trait loci. CONCLUSIONS: The models and methods we investigated in this manuscript are applicable and extendable to sequencing experiments where some biological replicates are available and the environmental variation is properly controlled. The CP-fit approach for assessing heritability was implemented for the first time to our knowledge. All the methods presented, as well as the generation of simulated sequencing data under either negative binomial or compound Poisson mixed models, are provided in the R package HeritSeq.
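
The definition of heritability given in the abstract (the proportion of trait variance attributable to genotypic variance) can be made concrete with a minimal Python sketch. This is not the NB-fit, CP-fit, VST, or voom approach from the paper, and it does not use the HeritSeq package; it swaps in the classical one-way ANOVA variance-component estimator for a balanced design and assumes already-normalized, roughly Gaussian expression values. The function name `anova_heritability` and the strain labels are hypothetical.

```python
from statistics import mean


def anova_heritability(groups):
    """Variance-component heritability estimate from a balanced one-way layout.

    `groups` maps a strain label to its replicate measurements. Illustrative
    stand-in only: it assumes equal replicate counts and Gaussian-like data,
    unlike the count-based mixed models described in the abstract.
    """
    k = len(groups)
    sizes = {len(values) for values in groups.values()}
    if k < 2 or len(sizes) != 1:
        raise ValueError("need at least two strains with equal replicate counts")
    n = sizes.pop()
    if n < 2:
        raise ValueError("need at least two replicates per strain")

    strain_means = {s: mean(v) for s, v in groups.items()}
    # With a balanced design the mean of strain means equals the grand mean.
    grand_mean = mean(strain_means.values())

    # Between-strain and within-strain mean squares.
    ms_between = n * sum((m - grand_mean) ** 2 for m in strain_means.values()) / (k - 1)
    ms_within = sum(
        (x - strain_means[s]) ** 2 for s, v in groups.items() for x in v
    ) / (k * (n - 1))

    # Method-of-moments genotypic variance, truncated at zero.
    var_genotype = max((ms_between - ms_within) / n, 0.0)
    total = var_genotype + ms_within
    return var_genotype / total if total > 0 else 0.0


# Made-up example: three strains, two replicates each.
example = {
    "strain_1": [5.1, 4.9],
    "strain_2": [7.0, 7.4],
    "strain_3": [6.0, 6.2],
}
print(f"estimated heritability: {anova_heritability(example):.3f}")
```

The estimate approaches 1 when replicates within a strain agree and strains differ, and falls to 0 when strain means are indistinguishable relative to replicate noise.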

Subject(s)

High-Throughput Nucleotide Sequencing , Models, Genetic , Animals , Genotype , Linear Models , Mice , MicroRNAs/chemistry , MicroRNAs/metabolism , Phenotype , Quantitative Trait, Heritable , RNA, Messenger/chemistry , RNA, Messenger/metabolism , Sequence Analysis, RNA

Genome-wide imputation study identifies novel HLA locus for pulmonary fibrosis and potential role for auto-immunity in fibrotic idiopathic interstitial pneumonia.

Fingerlin, Tasha E; Zhang, Weiming; Yang, Ivana V; Ainsworth, Hannah C; Russell, Pamela H; Blumhagen, Rachel Z; Schwarz, Marvin I; Brown, Kevin K; Steele, Mark P; Loyd, James E; Cosgrove, Gregory P; Lynch, David A; Groshong, Steve; Collard, Harold R; Wolters, Paul J; Bradford, Williamson Z; Kossen, Karl; Seiwert, Scott D; du Bois, Roland M; Garcia, Christine Kim; Devine, Megan S; Gudmundsson, Gunnar; Isaksson, Helgi J; Kaminski, Naftali; Zhang, Yingze; Gibson, Kevin F; Lancaster, Lisa H; Maher, Toby M; Molyneaux, Philip L; Wells, Athol U; Moffatt, Miriam F; Selman, Moises; Pardo, Annie; Kim, Dong Soon; Crapo, James D; Make, Barry J; Regan, Elizabeth A; Walek, Dinesha S; Daniel, Jerry J; Kamatani, Yoichiro; Zelenika, Diana; Murphy, Elissa; Smith, Keith; McKean, David; Pedersen, Brent S; Talbert, Janet; Powers, Julia; Markin, Cheryl R; Beckman, Kenneth B; Lathrop, Mark.

BMC Genet ; 17(1): 74, 2016 06 07.

Article in English | MEDLINE | ID: mdl-27266705

ABSTRACT

BACKGROUND: Fibrotic idiopathic interstitial pneumonias (fIIP) are a group of fatal lung diseases with largely unknown etiology and without definitive treatment other than lung transplant to prolong life. There is strong evidence for the importance of both rare and common genetic risk alleles in familial and sporadic disease. We have previously used genome-wide single nucleotide polymorphism data to identify 10 risk loci for fIIP. Here we extend that work to imputed genome-wide genotypes and conduct new RNA sequencing studies of lung tissue to identify and characterize new fIIP risk loci. RESULTS: We performed genome-wide genotype imputation association analyses in 1616 non-Hispanic white (NHW) cases and 4683 NHW controls followed by validation and replication (878 cases, 2017 controls) genotyping and targeted gene expression in lung tissue. Following meta-analysis of the discovery and replication populations, we identified a novel fIIP locus in the HLA region of chromosome 6 (rs7887, P_meta = 3.7 × 10^-9). Imputation of classic HLA alleles identified two in high linkage disequilibrium that are associated with fIIP (DRB1*15:01 P = 1.3 × 10^-7 and DQB1*06:02 P = 6.1 × 10^-8). Targeted RNA-sequencing of the HLA locus identified 21 genes differentially expressed between fibrotic and control lung tissue (Q < 0.001), many of which are involved in immune and inflammatory response regulation. In addition, the putative risk alleles, DRB1*15:01 and DQB1*06:02, are associated with expression of the DQB1 gene among fIIP cases (Q < 1 × 10^-16). CONCLUSIONS: We have identified a genome-wide significant association between the HLA region and fIIP. Two HLA alleles are associated with fIIP and affect expression of HLA genes in lung tissue, indicating that the potential genetic risk due to HLA alleles may involve gene regulation in addition to altered protein structure. 
These studies reveal the importance of the HLA region for risk of fIIP and a basis for the potential etiologic role of auto-immunity in fIIP.

Subject(s)

Genome-Wide Association Study/methods , HLA-DQ beta-Chains/genetics , HLA-DRB1 Chains/genetics , Idiopathic Pulmonary Fibrosis/genetics , Pulmonary Fibrosis/genetics , Sequence Analysis, RNA/methods , Adult , Aged , Chromosomes, Human, Pair 6/genetics , Female , Gene Expression Profiling , Gene Expression Regulation , Genetic Loci , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Male , Middle Aged
