Results 1 - 20 of 54
1.
Hum Genomics ; 18(1): 75, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956648

ABSTRACT

BACKGROUND: Aging represents a significant risk factor for the occurrence of cerebral small vessel disease, associated with white matter (WM) lesions, and for age-related cognitive alterations, though the precise mechanisms remain largely unknown. This study aimed to investigate the impact of polygenic risk scores (PRS) for WM integrity, together with age-related DNA methylation, and gene expression alterations, on cognitive aging in a cross-sectional healthy aging cohort. The PRSs were calculated using genome-wide association study (GWAS) summary statistics for magnetic resonance imaging (MRI) markers of WM integrity, including WM hyperintensities, fractional anisotropy (FA), and mean diffusivity (MD). These scores were utilized to predict age-related cognitive changes and evaluate their correlation with structural brain changes, which distinguish individuals with higher and lower cognitive scores. To reduce the dimensionality of the data and identify age-related DNA methylation and transcriptomic alterations, Sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) was used. Subsequently, a canonical correlation algorithm was used to integrate the three types of omics data (PRS, DNA methylation, and gene expression data) and identify an individual "omics" signature that distinguishes subjects with varying cognitive profiles. RESULTS: We found a positive association between MD-PRS and long-term memory, as well as a correlation between MD-PRS and structural brain changes, effectively discriminating between individuals with lower and higher memory scores. Furthermore, we observed an enrichment of polygenic signals in genes related to both vascular and non-vascular factors. Age-related alterations in DNA methylation and gene expression indicated dysregulation of critical molecular features and signaling pathways involved in aging and lifespan regulation.
The integration of multi-omics data underscored the involvement of synaptic dysfunction, axonal degeneration, microtubule organization, and glycosylation in the process of cognitive aging. CONCLUSIONS: These findings provide valuable insights into the biological mechanisms underlying the association between WM coherence and cognitive aging. Additionally, they highlight how age-associated DNA methylation and gene expression changes contribute to cognitive aging.
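As a rough illustration of how a polygenic risk score is commonly derived from GWAS summary statistics, the sketch below sums effect-allele dosages weighted by per-variant effect sizes. The variant IDs, effect sizes and the function name `polygenic_risk_score` are hypothetical and not taken from the study, which may also have applied clumping, thresholding or other steps not shown here.

```python
def polygenic_risk_score(dosages, betas):
    """Sum effect-allele dosages (0, 1 or 2) weighted by GWAS effect sizes.

    Variants missing from either input are skipped (an assumption of this sketch).
    """
    shared = dosages.keys() & betas.keys()
    return sum(betas[v] * dosages[v] for v in shared)


# Hypothetical summary statistics: effect size per copy of the effect allele.
betas = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
# Hypothetical genotype of one participant as effect-allele counts.
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0, "rs_d": 1}

score = polygenic_risk_score(dosages, betas)
print(round(score, 3))
```

In practice such scores are usually standardized across the cohort before being related to cognitive or imaging measures.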


Subjects
Cognitive Aging , DNA Methylation , Genome-Wide Association Study , Multifactorial Inheritance , Humans , DNA Methylation/genetics , Female , Male , Multifactorial Inheritance/genetics , Aged , Middle Aged , Cross-Sectional Studies , White Matter/diagnostic imaging , White Matter/pathology , Risk Factors , Magnetic Resonance Imaging , Aging/genetics , Aging/pathology , Brain/diagnostic imaging , Brain/metabolism , Brain/pathology , Genetic Risk Score
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38783706

ABSTRACT

RNA Polymerase II (Pol II) transcriptional elongation pausing is an integral part of the dynamic regulation of gene transcription in the genome of metazoans. It plays a pivotal role in many vital biological processes and disease progression. However, experimentally measuring genome-wide Pol II pausing is technically challenging and the precise governing mechanism underlying this process is not fully understood. Here, we develop RP3 (RNA Polymerase II Pausing Prediction), a network regularized logistic regression machine learning method, to predict Pol II pausing events by integrating genome sequence, histone modification, gene expression, chromatin accessibility, and protein-protein interaction data. RP3 can accurately predict Pol II pausing in diverse cellular contexts and unveil the transcription factors that are associated with the Pol II pausing machinery. Furthermore, we utilize a forward feature selection framework to systematically identify the combination of histone modification signals associated with Pol II pausing. RP3 is freely available at https://github.com/AMSSwanglab/RP3.
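One common way to make a logistic regression "network regularized" is to add a graph-Laplacian penalty that pushes features linked in a network (for example, interacting proteins) toward similar weights. The sketch below shows that generic objective only; it is an assumption for illustration, and RP3's actual formulation is defined by its authors in the paper and repository. All data and names are hypothetical.

```python
import math


def network_penalty(weights, edges):
    """Graph-Laplacian penalty: sum of squared weight differences over edges."""
    return sum((weights[i] - weights[j]) ** 2 for i, j in edges)


def objective(weights, X, y, edges, lam):
    """Mean logistic loss plus lam times the network penalty."""
    loss = 0.0
    for row, label in zip(X, y):
        z = sum(w * x for w, x in zip(weights, row))
        p = 1.0 / (1.0 + math.exp(-z))
        loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return loss / len(y) + lam * network_penalty(weights, edges)


# Hypothetical toy data: three features; features 0-1 and 1-2 are linked.
X = [[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]]
y = [1, 0]
edges = [(0, 1), (1, 2)]

smooth = [0.5, 0.5, 0.5]   # linked features share the same weight
rough = [0.5, -0.5, 0.5]   # linked features disagree

print(network_penalty(smooth, edges), network_penalty(rough, edges))
```

The penalty is zero when linked features agree and grows as they diverge, so minimizing the objective favours weight vectors that vary smoothly over the network.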


Subjects
Histone Code , RNA Polymerase II , RNA Polymerase II/metabolism , Humans , Transcription Elongation, Genetic , Chromatin/metabolism , Chromatin/genetics , Histones/metabolism , Machine Learning , Animals
3.
Cell Rep Methods ; 4(6): 100781, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38761803

ABSTRACT

We present an innovative strategy for integrating whole-genome-wide multi-omics data, which facilitates adaptive amalgamation by leveraging hidden layer features derived from high-dimensional omics data through a multi-task encoder. Empirical evaluations on eight benchmark cancer datasets substantiated that our proposed framework outstripped the comparative algorithms in cancer subtyping, delivering superior subtyping outcomes. Building upon these subtyping results, we establish a robust pipeline for identifying whole-genome-wide biomarkers, unearthing 195 significant biomarkers. Furthermore, we conduct an exhaustive analysis to assess the importance of each omic and non-coding region features at the whole-genome-wide level during cancer subtyping. Our investigation shows that both omics and non-coding region features substantially impact cancer development and survival prognosis. This study emphasizes the potential and practical implications of integrating genome-wide data in cancer research, demonstrating the potency of comprehensive genomic characterization. Additionally, our findings offer insightful perspectives for multi-omics analysis employing deep learning methodologies.


Subjects
Biomarkers, Tumor , Genomics , Neoplasms , Humans , Neoplasms/genetics , Neoplasms/classification , Genomics/methods , Biomarkers, Tumor/genetics , Algorithms , Prognosis , Genome-Wide Association Study/methods , Computational Biology/methods , Genome, Human/genetics , Multiomics
4.
Brief Funct Genomics ; 23(5): 549-560, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-38600757

ABSTRACT

Multi-omics data play a crucial role in precision medicine, mainly in understanding the diverse biological interactions between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.


Subjects
Genomics , Machine Learning , Precision Medicine , Humans , Precision Medicine/methods , Genomics/methods , Metabolomics/methods , Proteomics/methods , Medical Oncology/methods , Neoplasms/genetics , Neoplasms/metabolism , Computational Biology/methods , Multiomics
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678587

ABSTRACT

Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
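The "local connections" idea in the abstract above can be pictured as a layer in which each pathway node receives input only from its member genes, instead of from every gene. The following is a minimal sketch under that assumption; the gene and pathway names, the weights and the ReLU activation are hypothetical choices for illustration and are not taken from the DeepKEGG code.

```python
def pathway_layer(gene_values, pathway_members, weights):
    """Compute one activation per pathway from its member genes only.

    Genes outside a pathway contribute nothing, which is what makes the
    layer locally (rather than fully) connected.
    """
    activations = {}
    for pathway, members in pathway_members.items():
        total = sum(weights[(gene, pathway)] * gene_values[gene] for gene in members)
        activations[pathway] = max(0.0, total)  # ReLU, chosen for illustration
    return activations


# Hypothetical expression values, pathway membership and connection weights.
gene_values = {"G1": 1.0, "G2": 2.0, "G3": 4.0}
pathway_members = {"P1": ["G1", "G2"], "P2": ["G3"]}
weights = {("G1", "P1"): 0.5, ("G2", "P1"): 0.25, ("G3", "P2"): -1.0}

out = pathway_layer(gene_values, pathway_members, weights)
print(out)
```

Because every weight belongs to a named gene-pathway pair, attribution scores computed on such a layer can be read directly in biological terms.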


Subjects
Biomarkers, Tumor , Deep Learning , Neoplasm Recurrence, Local , Humans , Biomarkers, Tumor/metabolism , Biomarkers, Tumor/genetics , Neoplasm Recurrence, Local/metabolism , Neoplasm Recurrence, Local/genetics , Computational Biology/methods , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/pathology , Genomics/methods , Multiomics
6.
Cardiovasc Res ; 120(8): 927-942, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38661182

ABSTRACT

AIMS: In patients with heart failure (HF), concomitant sinus node dysfunction (SND) is an important predictor of mortality, yet its molecular underpinnings are poorly understood. Using proteomics, this study aimed to dissect the protein and phosphorylation remodelling within the sinus node in an animal model of HF with concurrent SND. METHODS AND RESULTS: We acquired deep sinus node proteomes and phosphoproteomes in mice with heart failure and SND and report extensive remodelling. Intersecting the measured (phospho)proteome changes with human genomics pharmacovigilance data highlighted downregulated proteins involved in electrical activity such as the pacemaker ion channel, Hcn4. We confirmed the importance of ion channel downregulation for sinus node physiology using computer modelling. Guided by the proteomics data, we hypothesized that an inflammatory response may drive the electrophysiological remodelling underlying SND in heart failure. In support of this, experimentally induced inflammation downregulated Hcn4 and slowed pacemaking in the isolated sinus node. From the proteomics data we identified proinflammatory cytokine-like protein galectin-3 as a potential target to mitigate the effect. Indeed, in vivo suppression of galectin-3 in the animal model of heart failure prevented SND. CONCLUSION: Collectively, we outline the protein and phosphorylation remodelling of SND in heart failure, we highlight a role for inflammation in electrophysiological remodelling of the sinus node, and we present galectin-3 signalling as a target to ameliorate SND in heart failure.


Subjects
Disease Models, Animal , Heart Failure , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels , Mice, Inbred C57BL , Proteomics , Sick Sinus Syndrome , Sinoatrial Node , Animals , Heart Failure/metabolism , Heart Failure/physiopathology , Heart Failure/genetics , Heart Failure/pathology , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/metabolism , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/genetics , Sinoatrial Node/metabolism , Sinoatrial Node/physiopathology , Phosphorylation , Sick Sinus Syndrome/metabolism , Sick Sinus Syndrome/physiopathology , Sick Sinus Syndrome/genetics , Male , Inflammation Mediators/metabolism , Inflammation/metabolism , Inflammation/physiopathology , Inflammation/pathology , Heart Rate , Potassium Channels/metabolism , Potassium Channels/genetics , Computer Simulation , Models, Cardiovascular , Humans , Signal Transduction , Action Potentials
7.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426322

ABSTRACT

Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposed a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). Firstly, multiple independent variational autoencoders and contrastive loss functions were designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network was proposed to integrate consistent representations across different omics levels effectively. Additionally, we introduced the Improved Deep Embedded Clustering algorithm to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, DILCR identified cancer subtypes with clear biological significance and interpretability.


Subjects
Carcinoma, Renal Cell , Kidney Neoplasms , Neoplasms , Humans , Multiomics , Neoplasms/genetics , Carcinoma, Renal Cell/genetics , Algorithms , Cluster Analysis , Kidney Neoplasms/genetics
8.
Biophys Rev ; 16(1): 13-28, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38495443

ABSTRACT

With the rapid advance of single-cell sequencing technology, cell heterogeneity in various biological processes has been dissected at different omics levels. However, single-cell mono-omics fragments information and cannot capture complete cell states. In the past several years, a variety of single-cell multimodal omics technologies have been developed to jointly profile multiple molecular modalities, including genome, transcriptome, epigenome, and proteome, from the same single cell. With the availability of single-cell multimodal omics data, we can simultaneously investigate the effects of genomic mutation or epigenetic modification on transcription and translation, and reveal the potential mechanisms underlying disease pathogenesis. Driven by the massive single-cell omics data, integration methods for single-cell multi-omics data have developed rapidly. Integration of the massive multi-omics single-cell data in public databases in the future will make it possible to construct a cell atlas of multi-omics, enabling us to comprehensively understand cell state and gene regulation at single-cell resolution. In this review, we summarize the experimental methods for single-cell multimodal omics data and computational methods for multi-omics data integration. We also discuss the future development of this field.

9.
BMC Bioinformatics ; 25(1): 132, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38539064

ABSTRACT

BACKGROUND: Classifying breast cancer subtypes is crucial for clinical diagnosis and treatment. However, the early symptoms of breast cancer may not be apparent. Rapid advances in high-throughput sequencing technology have led to the generation of large amounts of multi-omics biological data. Leveraging and integrating the available multi-omics data can effectively enhance the accuracy of identifying breast cancer subtypes. However, few efforts focus on identifying the associations between different omics data to predict breast cancer subtypes. RESULTS: In this paper, we propose a differential sparse canonical correlation analysis network (DSCCN) for classifying breast cancer subtypes. DSCCN performs differential analysis on multi-omics expression data to identify differentially expressed (DE) genes and adopts sparse canonical correlation analysis (SCCA) to mine highly correlated features between multi-omics DE-genes. Meanwhile, DSCCN uses a multi-task deep learning neural network to train on the correlated DE-genes separately and predict breast cancer subtypes, which spontaneously tackles the data heterogeneity problem in integrating multi-omics data. CONCLUSIONS: The experimental results show that by mining the associations among multi-omics data, DSCCN is more capable of accurately classifying breast cancer subtypes than the existing methods.


Subjects
Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Multiomics , Canonical Correlation Analysis
10.
Int J Cancer ; 155(2): 282-297, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38489486

ABSTRACT

Aberrant DNA methylation is a hallmark of many cancer types. Despite our knowledge of epigenetic and transcriptomic alterations in lung adenocarcinoma (LUAD), we lack robust multi-modal molecular classifications for patient stratification. This is partly because the impact of epigenetic alterations on lung cancer development and progression is still not fully understood. To that end, we identified disease-associated processes under epigenetic regulation in LUAD. We performed a genome-wide expression-methylation Quantitative Trait Loci (emQTL) analysis by integrating DNA methylation and gene expression data from 453 patients in the TCGA cohort. Using a community detection algorithm, we identified distinct communities of CpG-gene associations with diverse biological processes. Interestingly, we identified a community linked to hormone response and lipid metabolism; the identified CpGs in this community were enriched in enhancer regions and binding regions of transcription factors such as FOXA1/2, GRHL2, HNF1B, AR, and ESR1. Furthermore, the CpGs were connected to their associated genes through chromatin interaction loops. These findings suggest that the expression of genes involved in hormone response and lipid metabolism in LUAD is epigenetically regulated through DNA methylation and enhancer-promoter interactions. By applying consensus clustering on the integrated expression-methylation pattern of the emQTL-genes and CpGs linked to hormone response and lipid metabolism, we further identified subclasses of patients with distinct prognoses. This novel patient stratification was validated in an independent patient cohort of 135 patients and showed increased prognostic significance compared to previously defined molecular subtypes.
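At its core, an expression-methylation association screen of the kind described above relates each CpG's methylation level to each gene's expression across patients. The sketch below assumes a plain Pearson correlation with a fixed cutoff; the identifiers, values and threshold are hypothetical, and the study's actual statistical testing, multiple-testing correction and community detection are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cpg_gene_associations(methylation, expression, min_abs_r):
    """Return (CpG, gene, r) for every pair whose |r| reaches min_abs_r."""
    hits = []
    for cpg, m in methylation.items():
        for gene, e in expression.items():
            r = pearson(m, e)
            if abs(r) >= min_abs_r:
                hits.append((cpg, gene, r))
    return hits


# Hypothetical values for four patients.
methylation = {"cg_1": [0.1, 0.2, 0.3, 0.4], "cg_2": [0.5, 0.1, 0.4, 0.2]}
expression = {"GENE_A": [8.0, 6.0, 4.0, 2.0]}

hits = cpg_gene_associations(methylation, expression, min_abs_r=0.9)
print(hits)
```

The retained CpG-gene pairs can be treated as edges of a bipartite graph, which is the kind of structure a community detection algorithm would then partition.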


Subjects
Adenocarcinoma of Lung , CpG Islands , DNA Methylation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Lung Neoplasms , Quantitative Trait Loci , Humans , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , CpG Islands/genetics , Female , Male , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Gene Expression Profiling/methods , Multiomics
11.
Heliyon ; 10(1): e23195, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38163104

ABSTRACT

Aims: Multi-omics data integration has emerged as a prominent avenue within the healthcare industry, presenting substantial potential for enhancing predictive models. The main motivation behind this study stems from the imperative need to advance prognostic methodologies in cancer diagnosis, an area where precision is pivotal for effective clinical decision-making. In this context, the present study introduces an innovative methodology that integrates copy number alteration (CNA), DNA methylation, and gene expression data. Methods: The three omics data were successfully merged into a two-dimensional (2D) map using the PaCMAP dimensionality reduction technique. Using the RGB coloring scheme, a visual representation of the integration was produced from the values of the three omics of each sample. Then, the colored 2D maps were fed into a convolutional neural network (CNN) to forecast the Gleason score. Results: Our proposed model outperforms the cutting-edge i-SOM-GSN model by integrating multi-omics data and the CNN architecture with an accuracy of 98.89, and AUC of 0.9996. Conclusion: This study demonstrates the effectiveness of multi-omics data integration in predicting health outcomes. The proposed methodology, combining PaCMAP for dimensionality reduction, RGB coloring for visualization, and CNN for prediction, offers a comprehensive framework for integrating heterogeneous omics data and improving predictive accuracy. These findings contribute to the advancement of personalized medicine and have the potential to aid in clinical decision-making for prostate cancer patients.
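The RGB step in the methods above can be sketched as follows: every gene has a fixed 2D position, and for one sample its three omics values are min-max scaled and written into the red, green and blue channels of that pixel. This is an illustrative guess at the general scheme, not the authors' implementation: the positions are simply given rather than computed with PaCMAP, and the channel assignment, grid size and all values are hypothetical.

```python
def rgb_map(coords, cna, methylation, expression, size):
    """Paint each gene's three omics values into the R, G and B channels.

    coords maps a gene to an (x, y) position already scaled to [0, 1); in the
    study such positions come from PaCMAP, here they are simply given.
    Genes that fall into the same cell overwrite each other in this sketch.
    """
    image = [[[0, 0, 0] for _ in range(size)] for _ in range(size)]
    layers = (cna, methylation, expression)
    ranges = [(min(layer.values()), max(layer.values())) for layer in layers]

    def to_byte(value, lo, hi):
        # Min-max scale one value to the 0-255 range of a colour channel.
        return round(255 * (value - lo) / (hi - lo)) if hi > lo else 0

    for gene, (x, y) in coords.items():
        row, col = int(y * size), int(x * size)
        image[row][col] = [
            to_byte(layer[gene], lo, hi) for layer, (lo, hi) in zip(layers, ranges)
        ]
    return image


# Hypothetical values for two genes in one sample.
coords = {"G1": (0.1, 0.1), "G2": (0.9, 0.6)}
cna = {"G1": -1.0, "G2": 1.0}
methylation = {"G1": 0.2, "G2": 0.8}
expression = {"G1": 10.0, "G2": 5.0}

image = rgb_map(coords, cna, methylation, expression, size=4)
print(image[0][0], image[2][3])
```

One such image per sample is the kind of input a convolutional network can then classify.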

12.
EMBO Rep ; 25(1): 254-285, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38177910

ABSTRACT

Midbrain dopaminergic neurons (mDANs) control voluntary movement, cognition, and reward behavior under physiological conditions and are implicated in human diseases such as Parkinson's disease (PD). Many transcription factors (TFs) controlling human mDAN differentiation during development have been described, but much of the regulatory landscape remains undefined. Using a tyrosine hydroxylase (TH) human iPSC reporter line, we here generate time series transcriptomic and epigenomic profiles of purified mDANs during differentiation. Integrative analysis predicts novel regulators of mDAN differentiation and super-enhancers are used to identify key TFs. We find LBX1, NHLH1 and NR2F1/2 to promote mDAN differentiation and show that overexpression of either LBX1 or NHLH1 can also improve mDAN specification. A more detailed investigation of TF targets reveals that NHLH1 promotes the induction of neuronal miR-124, LBX1 regulates cholesterol biosynthesis, and NR2F1/2 controls neuronal activity.


Subjects
Dopaminergic Neurons , Induced Pluripotent Stem Cells , Humans , Dopaminergic Neurons/metabolism , Multiomics , Mesencephalon , Transcription Factors/genetics , Transcription Factors/metabolism , Induced Pluripotent Stem Cells/metabolism , Cell Differentiation/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics
13.
J Cancer Res Clin Oncol ; 149(17): 15923-15938, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37673824

ABSTRACT

PURPOSE: Skin cutaneous melanoma (SKCM) is a highly aggressive melanocytic carcinoma whose high heterogeneity and complex etiology make its prognosis difficult to predict. This study aimed to construct a risk subtype typing model for SKCM. METHODS: The study proposes a deep learning framework combining early fusion feature autoencoder (AE) and late fusion feature AE for risk subtype prediction of SKCM. The deep learning framework integrates mRNA, miRNA, and DNA methylation data of SKCM patients from The Cancer Genome Atlas (TCGA), and clusters the screened multi-omics features associated with survival prognosis to identify risk subtypes. Differential expression analysis and functional enrichment analysis were performed between risk subtypes, while SVM classifiers were constructed between differentially expressed genes (DEGs) obtained by Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression screening and risk subtype labels inferred from multi-omics data, and the predictive robustness of risk subtypes inferred from the risk subtype classification prediction model was validated using two independent datasets. RESULTS: The deep learning framework that combined early fusion feature AE with late fusion feature AE distinguished the two best risk subtypes compared to the multi-omics integration approach with single strategy AE or PCA. A promising C-index (C-index = 0.748) and a significant difference in survival (log-rank P value = 4.61 × 10⁻⁹) were found between the identified risk subtypes. The DEGs with the top significance values together with differentially expressed miRNAs provided the biological interpretation of risk subtypes on SKCM. Finally, the framework was applied to predict risk subtypes in two independent test datasets of SKCM patients, all of which showed good predictive power (C-index > 0.680) and significant survival differences (log-rank P value < 0.01).
CONCLUSION: The SKCM risk subtypes identified by integrating multi-omics data based on deep learning can not only improve the understanding of the molecular mechanisms of SKCM, but also provide clinicians with assistance in treatment decisions.
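The C-index reported above measures how often a model ranks patients correctly by risk: among comparable pairs, the patient who dies earlier should have the higher predicted risk. Below is a minimal sketch of Harrell's concordance index under right censoring; the survival times, event flags and risk scores are hypothetical and unrelated to the study's data.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i has the shorter follow-up time
    and experienced the event. It is concordant when i also has the higher
    predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (months) and event flags (1 = death, 0 = censored).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
mixed = concordance_index(times, events, [0.9, 0.1, 0.4, 0.7])
print(perfect, mixed)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.748 and above 0.680.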


Subjects
Deep Learning , Melanoma , MicroRNAs , Skin Neoplasms , Humans , Melanoma/genetics , Skin Neoplasms/genetics , Multiomics , MicroRNAs/genetics , Risk Assessment , Melanoma, Cutaneous Malignant
14.
BMC Med Inform Decis Mak ; 23(1): 82, 2023 May 05.
Article in English | MEDLINE | ID: mdl-37147619

ABSTRACT

BACKGROUND: Accurately classifying complex diseases is crucial for diagnosis and personalized treatment. Integrating multi-omics data has been demonstrated to enhance the accuracy of analyzing and classifying complex diseases. This can be attributed to the highly correlated nature of the data with various diseases, as well as the comprehensive and complementary information it provides. However, integrating multi-omics data for complex diseases is challenged by data characteristics such as high imbalance, scale variation, heterogeneity, and noise interference. These challenges further emphasize the importance of developing effective methods for multi-omics data integration. RESULTS: We proposed a novel multi-omics data learning model called MODILM, which integrates multiple omics data to improve the classification accuracy of complex diseases by obtaining more significant and complementary information from different single-omics data. Our approach includes four key steps: 1) constructing a similarity network for each omics data using the cosine similarity measure, 2) leveraging Graph Attention Networks to learn sample-specific and intra-association features from similarity networks for single-omics data, 3) using Multilayer Perceptron networks to map learned features to a new feature space, thereby strengthening and extracting high-level omics-specific features, and 4) fusing these high-level features using a View Correlation Discovery Network to learn cross-omics features in the label space, which results in unique class-level distinctiveness for complex diseases. To demonstrate the effectiveness of MODILM, we conducted experiments on six benchmark datasets consisting of miRNA expression, mRNA, and DNA methylation data. Our results show that MODILM outperforms state-of-the-art methods, effectively improving the accuracy of complex disease classification. 
CONCLUSIONS: Our MODILM provides a more competitive way to extract and integrate important and complementary information from multiple omics data, providing a very promising tool for supporting decision-making for clinical diagnosis.
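Step 1 of the approach above builds, for each omics layer, a network in which samples are linked when their profiles are similar under the cosine measure. The sketch below illustrates that step for a single layer; the fixed similarity threshold, the sample names and the feature values are assumptions for illustration, since the abstract does not state how edges are selected.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_network(samples, threshold):
    """Connect every pair of samples whose cosine similarity reaches threshold."""
    names = sorted(samples)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            s = cosine_similarity(samples[u], samples[v])
            if s >= threshold:
                edges.append((u, v, s))
    return edges


# Hypothetical profiles of three samples in one omics layer.
samples = {"s1": [1.0, 0.0, 1.0], "s2": [2.0, 0.0, 2.0], "s3": [0.0, 3.0, 0.0]}

edges = similarity_network(samples, threshold=0.8)
print(edges)
```

Cosine similarity ignores overall magnitude, so `s1` and `s2` are linked despite differing in scale, which is relevant to the scale-variation problem the abstract mentions.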


Subjects
MicroRNAs , Multiomics , Humans , Algorithms , MicroRNAs/genetics , Neural Networks, Computer , DNA Methylation
15.
Methods Mol Biol ; 2660: 137-148, 2023.
Article in English | MEDLINE | ID: mdl-37191795

ABSTRACT

Mass spectrometry (MS) is an important tool for biological studies because it is capable of interrogating a diversity of biomolecules (proteins, drugs, metabolites) not captured via alternate genomic platforms. Unfortunately, downstream data analysis becomes complicated when attempting to evaluate and integrate measurements of different molecular classes and requires the aggregation of expertise from different relevant disciplines. This complexity represents a significant bottleneck that limits the routine deployment of MS-based multi-omic methods, despite the unmatched biological and functional insight the data can provide. To address this unmet need, our group introduced Omics Notebook as an open-source framework for facilitating exploratory analysis, reporting and integrating MS-based multi-omic data in a way that is automated, reproducible and customizable. By deploying this pipeline, we have devised a framework for researchers to more rapidly identify functional patterns across complex data types and focus on statistically significant and biologically interesting aspects of their multi-omic profiling experiments. This chapter aims to describe a protocol which leverages our publicly accessible tools to analyze and integrate data from high-throughput proteomics and metabolomics experiments and produce reports that will facilitate more impactful research, cross-institutional collaborations, and wider data dissemination.


Subjects
Proteomics , Software , Proteomics/methods , Metabolomics/methods , Genomics , Metabolic Networks and Pathways
17.
Comput Biol Med ; 153: 106545, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36646024

ABSTRACT

Screening cancer genomes has provided an in-depth characterization of genetic variants such as copy number variations (CNVs) and gene expression changes of non-coding transcripts. Single-dimensional experiments are often designed to differentiate a patient cohort into various sets with the aim of identifying molecular changes among groups; however, this may be inadequate to decipher the causal relationship between molecular signatures in individual patients. To overcome this challenge with respect to personalized medicine, we implemented a patient-specific multi-dimensional integrative approach to uncover coherent signals from multiple independent platforms. In particular, we focused on the consistent gene dosage effects of CNVs for both mRNA and long non-coding RNA (lncRNA) expression in nine colorectal cancer patients. We identified 511 CNV-lncRNA-mRNA regulatory triplets associated with CNVs and aberrant expression of both mRNAs and lncRNAs. By filtering out inconsistent changes among CNVs, mRNAs, and lncRNAs, we further characterized 165 coherent motifs associated with 56 genes. In total, 108 motifs were linked with 31 copy number gains, 44 upregulated lncRNAs, and 45 upregulated mRNAs. Another 57 coherent downregulated motifs were also collected. We discuss how for many of these CNV-lncRNA-mRNA regulatory triplets, their clinical impact remains to be explored, including survival time, microsatellite instability, tumor stage, and primary tumor sites. By validating two example CNV-lncRNA-mRNA triplets with up- and down-regulation, we confirmed that individual variations in multiple dimensions are a robust tool to identify reliable molecular signals for personalized medicine. In summary, we utilized a patient-specific computational pipeline to explore the consistent CNV-driven motifs consisting of lncRNAs and mRNAs. 
We also identified LSM14B as a potential promoter in colorectal cancer progression, suggesting that it may serve as a target for colorectal cancer treatment.
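The filtering step described above keeps only triplets in which the copy number change, the lncRNA change and the mRNA change all point the same way (gain with upregulation, or loss with downregulation). A minimal sketch of such a consistency filter follows, assuming each change has already been called and encoded as +1 or -1; the triplet IDs and the encoding are hypothetical.

```python
def coherent_triplets(triplets):
    """Keep triplets whose CNV, lncRNA and mRNA changes share one direction.

    Each change is encoded as +1 (gain / upregulated) or -1 (loss /
    downregulated); 0 means no change and never counts as coherent.
    """
    return [t for t in triplets if t["cnv"] == t["lncrna"] == t["mrna"] != 0]


# Hypothetical calls for one patient.
triplets = [
    {"id": "T1", "cnv": 1, "lncrna": 1, "mrna": 1},     # coherent gain
    {"id": "T2", "cnv": 1, "lncrna": -1, "mrna": 1},    # inconsistent
    {"id": "T3", "cnv": -1, "lncrna": -1, "mrna": -1},  # coherent loss
]

kept = [t["id"] for t in coherent_triplets(triplets)]
print(kept)
```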


Subjects
Colorectal Neoplasms , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , DNA Copy Number Variations/genetics , Transcriptome , RNA, Messenger/genetics , Gene Expression Profiling/methods , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Gene Regulatory Networks
18.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36433785

ABSTRACT

Differentiating cancer subtypes is crucial to guide personalized treatment and improve the prognosis for patients. Integrating multi-omics data can offer a comprehensive landscape of cancer biological processes and provide promising ways for cancer diagnosis and treatment. Taking the heterogeneity of different omics data types into account, we propose a hierarchical multi-kernel learning (hMKL) approach, a novel cancer molecular subtyping method to identify cancer subtypes by adopting a two-stage kernel learning strategy. In stage 1, we obtain a composite kernel for each individual omics data type by optimizing its kernel parameters, borrowing the idea of cancer integration via multi-kernel learning (CIMLR). In stage 2, we obtain a final fused kernel through a weighted linear combination of the individual kernels learned in stage 1, using an unsupervised multiple kernel learning method. Based on the final fused kernel, k-means clustering is applied to identify cancer subtypes. Simulation studies show that hMKL outperforms the one-stage CIMLR method when there is data heterogeneity. hMKL can estimate the number of clusters correctly, which is the key challenge in subtyping. Application to two real data sets shows that hMKL identified meaningful subtypes and key cancer-associated biomarkers. The proposed method provides a novel toolkit for heterogeneous multi-omics data integration and cancer subtype identification.
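
As a rough illustration of the stage 2 idea (a weighted linear combination of per-omics kernels followed by clustering), a minimal sketch is given below. It is not the hMKL implementation: the weights are assumed to be supplied rather than learned, kernel k-means is used here as one plausible way to cluster from a kernel matrix, and the function names are hypothetical.

```python
def fuse_kernels(kernels, weights):
    """Weighted linear combination K = sum_m w_m * K_m, weights rescaled to sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for kernel, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total) * kernel[i][j]
    return fused


def kernel_kmeans(K, initial_labels, n_iter=20):
    """Kernel k-means on a precomputed kernel matrix, starting from given labels."""
    labels = list(initial_labels)
    n = len(K)
    for _ in range(n_iter):
        # Group sample indices by their current cluster label.
        clusters = {}
        for idx, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(idx)
        # Mean pairwise similarity inside each cluster (constant per cluster).
        within = {
            c: sum(K[j][l] for j in members for l in members) / len(members) ** 2
            for c, members in clusters.items()
        }

        def distance(i, c):
            # Squared feature-space distance from sample i to the centroid of cluster c.
            members = clusters[c]
            cross = sum(K[i][j] for j in members) / len(members)
            return K[i][i] - 2.0 * cross + within[c]

        new_labels = [min(clusters, key=lambda c: distance(i, c)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

In practice the per-omics kernels and their weights would come from the learning stages described in the abstract; here they are simply passed in, so the sketch shows only how a fused kernel could feed a clustering step.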


Subjects
Deep Learning , Neoplasms , Humans , Multiomics , Neoplasms/genetics , Cluster Analysis , Computer Simulation , Biomarkers, Tumor/genetics
19.
BMC Genomics ; 23(1): 819, 2022 Dec 10.
Article in English | MEDLINE | ID: mdl-36496393

ABSTRACT

BACKGROUND: As omics measurements profiled on different molecular layers are interconnected, integrative approaches that incorporate the regulatory effect from multi-level omics data are needed. When the multi-level omics data are from the same individuals, gene expression (GE) clusters can be identified using information from regulators like genetic variants and DNA methylation. When the multi-level omics data are from different individuals, the choice of integration approaches is limited. METHODS: We developed an approach to improve GE clustering from microarray data by integrating regulatory data from different but partially overlapping sets of individuals. We achieve this through (1) decomposing gene expression into the regulated component and the other component that is not regulated by measured factors, and (2) optimizing the clustering goodness-of-fit objective function. We do not require the availability of different omics measurements on all individuals. A certain amount of individual overlap between GE data and the regulatory data is adequate for modeling the regulation, thus improving GE clustering. RESULTS: A simulation study shows that the performance of the proposed approach depends on the strength of the GE-regulator relationship, degree of missingness, data dimensionality, sample size, and the number of clusters. Across the various simulation settings, the proposed method shows competitive performance in terms of accuracy compared to the alternative K-means clustering method, especially when the clustering structure is due mostly to the regulated component, rather than the unregulated component. We further validate the approach with an application to 8,902 Framingham Heart Study participants with data on up to 17,873 genes and regulation information of DNA methylation and genotype from different but partially overlapping sets of participants. We identify clustering structures of genes associated with pulmonary function while incorporating the predicted regulation effect from the measured regulators. We further investigate the over-representation of these GE clusters in pathways of other diseases that may be related to lung function and respiratory health. CONCLUSION: We propose a novel approach for clustering GE with the assistance of regulatory data that allows different but partially overlapping sets of individuals to be included in different omics data.
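
The decomposition step in the METHODS can be pictured with a small sketch. This is an assumption-laden simplification, not the published method: a single regulator and ordinary least squares stand in for whatever regulation model the authors use, and the function names and dictionary layout (individual ID to value) are hypothetical. The fit uses only individuals present in both data sets, mirroring the partial-overlap setting.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regulator has no variation among shared individuals")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def decompose_expression(expression, regulator):
    """Split one gene's expression into regulated and unregulated parts.

    Both arguments map an individual ID to a value; only the overlapping
    individuals are used to model the regulation.
    """
    shared = sorted(set(expression) & set(regulator))
    if len(shared) < 2:
        raise ValueError("need at least two overlapping individuals")
    intercept, slope = fit_line(
        [regulator[i] for i in shared],
        [expression[i] for i in shared],
    )
    # Regulated component: the part of expression predicted by the regulator.
    regulated = {i: intercept + slope * regulator[i] for i in shared}
    # Unregulated component: what the measured regulator does not explain.
    unregulated = {i: expression[i] - regulated[i] for i in shared}
    return regulated, unregulated
```

The two returned components could then be passed to a clustering objective separately, which is the role the abstract assigns to step (2).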


Subjects
DNA Methylation , Genomics , Humans , Genomics/methods , Cluster Analysis , Sample Size , Gene Expression
20.
Cancer Inform ; 21: 11769351221124205, 2022.
Article in English | MEDLINE | ID: mdl-36187912

ABSTRACT

Introduction: Multi-omics data integration yields a richer understanding than the analysis of separate omics data. Various promising integrative approaches have been utilized to analyze multi-omics data for biomedical applications, including disease prediction, disease subtyping, biomarker prediction, and others. Methods: In this paper, we introduce a multi-omics data integration method that combines a gene similarity network (GSN) based on uniform manifold approximation and projection (UMAP) with convolutional neural networks (CNNs). The method utilizes UMAP to embed gene expression, DNA methylation, and copy number alteration (CNA) into a lower dimension, creating two-dimensional RGB images. Gene expression is used as a reference to construct the GSN, and the other omics data are then integrated with the gene expression for better prediction. We used CNNs to predict the Gleason score levels of prostate cancer patients and the tumor stage in breast cancer patients. Results: The proposed model achieved near-perfect performance, with accuracy above 99% and all other performance measures at the same level. The proposed model outperformed the state-of-the-art iSOM-GSN model, which constructs the GSN map based on the self-organizing map (SOM). Conclusion: The results show that UMAP as an embedding technique can better integrate multi-omics maps into the prediction model than SOM. The proposed model can also be applied to build a multi-omics prediction model for other types of cancer.
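
The image-construction step in the Methods might look something like the sketch below. It assumes that each gene already has a two-dimensional position (for example from a UMAP embedding of gene expression) rescaled to the unit square; the embedding itself and the CNN are not shown. The function name, the grid size, and the channel assignment (red for expression, green for methylation, blue for CNA) are illustrative assumptions rather than details taken from the paper.

```python
def omics_to_rgb(coords, expression, methylation, cna, size=8):
    """Rasterise three omics layers onto a size x size grid of [R, G, B] pixels.

    coords maps gene -> (x, y) with both values assumed to lie in [0, 1];
    the other arguments map gene -> measured value for one patient.
    """
    image = [[[0.0, 0.0, 0.0] for _ in range(size)] for _ in range(size)]
    for gene, (x, y) in coords.items():
        # Map the unit-square position to a pixel, clamping the upper edge.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        pixel = image[row][col]
        # Genes that share a pixel are accumulated channel by channel.
        pixel[0] += expression.get(gene, 0.0)
        pixel[1] += methylation.get(gene, 0.0)
        pixel[2] += cna.get(gene, 0.0)
    return image
```

Because all three omics layers reuse the gene positions derived from gene expression, each pixel carries the three measurements for the same gene neighbourhood, which is what lets a CNN treat the result as an ordinary colour image.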
