1.
Article in English | MEDLINE | ID: mdl-38113155

ABSTRACT

The rapid increase in high-throughput, complex, and heterogeneous data has led to the adoption of network-structured models and analyses for interpretation. However, these data are inherently complex and challenging to understand, prompting researchers to turn to graph embedding methods to facilitate analysis. While general network embedding techniques have shown promise in improving downstream prediction and classification tasks, real-world data are complicated due to cross-domain interactions between different types of entities. Multilayered networks have been successful in integrating biological data to represent biological systems' hierarchy, but embedding nodes based on different types of interactions remains an unsolved problem. To address this challenge, we propose the Motif-aware deep representation learning in multilayer (MARML) networks for learning network representations. Our method considers recurring motif patterns, topological information, and attributive information from other sources as node features. We validated the MARML method using various multilayer network datasets. In addition, by incorporating motif information, MARML considers higher order connections across different hierarchies. The learned features exhibited excellent accuracy in tasks related to link prediction and link differentiation, enabling us to distinguish between existing and disconnected triplets. Through the integration of both intrinsic node attributes and topological network structures, we enhance our understanding of complex biological systems.

2.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3117-3127, 2023.
Article in English | MEDLINE | ID: mdl-37379184

ABSTRACT

Breast cancer is a heterogeneous disease consisting of a diverse set of genomic mutations and clinical characteristics. The molecular subtypes of breast cancer are closely tied to prognosis and therapeutic treatment options. We investigate using deep graph learning on a collection of patient factors from multiple diagnostic disciplines to better represent breast cancer patient information and predict molecular subtype. Our method models breast cancer patient data into a multi-relational directed graph with extracted feature embeddings to directly represent patient information and diagnostic test results. We develop a radiographic image feature extraction pipeline to produce vector representation of breast cancer tumors in DCE-MRI and an autoencoder-based genomic variant embedding method to map variant assay results to a low-dimensional latent space. We leverage related-domain transfer learning to train and evaluate a Relational Graph Convolutional Network to predict the probabilities of molecular subtypes for individual breast cancer patient graphs. Our work found that utilizing information from multiple multimodal diagnostic disciplines improved the model's prediction results and produced more distinct learned feature representations for breast cancer patients. This research demonstrates the capabilities of graph neural networks and deep learning feature representation to perform multimodal data fusion and representation in the breast cancer domain.


Subject(s)
Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Breast , Mutation , Neural Networks, Computer
3.
IEEE/ACM Trans Comput Biol Bioinform ; 18(5): 1996-2007, 2021.
Article in English | MEDLINE | ID: mdl-31944984

ABSTRACT

Next-generation sequencing techniques provide us with an opportunity for generating sequenced proteins and identifying the biological families and functions of these proteins. However, compared with identified proteins, uncharacterized proteins constitute a notable percentage of the overall proteins in the bioinformatics research field. Traditional family classification methods often focus on extracting N-gram features from sequences while ignoring motif information as well as affinity information between motifs and adjacent amino acids. Previous clustering-based algorithms have typically been used to define protein features with domain knowledge and annotate protein families based on extensive data samples. In this paper, we apply CNN-based amino acid representation learning with limited characterized proteins to explore performance on annotated protein families by taking into account the amino acid location information. Additionally, we apply the method to all reviewed protein sequences with their families retrieved from the UniProt database to evaluate our approach. Last but not least, we verify our model using unreviewed protein records, which are typically ignored by other methods.
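
For orientation, the N-gram features that the abstract attributes to traditional classifiers can be sketched as below. This is only an illustration of the baseline idea, not the authors' CNN-based method; the function name `ngram_counts` and the example sequence are hypothetical.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int = 3) -> Counter:
    """Count overlapping n-grams (length-n substrings) in an amino acid string.

    Hypothetical helper for illustration; position information is discarded,
    which is the limitation the abstract points out.
    """
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Made-up 10-residue sequence, used only to exercise the function.
counts = ngram_counts("MKTAYIAKQR", n=3)
```

A sequence of length 10 yields 8 overlapping 3-grams, so the counts sum to 8 regardless of where in the sequence each 3-gram occurred.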


Subject(s)
Deep Learning , Proteins , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Cluster Analysis , Computational Biology/methods , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics
4.
Bioanalysis ; 11(12): 1139-1155, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31179719

ABSTRACT

Aim: The complications that arise when performing meta-analysis of datasets from multiple metabolomics studies are addressed with computational methods that ensure data quality, completeness of metadata and accurate interpretation across studies. Results & methodology: This paper presents an integrated system of quality control (QC) methods to assess metabolomics results by evaluating the data acquisition strategies and metabolite identification process when integrating datasets for meta-analysis. An ontology knowledge base and a rule-based system representing the experiment and chemical background information direct the processes involved in data integration and QC verification. A diabetes meta-analysis study using these QC methods finds putative biomarkers that differ between cohorts. Conclusion: The methods presented here ensure the validity of meta-analysis when integrating data from different metabolic profiling studies.


Subject(s)
Biological Ontologies , Data Analysis , Metabolomics/methods , Diabetes Mellitus/metabolism , Humans , Quality Control
5.
IEEE Trans Knowl Data Eng ; 21(3): 401-414, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-19915690

ABSTRACT

The SPARQL LeftJoin abstract operator is not distributive over Union; this limits the algebraic manipulation of graph patterns, which in turn restricts the ability to create query plans for distributed processing or query optimization. In this paper, we present semQA, an algebraic extension for the SPARQL query language for RDF, which overcomes this issue by transforming graph patterns through the use of an idempotent disjunction operator Or as a substitute for Union. This permits the application of a set of equivalences that transform a query into distinct forms. We further present an algorithm to derive the solution set of the original query from the solution set of a query where Union has been substituted by Or. We also analyze the combined complexity of SPARQL, proving it to be NP-complete. It is also shown that the SPARQL query language is not, in the general case, fixed-parameter tractable. Experimental results are presented to validate the query evaluation methodology presented in this paper against the SPARQL standard to corroborate the complexity analysis and to illustrate the gains in processing cost reduction that can be obtained through the application of semQA.
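
The non-distributivity noted at the start of this abstract can be checked on a tiny example. The sketch below is a simplified model of the SPARQL algebra, assuming set semantics and no filter expression in LeftJoin; the helper names (`mu`, `join`, `diff`, `left_join`, `union`) are illustrative and are not taken from the paper or from any library.

```python
def mu(**bindings):
    """Build a solution mapping as a hashable set of (variable, value) pairs."""
    return frozenset(bindings.items())


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    d1, d2 = dict(m1), dict(m2)
    return all(d2[k] == v for k, v in d1.items() if k in d2)


def join(a, b):
    """Merge every compatible pair of mappings."""
    return {frozenset({**dict(m1), **dict(m2)}.items())
            for m1 in a for m2 in b if compatible(m1, m2)}


def diff(a, b):
    """Keep mappings of a that are compatible with no mapping of b."""
    return {m1 for m1 in a if not any(compatible(m1, m2) for m2 in b)}


def left_join(a, b):
    """Simplified LeftJoin without a filter: Join(a, b) plus Diff(a, b)."""
    return join(a, b) | diff(a, b)


def union(a, b):
    return a | b


A = {mu(x=1)}
B = {mu(x=1, y=2)}
C = set()  # a pattern with no solutions

lhs = left_join(A, union(B, C))
rhs = union(left_join(A, B), left_join(A, C))
```

Here `lhs` contains only the extended mapping with both `x` and `y`, while `rhs` also keeps the bare `x` mapping contributed by the empty branch, so the two sides differ.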

6.
Web Semant ; 7(3): 235-251, 2009 Sep 01.
Article in English | MEDLINE | ID: mdl-20186256

ABSTRACT

ASMOV (Automated Semantic Matching of Ontologies with Verification) is a novel algorithm that uses lexical and structural characteristics of two ontologies to iteratively calculate a similarity measure between them, derives an alignment, and then verifies it to ensure that it does not contain semantic inconsistencies. In this paper, we describe the ASMOV algorithm, and then present experimental results that measure its accuracy using the OAEI 2008 tests, and that evaluate its use with two different thesauri: WordNet, and the Unified Medical Language System (UMLS). These results show the increased accuracy obtained by combining lexical, structural and extensional matchers with semantic verification, and demonstrate the advantage of using a domain-specific thesaurus for the alignment of specialized ontologies.

7.
J Am Med Inform Assoc ; 15(4): 559-68, 2008.
Article in English | MEDLINE | ID: mdl-18436897

ABSTRACT

OBJECTIVES: To develop mechanisms to formulate queries over the semantic representation of cancer-related data services available through the cancer Biomedical Informatics Grid (caBIG). DESIGN: The semCDI query formulation uses a view of caBIG semantic concepts, metadata, and data as an ontology, and defines a methodology to specify queries using the SPARQL query language, extended with Horn rules. semCDI enables the joining of data that represent different concepts through associations modeled as object properties, and the merging of data representing the same concept in different sources through Common Data Elements (CDE) modeled as datatype properties, using Horn rules to specify additional semantics indicating conditions for merging data. VALIDATION: To validate this formulation, a prototype has been constructed, and two queries have been executed against currently available caBIG data services. DISCUSSION: The semCDI query formulation uses the rich semantic metadata available in caBIG to build queries and integrate data from multiple sources. Its promise will be further enhanced as more data services are registered in caBIG, and as more linkages can be achieved between the knowledge contained within caBIG's NCI Thesaurus and the data contained in the Data Services. CONCLUSION: semCDI provides a formulation for the creation of queries on the semantic representation of caBIG. This constitutes the foundation to build a semantic data integration system for more efficient and effective querying and exploratory searching of cancer-related data.


Subject(s)
Information Management/methods , Information Storage and Retrieval/methods , Information Systems/organization & administration , Medical Oncology/organization & administration , Semantics , Biomedical Research/organization & administration , Cancer Care Facilities/organization & administration , Computational Biology , Humans , Internet , Vocabulary, Controlled
8.
J Digit Imaging ; 16(4): 365-77, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14752607

ABSTRACT

In this article we describe a statistical model that was developed to segment brain magnetic resonance images. The statistical segmentation algorithm was applied after a pre-processing stage involving the use of a 3D anisotropic filter along with histogram equalization techniques. The segmentation algorithm makes use of prior knowledge and a probability-based multivariate model designed to semi-automate the process of segmentation. The algorithm was applied to images obtained from the Center for Morphometric Analysis at Massachusetts General Hospital as part of the Internet Brain Segmentation Repository (IBSR). The developed algorithm showed improved accuracy over the k-means, adaptive Maximum Apriori Probability (MAP), biased MAP, and other algorithms. Experimental results showing the segmentation and the results of comparisons with other algorithms are provided. Results are based on an overlap criterion against expertly segmented images from the IBSR. The algorithm produced average results of approximately 80% overlap with the expertly segmented images (compared with 85% for manual segmentation and 55% for other algorithms).
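
The abstract reports results under an overlap criterion but does not say which one. As a minimal sketch, assuming a Jaccard-style measure (intersection over union of the voxels carrying a given label), the comparison against an expert segmentation could look like this; the function name `overlap` and the toy label arrays are hypothetical.

```python
def overlap(segmented, reference, label):
    """Jaccard-style overlap for one tissue label (assumed criterion).

    Both inputs are flat sequences of per-voxel labels of equal length.
    """
    a = {i for i, v in enumerate(segmented) if v == label}
    b = {i for i, v in enumerate(reference) if v == label}
    both = a | b
    # If neither segmentation contains the label, treat them as agreeing.
    return len(a & b) / len(both) if both else 1.0


# Toy six-voxel example, not real image data.
auto = [0, 1, 1, 1, 0, 2]
expert = [0, 1, 1, 0, 0, 2]
score = overlap(auto, expert, label=1)
```

In the toy example the two segmentations share two voxels labeled 1 out of three labeled 1 by either, giving an overlap of 2/3.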


Subject(s)
Imaging, Three-Dimensional/methods , Models, Statistical , Algorithms , Brain/diagnostic imaging , Humans , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging , Multivariate Analysis , Pattern Recognition, Automated , Radiographic Image Enhancement , Radiographic Image Interpretation, Computer-Assisted/methods