Results 1 - 5 of 5
1.
Front Genet ; 12: 642991, 2021.
Article in English | MEDLINE | ID: mdl-33763122

ABSTRACT

The steady elaboration of the Metagenomic and Metadesign of Subways and Urban Biomes (MetaSUB) international consortium project raises important new questions about the origin, variation, and antimicrobial resistance of the collected samples. The CAMDA (Critical Assessment of Massive Data Analysis, http://camda.info/) forum organizes annual challenges in which different bioinformatics and statistical approaches are tested on samples collected around the world for bacterial classification and prediction of geographical origin. This work proposes a method that not only predicts the locations of unknown samples but also estimates the relative risk of antimicrobial resistance through spatial modeling. We introduce a new component into the standard analysis: a Bayesian spatial convolution model that accounts for the spatial structure of the data, as defined by the longitude and latitude of the samples, and assesses the relative risk of antimicrobial-resistance taxa across regions, which is relevant to public health. The estimated relative risk can then serve as a new measure of antimicrobial resistance. We also compare the performance of several machine learning methods (Gradient Boosting Machine, Random Forest, and Neural Network) in predicting the geographical origin of the mystery samples. All three methods show consistent results, with the Random Forest classifier performing somewhat better. In future work, we can consider a broader class of spatial models and incorporate covariates related to the environment and climate profiles of the samples to achieve more reliable estimation of the relative risk related to antimicrobial resistance.
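As a rough illustration of the "relative risk" quantity mentioned in this abstract, the sketch below computes a crude per-region ratio of observed to expected counts. This is not the authors' Bayesian spatial convolution model; it is only the unsmoothed ratio that disease-mapping models of that kind typically start from. The function name `relative_risk` and all counts are hypothetical and chosen for illustration.

```python
from typing import Sequence


def relative_risk(observed: Sequence[float], totals: Sequence[float]) -> list[float]:
    """Crude relative risk per region: observed count / expected count.

    The expected count for a region is its total count multiplied by the
    overall rate (sum of observed / sum of totals) across all regions.
    """
    if len(observed) != len(totals):
        raise ValueError("observed and totals must have the same length")
    total_sum = sum(totals)
    if total_sum <= 0:
        raise ValueError("totals must sum to a positive value")
    overall_rate = sum(observed) / total_sum
    if overall_rate == 0:
        raise ValueError("no observed counts; relative risk is undefined")
    # Regions with no data get NaN rather than a misleading zero.
    return [
        o / (n * overall_rate) if n > 0 else float("nan")
        for o, n in zip(observed, totals)
    ]


# Hypothetical counts: AMR-associated taxa and all taxa in three regions.
amr_counts = [30, 10, 20]
all_counts = [100, 100, 100]
risks = relative_risk(amr_counts, all_counts)
```

A value above 1 indicates a region with more AMR-associated counts than the pooled rate would predict, and a value below 1 indicates fewer. A spatial model would additionally shrink these ratios toward those of neighbouring regions, which this sketch does not attempt.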

2.
Biol Direct ; 14(1): 22, 2019 11 21.
Article in English | MEDLINE | ID: mdl-31752974

ABSTRACT

BACKGROUND: Recently, high-throughput technologies have been used extensively alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, of different types and formats. In the absence of effective integration strategies, novel models are necessary for efficient and operative data integration, in which both clinical and molecular information can be effectively joined for storage, access and ease of use. Such models, combined with machine learning methods for accurate prediction of survival time in cancer studies, can yield novel insights into disease development and lead to precise personalized therapies. RESULTS: We developed an approach for intelligent data integration of two cancer datasets (breast cancer and neuroblastoma) provided in the CAMDA 2018 'Cancer Data Integration Challenge', and compared models for prediction of survival time. We developed a novel semantic network-based data integration framework that utilizes NoSQL databases, in which we combined clinical and expression profile data, using both raw data records and external knowledge sources. Utilizing the integrated data, we introduced the Tumor Integrated Clinical Feature (TICF), a new feature for accurate prediction of patient survival time. Finally, we applied and validated several machine learning models for survival time prediction. CONCLUSION: We developed a framework for semantic integration of clinical and omics data that can borrow information across multiple cancer studies. By linking data with external domain knowledge sources, our approach facilitates enrichment of the studied data by discovery of internal relations. The proposed and validated machine learning models for survival time prediction yielded accurate results. REVIEWERS: This article was reviewed by Eran Elhaik, Wenzhong Xiao and Carlos Loucera.


Subject(s)
Breast Neoplasms/epidemiology, DNA Copy Number Variations, Genome, Human, Neuroblastoma/epidemiology, Breast Neoplasms/genetics, Computational Biology/methods, Female, Gene Expression Regulation, Neoplastic, Humans, Male, Models, Genetic, Neuroblastoma/genetics, Survival Analysis
3.
Biol Direct ; 10: 43, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26282399

ABSTRACT

High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on a large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The participating organizations are working on similar problems but have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.


Subject(s)
Computational Biology/methods, Electronic Data Processing/methods, Workflow, High-Throughput Nucleotide Sequencing, Reproducibility of Results
4.
Archaea ; 2014: 196140, 2014.
Article in English | MEDLINE | ID: mdl-24711725

ABSTRACT

Uranium mining and milling activities adversely affect the microbial populations of impacted sites. The negative effects of uranium on soil bacteria and fungi are well studied, but little is known about the effects of radionuclides and heavy metals on archaea. The composition and diversity of archaeal communities inhabiting the waste pile of the Sliven uranium mine and the soil of the Buhovo uranium mine were investigated using 16S rRNA gene retrieval. A total of 355 archaeal clones were selected, and their 16S rDNA inserts were analysed by restriction fragment length polymorphism (RFLP), which discriminated 14 different RFLP types. All evaluated archaeal 16S rRNA gene sequences belong to the 1.1b/Nitrososphaera cluster of Crenarchaeota. The composition of the archaeal community is distinct for each site of interest and dependent on environmental characteristics, including pollution levels. Since the members of the 1.1b/Nitrososphaera cluster have been implicated in the nitrogen cycle, the archaeal communities from these sites were probed for the presence of the ammonia monooxygenase gene (amoA). Our data indicate that amoA gene sequences are distributed in a manner similar to that of the Crenarchaeota, suggesting that archaeal nitrification processes in uranium mining-impacted locations are under the control of the same key factors that control archaeal diversity.


Subject(s)
Crenarchaeota/classification, Crenarchaeota/genetics, Genetic Variation, Oxidoreductases/genetics, Phylogeny, Soil Microbiology, Bulgaria, Cluster Analysis, DNA, Archaeal/chemistry, DNA, Archaeal/genetics, DNA, Ribosomal/chemistry, DNA, Ribosomal/genetics, Molecular Sequence Data, Polymorphism, Restriction Fragment Length, RNA, Ribosomal, 16S/genetics, Sequence Analysis, DNA
5.
J Integr Bioinform ; 10(2): 221, 2013 Apr 03.
Article in English | MEDLINE | ID: mdl-23549604

ABSTRACT

This paper presents a study in the domain of semi-automated and fully-automated ontology mapping. A process for inferring additional cross-ontology links within the domain of anatomical ontologies is presented and evaluated on pairs from three model organisms. The results of experiments performed with various external knowledge sources and scoring schemes are discussed.


Subject(s)
Anatomy/methods, Algorithms, Animals, Automation, Mice, Xenopus/anatomy & histology, Zebrafish/anatomy & histology