Results 1 - 5 of 5
1.
Comput Struct Biotechnol J ; 23: 2267-2276, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38827228

ABSTRACT

Machine Learning (ML) algorithms have been important tools for the extraction of useful knowledge from biological sequences, particularly in healthcare, agriculture, and the environment. However, the categorical and unstructured nature of these sequences usually requires additional feature engineering steps before an ML algorithm can be efficiently applied. The addition of these steps to the ML algorithm creates a processing pipeline, known as end-to-end ML. Despite the excellent results obtained by applying end-to-end ML to biotechnology problems, the performance obtained depends on the expertise of the user in the components of the pipeline. In this work, we propose an end-to-end ML-based framework called BioPrediction-RPI, which can identify implicit interactions between sequences, such as pairs of non-coding RNA and proteins, without the need for specialized expertise in end-to-end ML. This framework applies feature engineering to represent each sequence by structural and topological features. These features are divided into feature groups and used to train partial models, whose partial decisions are combined into a final decision, which provides insights to the user through an interpretability report. In our experiments, the developed framework was competitive when compared with various expert-created models. We assessed BioPrediction-RPI on 12 datasets, where it presented equal or better performance than all tools in 40% to 100% of cases, depending on the experiment. Finally, BioPrediction-RPI can fine-tune models based on new data and perform at the same level as ML experts, democratizing end-to-end ML and making it more accessible to those working in biological sciences.

2.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35753697

ABSTRACT

Recent technological advances have led to an exponential expansion of biological sequence data and extraction of meaningful information through Machine Learning (ML) algorithms. This knowledge has improved the understanding of mechanisms related to several fatal diseases, e.g. cancer and coronavirus disease 2019, helping to develop innovative solutions, such as CRISPR-based gene editing, coronavirus vaccine and precision medicine. These advances benefit our society and economy, directly impacting people's lives in various areas, such as health care, drug discovery, forensic analysis and food processing. Nevertheless, ML-based approaches to biological data require representative, quantitative and informative features. Many ML algorithms can handle only numerical data, and therefore sequences need to be translated into a numerical feature vector. This process, known as feature extraction, is a fundamental step for developing high-quality ML-based models in bioinformatics, as it enables the feature engineering stage, in which suitable features are designed and selected. Feature engineering, ML algorithm selection and hyperparameter tuning are often manual and time-consuming processes, requiring extensive domain knowledge. To deal with this problem, we present a new package: BioAutoML. BioAutoML automatically runs an end-to-end ML pipeline, extracting numerical and informative features from biological sequence databases, using the MathFeature package, and automating the feature selection, ML algorithm(s) recommendation and hyperparameter tuning of the selected algorithm(s), using Automated ML (AutoML). BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) Metalearning (algorithm recommendation and hyperparameter tuning modules).
We experimentally evaluate BioAutoML in two different scenarios: (i) prediction of the three main classes of noncoding RNAs (ncRNAs) and (ii) prediction of the eight categories of ncRNAs in bacteria, including housekeeping and regulatory types. To assess its predictive performance, BioAutoML is experimentally compared with two other AutoML tools (RECIPE and TPOT). According to the experimental results, BioAutoML can accelerate new studies, reducing the cost of feature engineering processing and either keeping or improving predictive performance. BioAutoML is freely available at https://github.com/Bonidia/BioAutoML.


Subject(s)
COVID-19 Vaccines , COVID-19 , Algorithms , Bacteria/genetics , Humans , Machine Learning
3.
J Appl Microbiol ; 132(4): 2795-2811, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34995421

ABSTRACT

AIMS: How benzene is metabolized by microbes under anoxic conditions is not fully understood. Here, we studied the degradation pathways in a benzene-mineralizing, nitrate-reducing enrichment culture. METHODS AND RESULTS: Benzene mineralization was dependent on the presence of nitrate and correlated to the enrichment of a Peptococcaceae phylotype only distantly related to known anaerobic benzene degraders of this family. Its relative abundance decreased after benzene mineralization had terminated, while other abundant taxa (Ignavibacteriaceae, Rhodanobacteraceae and Brocadiaceae) slightly increased. Generally, the microbial community remained diverse despite the amendment of benzene as single organic carbon source, suggesting complex trophic interactions between different functional groups. A subunit of the putative anaerobic benzene carboxylase previously detected in Peptococcaceae was identified by metaproteomic analysis, suggesting that benzene was activated by carboxylation. Detection of proteins involved in anaerobic ammonium oxidation (anammox) indicates that benzene mineralization was accompanied by anammox, facilitated by nitrite accumulation and the presence of ammonium in the growth medium. CONCLUSIONS: The results suggest that benzene was activated by carboxylation and further assimilated by a novel Peptococcaceae phylotype. SIGNIFICANCE AND IMPACT OF THE STUDY: The results confirm the hypothesis that Peptococcaceae are important anaerobic benzene degraders.


Subject(s)
Microbiota , Nitrates , Anaerobiosis , Benzene/metabolism , Nitrates/metabolism , Oxidation-Reduction , Peptococcaceae/metabolism
4.
Mol Ecol Resour ; 21(1): 110-121, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32866335

ABSTRACT

Plasmid transfers among bacterial populations can directly influence the ecological adaptation of these populations and their interactions with host species and environment. In this study, we developed a selective multiply-primed rolling circle amplification (smRCA) approach to enrich and characterize circular plasmid DNA from sponge microbial symbionts via high-throughput sequencing (HTS). DNA (plasmid and total community DNA) obtained from sponge (Cinachyrella sp.) samples and a bacterial symbiont (Vibrio sp. CyArs1) isolated from the same sponge species (carrying unknown plasmids) were used to develop and validate our methodology. The smRCA was performed for 16 hr with 141 plasmid-specific primers covering all known circular plasmid groups. The amplified products were purified and subjected to a reamplification with random hexamer primers (2 hr) and then sequenced using Illumina MiSeq. The developed method resulted in the successful amplification and characterization of the sponge plasmidome and allowed us to detect plasmids associated with the bacterial symbiont Vibrio sp. CyArs1 in the sponge host. In addition, a large number of small (<2 kbp) and cryptic plasmids were also amplified in sponge samples. Functional analysis identified proteins involved in the control of plasmid partitioning, maintenance and replication. However, most plasmids contained unknown genes, which could potentially serve as a resource of unknown genetic information and novel replication systems. Overall, our results indicate that the smRCA-HTS approach developed here was able to selectively enrich and characterize plasmids from bacterial isolates and sponge host microbial communities, including plasmids larger than 20 kbp.


Subject(s)
Bacteria/classification , DNA, Circular , Nucleic Acid Amplification Techniques , Plasmids/genetics , Porifera/microbiology , Animals , Base Sequence , DNA Primers , DNA, Bacterial/genetics , DNA, Circular/genetics
5.
Sci Rep ; 9(1): 1999, 2019 02 13.
Article in English | MEDLINE | ID: mdl-30760820

ABSTRACT

Marine sponges are early-branching, filter-feeding metazoans that usually host complex microbiomes comprised of several, currently uncultivatable symbiotic lineages. Here, we use a low-carbon based strategy to cultivate low-abundance bacteria from Spongia officinalis. This approach favoured the growth of Alphaproteobacteria strains in the genera Anderseniella, Erythrobacter, Labrenzia, Loktanella, Ruegeria, Sphingorhabdus, Tateyamaria and Pseudovibrio, besides two likely new genera in the Rhodobacteraceae family. Mapping of complete genomes against the metagenomes of S. officinalis, seawater, and sediments confirmed the rare status of all the above-mentioned lineages in the marine realm. Remarkably, this community of low-abundance Alphaproteobacteria possesses several genomic attributes common to dominant, presently uncultivatable sponge symbionts, potentially contributing to host fitness through detoxification mechanisms (e.g. heavy metal and metabolic waste removal, degradation of aromatic compounds), provision of essential vitamins (e.g. B6 and B12 biosynthesis), nutritional exchange (especially regarding the processing of organic sulphur and nitrogen) and chemical defence (e.g. polyketide and terpenoid biosynthesis). None of the studied taxa displayed signs of genome reduction, indicative of obligate mutualism. Instead, versatile nutrient metabolisms along with motility, chemotaxis, and tight-adherence capacities - also known to confer environmental hardiness - were inferred, underlying dual host-associated and free-living life strategies adopted by these diverse sponge-associated Alphaproteobacteria.


Subject(s)
Alphaproteobacteria/growth & development , Alphaproteobacteria/genetics , Genome, Bacterial/genetics , Porifera/microbiology , Symbiosis/genetics , Alphaproteobacteria/classification , Alphaproteobacteria/isolation & purification , Animals , Biodegradation, Environmental , Drug Resistance, Bacterial/genetics , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , Seawater/microbiology
...