Search | BVS Regional Portal

Feature extraction approaches for biological sequences: a comparative study of mathematical features.

Bonidia, Robson P; Sampaio, Lucas D H; Domingues, Douglas S; Paschoal, Alexandre R; Lopes, Fabrício M; de Carvalho, André C P L F; Sanches, Danilo S.

Brief Bioinform ; 22(5), 2021 Sep 02.

Article in English | MEDLINE | ID: mdl-33585910

ABSTRACT

As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA sequences. Moreover, we separated this work into three studies. First, we assessed our proposal with the most addressed problem in our review, namely lncRNA versus mRNA; second, we validated the mathematical features in different classification problems, to predict the class of lncRNA, e.g. circular RNA sequences; third, we analyzed its robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. Availability: https://github.com/Bonidia/FeatureExtraction_BiologicalSequences.

Subject(s)

Computational Biology/methods , Deep Learning , Theoretical Models , Circular RNA/genetics , Long Noncoding RNA/genetics , Messenger RNA/genetics , Base Sequence/genetics , Entropy , Fourier Analysis , Humans , Open Reading Frames , Circular RNA/classification , Long Noncoding RNA/classification , Messenger RNA/classification
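The kind of mathematical features named in the abstract above (numerical mapping, Fourier, entropy) can be illustrated with a minimal sketch. This is not the authors' pipeline, which is available at the repository URL given in the abstract. The integer mapping table, the function names and the choice of summary statistics are all assumptions made for illustration.

```python
import cmath
import math
from collections import Counter

# Assumed integer mapping of nucleotides; one of many possible numerical mappings.
INTEGER_MAP = {"T": 0, "U": 0, "C": 1, "A": 2, "G": 3}


def numerical_mapping(seq):
    """Turn a nucleotide string into a list of integers."""
    return [INTEGER_MAP[base] for base in seq.upper()]


def power_spectrum(signal):
    """Power spectrum from a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def kmer_shannon_entropy(seq, k):
    """Shannon entropy (in bits) of the k-mer frequency distribution."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(seq, k=2):
    """Hypothetical feature vector: two spectrum statistics and one entropy value."""
    spectrum = power_spectrum(numerical_mapping(seq))[1:]  # skip the DC component
    return {
        "spectrum_mean": sum(spectrum) / len(spectrum),
        "spectrum_max": max(spectrum),
        "kmer_entropy": kmer_shannon_entropy(seq.upper(), k),
    }
```

Calling `extract_features("ACGUACGU")` returns a small dictionary that could be fed to any classifier. A real pipeline would likely use an FFT routine rather than the quadratic-time transform shown here.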

Contributing to agriculture by using soybean seed data from the tetrazolium test.

Pereira, Douglas F; Bugatti, Pedro H; Lopes, Fabricio M; Souza, André L S M; Saito, Priscila T M.

Data Brief ; 23: 103652, 2019 Apr.

Article in English | MEDLINE | ID: mdl-30788393

ABSTRACT

Agribusiness has great relevance in the world's economy. It has a considerable impact on the gross national product of several nations. Hence, it is the major driver of many national economies. Nowadays, from planting to harvesting, it is crucial to apply some kind of technology to optimize a single process, or even the entire cropping chain. For instance, digital image analysis combined with machine learning methods can be applied to obtain and guarantee a higher quality harvest, leading not only to a greater profit for producers, but also to better products at lower cost for final consumers. To provide this possibility, this work describes a visual feature dataset from soybean seed images obtained from the tetrazolium test. This test is capable of defining how healthy a given seed is (e.g. how much the plant will produce, or whether it is resistant to inclement weather, among others). To answer these questions we propose this dataset, which is the cornerstone for an effective classification of soybean seed vigor (i.e. an extremely tiresome human visual inspection process). Besides, as one of the most prominent international commodities, soybean production must follow a rigid quality control process to be part of world trade. Hence, small mistakes in the seed vigor definition of a given seed lot can lead to huge losses.

Entropic Biological Score: a cell cycle investigation for GRNs inference.

Lopes, Fabrício M; Ray, Shubhra Sankar; Hashimoto, Ronaldo F; Cesar, Roberto M.

Gene ; 541(2): 129-37, 2014 May 15.

Article in English | MEDLINE | ID: mdl-24631265

ABSTRACT

Inference of gene regulatory networks (GRNs) is one of the most challenging research problems of Systems Biology. In this investigation, a new GRNs inference methodology, called Entropic Biological Score (EBS), which linearly combines the mean conditional entropy (MCE) from expression levels and a Biological Score (BS), obtained by integrating different biological data sources, is proposed. The EBS is validated with the Cell Cycle related functional annotation information, available from Munich Information Center for Protein Sequences (MIPS), and compared with some existing methods like MRNET, ARACNE, CLR and MCE for GRNs inference. For real networks, the performance of EBS, which uses the concept of integrating different data sources, is found to be superior to the aforementioned inference methods. The best results for EBS are obtained by considering the weights w1=0.2 and w2=0.8 for MCE and BS values, respectively, where approximately 40% of the inferred connections are found to be correct and significantly better than related methods. The results also indicate that expression profile is able to recover some true connections, that are not present in biological annotations, thus leading to the possibility of discovering new relations between its genes.

Subject(s)

Cell Cycle/genetics , Computational Biology/methods , Gene Regulatory Networks , Entropy , Gene Expression , Theoretical Models , Phenotype , Protein Interaction Mapping
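The linear combination described in the abstract above can be sketched as follows, using the reported weights w1=0.2 and w2=0.8. The abstract does not say how the MCE and BS terms are normalized or oriented, so this sketch simply combines two numbers supplied by the caller. The function names and the plug-in entropy estimator are assumptions, not the published implementation.

```python
import math
from collections import Counter


def mean_conditional_entropy(predictor, target):
    """Estimate H(target | predictor) in bits from paired discrete samples."""
    n = len(predictor)
    joint = Counter(zip(predictor, target))
    marginal = Counter(predictor)
    entropy = 0.0
    for (x, _y), count in joint.items():
        p_xy = count / n                   # joint probability p(x, y)
        p_y_given_x = count / marginal[x]  # conditional probability p(y | x)
        entropy -= p_xy * math.log2(p_y_given_x)
    return entropy


def entropic_biological_score(mce, bs, w1=0.2, w2=0.8):
    """Weighted sum of an entropy term and a biological score.

    Scaling both inputs to a common range and orientation is assumed
    to be done by the caller.
    """
    return w1 * mce + w2 * bs
```

For example, a predictor that fully determines its target gives `mean_conditional_entropy([0, 0, 1, 1], [0, 0, 1, 1]) == 0.0`, while an uninformative predictor over a balanced binary target gives 1 bit.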

Networking the host immune response in Plasmodium vivax malaria.

Mendonça, Vitor R R; Queiroz, Artur T L; Lopes, Fabrício M; Andrade, Bruno B; Barral-Netto, Manoel.

Malar J ; 12: 69, 2013 Feb 21.

Article in English | MEDLINE | ID: mdl-23433077

ABSTRACT

BACKGROUND: Plasmodium vivax malaria clinical outcomes are a consequence of the interaction of multiple parasite, environmental and host factors. The host molecular and genetic determinants driving susceptibility to disease severity in this infection are largely unknown. Here, a network analysis of large-scale data from a significant number of individuals with different clinical presentations of P. vivax malaria was performed in an attempt to identify patterns of association between various candidate biomarkers and the clinical outcomes. METHODS: A retrospective analysis of 530 individuals from the Brazilian Amazon, including P. vivax-infected individuals who developed different clinical outcomes (148 asymptomatic malaria, 187 symptomatic malaria, 13 severe non-lethal malaria, and six severe lethal malaria) as well as 176 non-infected controls, was performed. Plasma levels of liver transaminases, bilirubins, creatinine, fibrinogen, C-reactive protein, superoxide dismutase (SOD)-1, haem oxygenase (HO)-1 and a panel composed of multiple cytokines and chemokines were measured and compared between the different clinical groups using network analysis. RESULTS: Non-infected individuals displayed several statistically significant interactions in the networks, including associations between the levels of IL-10 and IL-4 with the chemokine CXCL9. Individuals with asymptomatic malaria displayed multiple significant interactions involving IL-4. Subjects with mild or severe non-lethal malaria displayed substantial loss of interactions in the networks and TNF had significant associations more frequently with other parameters. Cases of lethal P. vivax malaria infection were associated with significant interactions between TNF, ALT, HO-1 and SOD-1. CONCLUSIONS: The findings imply that clinical immunity to P. vivax malaria is associated with multiple significant interactions in the network, mostly involving IL-4, while lethality is linked to a systematic reduction of complexity of these interactions and to an increase in connections between markers linked to haemolysis-induced damage.

Subject(s)

Vivax Malaria/immunology , Vivax Malaria/pathology , Plasmodium vivax/immunology , Adolescent , Adult , Blood Chemical Analysis , Brazil , Female , Host-Pathogen Interactions , Humans , Male , Middle Aged , Retrospective Studies , Young Adult

Ptaquiloside reduces NK cell activities by enhancing metallothionein expression, which is prevented by selenium.

Latorre, Andreia O; Caniceiro, Beatriz D; Fukumasu, Heidge; Gardner, Dale R; Lopes, Fabricio M; Wysochi, Harry L; da Silva, Tereza C; Haraguchi, Mitsue; Bressan, Fabiana F; Górniak, Silvana L.

Toxicology ; 304: 100-8, 2013 Feb 08.

Article in English | MEDLINE | ID: mdl-23274088

ABSTRACT

Pteridium aquilinum, one of the most important poisonous plants in the world, is known to be carcinogenic to animals and humans. Moreover, our previous studies showed that the immunosuppressive effects of ptaquiloside, its main toxic agent, were prevented by selenium in mouse natural killer (NK) cells. We also verified that this immunosuppression facilitated development of cancer. Here, we performed gene expression microarray analysis in splenic NK cells from mice treated for 14 days with ptaquiloside (5.3 mg/kg) and/or selenium (1.3 mg/kg) to identify gene transcripts altered by ptaquiloside that could be linked to the immunosuppression and that would be prevented by selenium. Transcriptome analysis of ptaquiloside samples revealed that 872 transcripts were expressed differentially (fold change>2 and p<0.05), including 77 up-regulated and 795 down-regulated transcripts. Gene ontology analysis mapped these up-regulated transcripts to three main biological processes (cellular ion homeostasis, negative regulation of apoptosis and regulation of transcription). Considering the immunosuppressive effect of ptaquiloside, we hypothesized that two genes involved in cellular ion homeostasis, metallothionein 1 (Mt1) and metallothionein 2 (Mt2), could be implicated because Mt1 and Mt2 are responsible for zinc homeostasis, and a reduction of free intracellular zinc impairs NK functions. We confirm these hypotheses and show increased expression of metallothionein in splenic NK cells and reduction in free intracellular zinc following treatment with ptaquiloside that were completely prevented by selenium co-treatment. These findings could help avoid the higher susceptibility to cancer that is induced by P. aquilinum-mediated immunosuppressive effects.

Subject(s)

Indans/toxicity , Natural Killer Cells/drug effects , Metallothionein/genetics , Selenium/pharmacology , Sesquiterpenes/toxicity , Animals , Apoptosis/drug effects , Carcinogens/toxicity , Down-Regulation/drug effects , Gene Expression Profiling , Natural Killer Cells/metabolism , Male , Mice , Inbred C57BL Mice , Oligonucleotide Array Sequence Analysis , Pteridium/chemistry , Spleen/cytology , Spleen/drug effects , Spleen/metabolism , Genetic Transcription/drug effects , Transcriptome , Up-Regulation/drug effects , Zinc/metabolism
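The differential-expression criterion quoted in the abstract above (fold change>2 and p<0.05) can be sketched as a simple filter. How the fold change was computed is not stated, so this sketch assumes a treated/control expression ratio, counting ratios above 2 as up-regulated and ratios below 1/2 as down-regulated. The function name and record layout are hypothetical.

```python
def classify_transcripts(records, fc_threshold=2.0, p_threshold=0.05):
    """Split (name, ratio, p_value) records into up- and down-regulated lists.

    `ratio` is assumed to be treated/control expression; records that fail
    the p-value cutoff or fall inside the fold-change band are dropped.
    """
    up, down = [], []
    for name, ratio, p_value in records:
        if p_value >= p_threshold:
            continue  # not statistically significant
        if ratio > fc_threshold:
            up.append(name)
        elif ratio < 1.0 / fc_threshold:
            down.append(name)
    return up, down
```

Applied to a full microarray table, the lengths of the two returned lists would correspond to the up- and down-regulated counts reported in the abstract (77 and 795, which sum to the 872 differentially expressed transcripts).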

Assessing the gain of biological data integration in gene networks inference.

Vicente, Fábio F R; Lopes, Fabrício M; Hashimoto, Ronaldo F; Cesar, Roberto M.

BMC Genomics ; 13 Suppl 6: S7, 2012.

Article in English | MEDLINE | ID: mdl-23134775

ABSTRACT

BACKGROUND: A current challenge in gene annotation is to define gene function in the context of the network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how several components of this network interact with each other and keep their functions stable. However, in general there is insufficient data to accurately recover the GNs from their expression levels, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. Nowadays, the use of several types of biological information in inference methods has increased significantly in order to better recover the connections between genes and reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and the possibility of discovering new relationships between genes from the observed expression data. Although several works in data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. METHODS: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. RESULTS AND CONCLUSIONS: This work presents a first comparison of the gain in the use of prior biological information in the inference of GNs by considering a eukaryote organism (P. falciparum). Our results indicate that information based on direct interaction can produce a higher improvement in the gain than data about a less specific relationship such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for the improvement of the inference. We also compared the gain in the inference of the global network and of only the hubs. The results indicate that the use of biological information can improve the identification of the most connected proteins.

Subject(s)

Gene Regulatory Networks , Algorithms , Genetic Databases , Genome , Plasmodium falciparum/genetics , Protein Interaction Maps , Proteins/metabolism

Gene expression complex networks: synthesis, identification, and analysis.

Lopes, Fabrício M; Cesar, Roberto M; Costa, Luciano Da F.

J Comput Biol ; 18(10): 1353-67, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21548810

ABSTRACT

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree variation, decreasing its network recovery rate as the average degree increased. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive enough to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.

Subject(s)

Computational Biology/methods , Gene Regulatory Networks/genetics , Genetic Models , Software Validation , Systems Biology/methods , Algorithms , Artificial Intelligence , Computer Simulation , Gene Expression , Synthetic Biology , Time Factors
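Steps (1) and (3) of the validation framework in the abstract above can be sketched in a few lines: generate an artificial network from a theoretical model, then compare an inferred edge set against it. Only the Erdős-Rényi model is shown, as an undirected graph with a target average degree. The function names and the use of precision and recall as comparison measures are assumptions; the abstract does not specify which measures the authors used, and the expression simulation and inference steps are omitted.

```python
import random


def erdos_renyi_edges(n, avg_degree, seed=None):
    """Undirected Erdős-Rényi graph as a set of (i, j) pairs with i < j.

    Each possible edge is kept with probability avg_degree / (n - 1),
    so the expected degree of every node equals avg_degree.
    """
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    return {(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if rng.random() < p}


def edge_recovery(original, inferred):
    """Precision and recall of an inferred edge set against the original one."""
    true_positives = len(original & inferred)
    precision = true_positives / len(inferred) if inferred else 0.0
    recall = true_positives / len(original) if original else 0.0
    return precision, recall
```

Sweeping `avg_degree` while holding `n` fixed would reproduce the kind of experiment the abstract describes, where recovery is tracked against the average degree of the original network.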
