1.
Braz J Microbiol ; 55(2): 1219-1229, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38705959

ABSTRACT

Cyanobacteria have developed acclimation strategies to adapt to harsh environments, making them model organisms. Understanding the molecular mechanisms of tolerance to abiotic stresses can help elucidate how cells change their gene expression patterns in response to stress. Recent advances in sequencing techniques and bioinformatics analysis methods have led to the discovery of many genes involved in stress response. Synechocystis sp. PCC 6803 is a suitable microorganism for studying the transcriptome response under environmental stress. Therefore, for the first time, we employed two feature selection techniques, namely support vector machine recursive feature elimination (SVM-RFE) and LASSO (least absolute shrinkage and selection operator), to pinpoint the crucial genes responsive to environmental stresses in Synechocystis sp. PCC 6803. We applied these machine learning algorithms to transcriptomic data of Synechocystis sp. PCC 6803 under distinct conditions, encompassing light, salt, and iron stress. Seven candidate genes, namely sll1862, slr0650, sll0760, slr0091, ssl3044, slr1285, and slr1687, were selected by both the LASSO and SVM-RFE algorithms. RNA-seq analysis was performed to validate the efficiency of our feature selection approach in selecting the most important genes. The RNA-seq analysis revealed significantly high expression for five genes, namely sll1862, slr1687, ssl3044, slr1285, and slr0650, under the ion stress condition. Among these five genes, ssl3044 and slr0650 could be introduced as new potential candidate genes for further confirmatory genetic studies to determine their roles in the response to abiotic stresses.
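
To illustrate why LASSO can act as a feature selector, here is a minimal sketch of cyclic coordinate descent with soft-thresholding in plain Python. It is not the authors' pipeline: the abstract does not describe the solver, and the toy data, the penalty `lam=0.5`, and all function names are assumptions made for this example. The point is only that the L1 penalty drives the weight of an uninformative feature to exactly zero.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                # Prediction that leaves feature j out.
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Toy data (hypothetical): the response depends only on the first feature.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
weights = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights, selected)
```

Features whose weight stays at zero are dropped. In a study like the one above, the surviving set would then be intersected with the set kept by a second selector such as SVM-RFE.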


Subjects
Algorithms, Machine Learning, Physiological Stress, Synechocystis, Synechocystis/genetics, Synechocystis/physiology, Physiological Stress/genetics, Bacterial Gene Expression Regulation, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Transcriptome, Computational Biology/methods, Support Vector Machine, Gene Expression Profiling, Light, Bacterial Genes
2.
Clin Transl Oncol ; 26(5): 1170-1186, 2024 May.
Article in English | MEDLINE | ID: mdl-37989822

ABSTRACT

BACKGROUND: Anoikis is a form of programmed cell death, induced by detachment from the extracellular matrix, that eliminates dysfunctional or damaged cells. An anoikis-based risk stratification is expected to give a comprehensive view of melanoma's prognostic and immune landscapes. METHODS: Differentially expressed genes (DEGs) were analyzed between melanoma and normal skin tissues in The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression data sets. Next, the least absolute shrinkage and selection operator, the support vector machine-recursive feature elimination algorithm, and univariate and multivariate Cox analyses were applied to the 308 DEGs to build the prognostic signature in the TCGA-melanoma data set. Finally, the signature was validated in the GSE65904 and GSE22155 data sets. NOTCH3, PIK3R2, and SOD2 were validated in our clinical samples by immunohistochemistry. RESULTS: The prognostic model for melanoma patients was developed using ten hub anoikis-related genes. The overall survival (OS) of patients in the high-risk subgroup, which was classified by the optimal cutoff value, was remarkably shorter in the TCGA-melanoma, GSE65904, and GSE22155 data sets. Low-risk patients exhibited low immune cell infiltration and high expression of immunophenoscores and immune checkpoints. They also demonstrated increased sensitivity to various drugs, including dasatinib and dabrafenib. NOTCH3, PIK3R2, and SOD2 were notably associated with OS by univariate Cox analysis in the GSE65904 data set. The clinical melanoma samples showed remarkably higher protein expression of NOTCH3 (P = 0.003) and PIK3R2 (P = 0.009) than the para-melanoma samples, while SOD2 protein expression remained unchanged. CONCLUSIONS: In this study, we successfully established a prognostic anoikis-related signature using machine learning. This model may aid in evaluating patient prognosis, clinical characteristics, and immune treatment modalities for melanoma.
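
A prognostic signature of this kind is commonly applied as a weighted sum of gene expression values that is then split at a cutoff. The sketch below shows only that arithmetic. The gene names, coefficients, expression values, and cutoff are hypothetical placeholders, since the abstract reports none of them.

```python
def risk_score(expression, coefficients):
    """Weighted sum of expression values, one weight per signature gene."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores, cutoff):
    """Label each patient 'high' or 'low' risk relative to the cutoff."""
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients (e.g. from a multivariate Cox fit) and expression values.
coefficients = {"gene_a": 0.5, "gene_b": -0.25}
patients = {
    "p1": {"gene_a": 2.0, "gene_b": 4.0},
    "p2": {"gene_a": 4.0, "gene_b": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores, cutoff=1.0)  # the cutoff is an assumed value
print(scores, groups)
```

Survival would then be compared between the two groups, which is the comparison the RESULTS section reports.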

3.
Sensors (Basel) ; 23(21), 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37960535

ABSTRACT

Scene classification in autonomous navigation is a highly complex task due to variations, such as light conditions and dynamic objects, in the inspected scenes; it is also a challenge for small-form-factor computers to run modern and highly demanding algorithms. In this contribution, we introduce a novel method for classifying scenes in simultaneous localization and mapping (SLAM) using the boundary object function (BOF) descriptor on RGB-D points. Our method aims to reduce complexity with almost no performance cost. All the BOF-based descriptors from each object in a scene are combined to define the scene class. Instead of traditional image classification methods based on features such as ORB or SIFT, we use the BOF descriptor to classify scenes. Through an RGB-D camera, we capture points and adjust them onto layers that are perpendicular to the camera plane. From each plane, we extract the boundaries of objects such as furniture, ceilings, walls, or doors. The extracted features compose a bag of visual words classified by a support vector machine. The proposed method achieves almost the same accuracy in scene classification as a SIFT-based algorithm and is 2.38× faster. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and robustness for the 7-Scenes and SUNRGBD datasets.
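
The bag-of-visual-words step can be sketched generically: each descriptor is assigned to its nearest vocabulary entry, and the scene is represented by the normalised histogram of assignments. This is a minimal illustration, not the BOF descriptor itself. The two-dimensional descriptors and the two-word vocabulary are made-up values, and in practice the vocabulary would be learned (for example by clustering).

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(descriptor, vocabulary[k]))


def bag_of_words(descriptors, vocabulary):
    """Normalised histogram of nearest-word assignments for one scene."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical vocabulary and per-object descriptors for a single scene.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
scene_descriptors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [9.0, 9.0]]
histogram = bag_of_words(scene_descriptors, vocabulary)
print(histogram)
```

The resulting fixed-length histogram is what a classifier such as a support vector machine would take as input.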

4.
Med Biol Eng Comput ; 61(3): 835-845, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36626112

ABSTRACT

Motor imagery brain-computer interface (MI-BCI) is one of the most used paradigms in EEG-based brain-computer interface (BCI). The current state of the art in BCI involves tuning classifiers to subject-specific training data, acquired over several sessions, in order to perform calibration prior to actual use of the so-called subject-specific BCI system (SS-BCI). Herein, the goal is to provide a ready-to-use system requiring minimal effort for setup. Thus, our challenge was to design a subject-independent BCI (SI-BCI) to be used by any new user without the constraint of individual calibration. Outcomes from other studies with the same purpose were used to undertake comparisons and validate our findings. For the EEG signal processing, we used a combination of the delta (0.5-4 Hz), alpha (8-13 Hz), and beta+gamma (13-40 Hz) bands at a stage prior to feature extraction. Next, we extracted features from the 27-channel EEG using common spatial pattern (CSP) and performed binary classification (MI of right and left hand) with linear discriminant analysis (LDA) and support vector machine (SVM) classifiers. These analyses were done for both the SS-BCI and SI-BCI models. We employed a "leave-one-subject-out" (LOSO) arrangement and 10-fold cross-validation to evaluate our SI-BCI and SS-BCI systems, respectively. Compared with two other studies, our work was the only one that showed higher accuracy for the LDA classifier in SI-BCI than in SS-BCI. On the other hand, LDA accuracy was lower than the accuracy achieved with SVM in both conditions (SI-BCI and SS-BCI). Our SS-BCI accuracy reached 76.85% using LDA and 94.20% using SVM, and for SI-BCI we obtained 80.30% with LDA and 83.23% with SVM. We conclude that SI-BCI may be a feasible and relevant option, which can be used in scenarios where subjects are not able to submit themselves to long training sessions, or for fast evaluation of so-called "BCI illiteracy." Comparatively, our strategy proved to be more efficient, giving us the best result for SI-BCI when compared with the classification performances of three other studies, even considering the caveat that different datasets were used in the comparison of the four studies.
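
The "leave-one-subject-out" arrangement mentioned above can be sketched as follows: every trial from one subject is held out for testing while the classifier is trained on all other subjects, and this is repeated once per subject. The subject labels below are hypothetical, and the function name is an assumption for this example.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject label for each EEG trial.
trial_subjects = ["s1", "s1", "s2", "s2", "s3"]
folds = list(leave_one_subject_out(trial_subjects))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Because no trial from the test subject ever reaches training, the resulting accuracy estimates how the system would behave for a new, uncalibrated user.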


Subjects
Brain-Computer Interfaces, Electroencephalography, Humans, Support Vector Machine, Discriminant Analysis, Imagery (Psychotherapy), Imagination, Algorithms
5.
Smart Health (Amst) ; 26: 100323, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36159078

ABSTRACT

The large amount of data generated during the COVID-19 pandemic requires advanced tools for the long-term prediction of risk factors associated with COVID-19 mortality with higher accuracy. Machine learning (ML) methods directly address this topic and are essential tools to guide public health interventions. Here, we used ML to investigate the importance of demographic and clinical variables on COVID-19 mortality. We also analyzed how comorbidity networks are structured according to age groups. We conducted a retrospective study of COVID-19 mortality with hospitalized patients from Londrina, Parana, Brazil, registered in the database for severe acute respiratory infections (SIVEP-Gripe), from January 2021 to February 2022. We tested four ML models to predict the COVID-19 outcome: Logistic Regression, Support Vector Machine, Random Forest, and XGBoost. We also constructed a comorbidity network to investigate the impact of co-occurring comorbidities on COVID-19 mortality. Our study comprised 8358 hospitalized patients, of whom 2792 (33.40%) died. The XGBoost model achieved excellent performance (ROC-AUC = 0.90). Both permutation method and SHAP values highlighted the importance of age, ventilatory support status, and intensive care unit admission as key features in predicting COVID-19 outcomes. The comorbidity networks for old deceased patients are denser than those for young patients. In addition, the co-occurrence of heart disease and diabetes may be the most important combination to predict COVID-19 mortality, regardless of age and sex. This work presents a valuable combination of machine learning and comorbidity network analysis to predict COVID-19 outcomes. Reliable evidence on this topic is crucial for guiding the post-pandemic response and assisting in COVID-19 care planning and provision.

6.
PeerJ ; 10: e13470, 2022.
Article in English | MEDLINE | ID: mdl-35651746

ABSTRACT

Chagas disease is a life-threatening illness caused by the parasite Trypanosoma cruzi. The diagnosis of the acute form of the disease is performed by trained microscopists who detect parasites in blood smear samples. Since this method requires a dedicated high-resolution camera system attached to the microscope, the diagnostic method is more expensive and often prohibitive for low-income settings. Here, we present a machine learning approach based on a random forest (RF) algorithm for the detection and counting of T. cruzi trypomastigotes in mobile phone images. We analyzed micrographs of blood smear samples that were acquired using a mobile device camera capable of capturing images in a resolution of 12 megapixels. We extracted a set of features that describe morphometric parameters (geometry and curvature), as well as color, and texture measurements of 1,314 parasites. The features were divided into train and test sets (4:1) and classified using the RF algorithm. The values of precision, sensitivity, and area under the receiver operating characteristic (ROC) curve of the proposed method were 87.6%, 90.5%, and 0.942, respectively. Automating image analysis acquired with a mobile device is a viable alternative for reducing costs and gaining efficiency in the use of the optical microscope.
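
The area under the ROC curve reported above has a convenient interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The scores are invented for illustration and are unrelated to the study's data.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for parasite and non-parasite image regions.
parasite_scores = [0.9, 0.8, 0.4]
background_scores = [0.5, 0.3]
auc = roc_auc(parasite_scores, background_scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for a small test set. Larger sets are usually handled with a rank-based formula.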


Subjects
Cell Phone, Chagas Disease, Parasites, Trypanosoma cruzi, Animals, Chagas Disease/diagnosis, ROC Curve
7.
Inform Med Unlocked ; 30: 100958, 2022.
Article in English | MEDLINE | ID: mdl-35528315

ABSTRACT

The prediction of host human miRNA binding to the SARS-CoV-2 RNA sequence is of particular interest. This biological process could lead to virus repression, serve as a biomarker for diagnosis, or suggest potential treatments for this disease. One open question is which viral regions this binding could occur in, as well as how the binding of these miRNAs could affect the processes of the SARS-CoV-2 virus. Using sequence features extracted from this base pairing, we predicted the relationships between miRNAs that interact with genes involved in immune function and bind to the SARS-CoV-2 genome in its 5' UTR region. We compared two supervised models, SVM and Random Forest, with an unsupervised One-Class SVM. When the confusion matrices were inspected, the results of the supervised models proved misleading, resulting in a Type II error. However, with the latter model, we achieved an average accuracy of 92%, sensitivity of 96.18%, and specificity of 78%. We hypothesize that studying the binding of miRNAs that affect immunological genes and bind to the SARS-CoV-2 virus will lead to potential genetic therapies for fighting the disease or to a better understanding of how the immune system is affected when this type of viral infection occurs.
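
Inspecting the confusion matrix, as the abstract describes, matters because accuracy alone can hide Type II errors (false negatives) when classes are imbalanced. The sketch below uses made-up counts, not the study's results, to show a classifier with high accuracy but very low sensitivity.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall) and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts: 10 true binders, 90 non-binders, and a model that
# labels almost everything as "non-binder" (9 false negatives).
metrics = confusion_metrics(tp=1, fn=9, tn=90, fp=0)
print(metrics)
```

Here accuracy is 0.91 even though nine of the ten positives are missed, which is why sensitivity and specificity are reported alongside accuracy.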

8.
Sensors (Basel) ; 22(4), 2022 Feb 21.
Article in English | MEDLINE | ID: mdl-35214585

ABSTRACT

In this research, we analyse data obtained from sensors when a user handwrites or draws on a tablet to detect whether the user is in a specific mood state. First, we calculated the features based on the temporal, kinematic, statistical, spectral and cepstral domains for the tablet pressure, the horizontal and vertical pen displacements and the azimuth of the pen's position. Next, we selected features using a principal component analysis (PCA) pipeline, followed by modified fast correlation-based filtering (mFCBF). PCA was used to calculate the orthogonal transformation of the features, and mFCBF was used to select the best PCA features. The EMOTHAW database was used for depression, anxiety and stress scale (DASS) assessment. The process involved the augmentation of the training data by first augmenting the mood states such that all the data were the same size. Then, 80% of the training data was randomly selected, and a small random Gaussian noise was added to the extracted features. Automated machine learning was employed to train and test more than ten plain and ensembled classifiers. For all three moods, we obtained 100% accuracy results when detecting two possible grades of mood severities using this architecture. The results obtained were superior to the results obtained by using state-of-the-art methods, which enabled us to define the three mood states and provide precise information to the clinical psychologist. The accuracy results obtained when detecting these three possible mood states using this architecture were 82.5%, 72.8% and 74.56% for depression, anxiety and stress, respectively.
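
The noise-based augmentation step can be sketched as follows: a random 80% of the training feature vectors is copied, small Gaussian noise is added to each copy, and the noisy copies are appended to the training set. The 80% fraction comes from the abstract. The noise scale `sigma`, the seed, and the function name are assumptions, and the class-balancing step that precedes it in the paper is not reproduced here.

```python
import random


def augment_with_noise(features, fraction=0.8, sigma=0.01, seed=0):
    """Return the original rows plus noisy copies of a random subset of them."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_pick = int(round(fraction * len(features)))
    picked = rng.sample(range(len(features)), n_pick)
    noisy = [[value + rng.gauss(0.0, sigma) for value in features[i]] for i in picked]
    return features + noisy


# Hypothetical feature vectors (e.g. selected PCA components per handwriting sample).
training_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
augmented = augment_with_noise(training_features)
print(len(training_features), len(augmented))
```

Augmentation like this should be applied to the training split only, so that noisy copies of a sample never appear in the test data.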


Subjects
Anxiety, Machine Learning, Anxiety/diagnosis, Normal Distribution, Principal Component Analysis, Support Vector Machine
9.
Medicina (Kaunas) ; 57(6), 2021 May 24.
Article in English | MEDLINE | ID: mdl-34074037

ABSTRACT

Background and Objectives: Thyroid nodules are solid or fluid-filled lumps that form inside the thyroid gland and can be malignant or benign. Our aim was to test whether the described features of the Thyroid Imaging Reporting and Data System (TI-RADS) could improve radiologists' decision making when integrated into a computer system. In this study, we developed a computer-aided diagnosis system integrated into multiple-instance learning (MIL) that would focus on benign-malignant classification. Data were available from the Universidad Nacional de Colombia. Materials and Methods: There were 99 cases (33 benign and 66 malignant). In this study, the median filter and image binarization were used for image pre-processing and segmentation. The grey level co-occurrence matrix (GLCM) was used to extract seven ultrasound image features. These data were divided into 87% training and 13% validation sets. We compared the support vector machine (SVM) and artificial neural network (ANN) classification algorithms based on their accuracy, sensitivity, and specificity. The outcome measure was whether the thyroid nodule was benign or malignant. We also developed a graphical user interface (GUI) to display the image features that would help radiologists with decision making. Results: ANN and SVM achieved accuracies of 75% and 96%, respectively. SVM outperformed all the other models on all performance metrics, achieving higher accuracy, sensitivity, and specificity scores. Conclusions: Our study suggests promising results from MIL in thyroid cancer detection. Further testing with external data is required before our classification model can be employed in practice.
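
A grey level co-occurrence matrix counts how often a pixel of grey level i has a neighbour of grey level j at a fixed offset. Texture features are then computed from the normalised counts. The sketch below builds a non-symmetric GLCM for the "one pixel to the right" offset and derives two common features, contrast and energy. The abstract does not name its seven features, so these two, the offset, and the tiny image are assumptions for illustration.

```python
def glcm(image, levels, d_row=0, d_col=1):
    """Normalised co-occurrence matrix for the offset (d_row, d_col)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Weights each co-occurrence by the squared grey-level difference."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of squared probabilities; higher for more uniform textures."""
    return sum(value * value for row in p for value in row)


# Hypothetical 2-level image (0 = dark, 1 = bright).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
p = glcm(image, levels=2)
print(p, contrast(p), energy(p))
```

Real ultrasound images would first be quantised to a small number of grey levels, and matrices for several offsets are often averaged.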


Subjects
Thyroid Nodule, Colombia, Computer-Assisted Diagnosis, Humans, Machine Learning, Sensitivity and Specificity, Thyroid Nodule/diagnostic imaging, Ultrasonography
10.
Rev. bras. med. esporte ; 27(spe): 80-82, Mar. 2021. tables, graphs
Article in English | LILACS | ID: biblio-1156132

ABSTRACT

In recent years, China has paid increasing attention to students' physical health, but it is difficult for schools to put the evaluation of students' physical health on a scientific footing. How to use algorithms to provide accurate guidance has become a current research hotspot. On this basis, this paper studies an evaluation model of students' physical health based on the integration of home and school sports. First, the paper analyzes the state of research on physical health evaluation at home and abroad; it then addresses the deficiencies of current research on the integration of home and school sports, and applies the SVM algorithm to the physical health evaluation model. Finally, the experimental results show that the SVM algorithm can objectively evaluate the integration of home and school sports, can optimize the evaluation strategy according to differences among students during physical exercise, and can reach an accuracy of more than 97% in physical health evaluation.




Subjects
Humans, School Health Services, Exercise, Health Status, Algorithms
11.
Molecules ; 25(13), 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32630676

ABSTRACT

Food analysis is a challenging analytical problem, often addressed using sophisticated laboratory methods that produce large data sets. Linear and non-linear multivariate methods can be used to process these types of datasets and to answer questions such as whether product origin is accurately labeled or whether a product is safe to eat. In this review, we present the application of non-linear methods such as artificial neural networks, support vector machines, self-organizing maps, and multi-layer artificial neural networks in the field of chemometrics related to food analysis. We discuss criteria to determine when non-linear methods are better suited for use instead of traditional methods. The principles of algorithms are described, and examples are presented for solving the problems of exploratory analysis, classification, and prediction.


Subjects
Cheminformatics/methods, Food Analysis/methods, Algorithms, Food Analysis/statistics & numerical data, Computer Neural Networks, Nonlinear Dynamics, Support Vector Machine
12.
Braz. arch. biol. technol ; 62: e19170821, 2019. tables, graphs
Article in English | LILACS | ID: biblio-1055410

ABSTRACT

Thyroid nodules are cell growths in the thyroid that fall into one of two categories: benign or malignant. Nodular thyroid disease is common, and because of the associated risk of malignancy and hyperfunction, these nodules have to be examined thoroughly. Diagnosing thyroid nodule malignancy at an early stage can therefore reduce the risk of death. This paper presents an intelligent thyroid nodule malignancy diagnosis using texture information in the run-length matrix derived from 2-level 2D wavelet transform bands (approximation and details). In this work, an ANOVA test was used for feature selection, reducing the roughly 45 run-length features generated with and without the wavelet transform before feeding the clinically important features to support vector machine (SVM) and decision tree (DT) classifiers to perform the automated diagnosis. The work was validated using 100 thyroid nodule images split equally between the two categories (50 benign and 50 malignant). The proposed system can detect thyroid nodule malignancy with an average accuracy of about 97% using the SVM classifier with run-length matrix features derived from the spatial domain, while the average accuracy increases to 98% with hybrid features derived from the spatial domain and the 2-level wavelet decomposition. For the other proposed classifier (DT), the average accuracy with spatial-domain features is 93%, whereas the average accuracy with hybrid features is 97%. Hence, the proposed system can be used for the screening of thyroid nodules.
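
A grey-level run-length matrix records, for each grey level, how many runs of each length occur along a chosen direction. Texture features such as short-run emphasis are then computed from it. The sketch below handles horizontal runs only and computes that one feature. The tiny image, the direction, and the choice of feature are assumptions, as the abstract does not list its roughly 45 features individually.

```python
def run_length_matrix(image, levels):
    """rlm[g][r - 1] = number of horizontal runs of grey level g with length r."""
    max_run = len(image[0])
    rlm = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_value, run_len = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_len += 1
            else:
                rlm[run_value][run_len - 1] += 1  # close the finished run
                run_value, run_len = value, 1
        rlm[run_value][run_len - 1] += 1  # close the last run in the row
    return rlm


def short_run_emphasis(rlm):
    """Average of 1 / length^2 over all runs; large when short runs dominate."""
    total_runs = sum(map(sum, rlm))
    weighted = sum(
        rlm[g][r] / (r + 1) ** 2 for g in range(len(rlm)) for r in range(len(rlm[g]))
    )
    return weighted / total_runs


# Hypothetical 2-level image.
image = [[0, 0, 1],
         [1, 1, 1]]
rlm = run_length_matrix(image, levels=2)
sre = short_run_emphasis(rlm)
print(rlm, sre)
```

In a pipeline like the one described, the same features would be computed on both the original image and its wavelet sub-bands before feature selection.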


Subjects
Computer-Assisted Diagnosis/instrumentation, Thyroid Nodule/diagnostic imaging, Mass Screening, Analysis of Variance
13.
Neuroimage Clin ; 20: 724-730, 2018.
Article in English | MEDLINE | ID: mdl-30238916

ABSTRACT

Multiple sclerosis patients' clinical symptoms do not correlate strongly with structural assessment done with traditional magnetic resonance images. However, diagnosis and evaluation of the disease's progression are based on a combination of this imaging analysis and clinical examination. Therefore, other biomarkers are necessary to better understand the disease. In this paper, we capitalize on machine learning techniques to classify relapsing-remitting multiple sclerosis patients and healthy volunteers, and to identify relevant brain areas and connectivity measures for characterizing patients. To this end, we acquired magnetic resonance imaging data from relapsing-remitting multiple sclerosis patients and healthy subjects. Fractional anisotropy maps and structural and functional connectivity were extracted from the scans. Each was used as a separate input feature to construct support vector machine classifiers. A fourth input feature was created by combining structural and functional connectivity. Patients were divided into two groups according to their degree of disability and, together with the control group, three group pairs were formed for comparison. Twelve separate classifiers were built from the combination of these four input features and three group pairs. The classifiers were able to distinguish between patients and healthy subjects, reaching accuracy levels as high as 89% ± 2%. In contrast, the performance was noticeably lower when comparing the two groups of patients with different levels of disability, reaching levels below 63% ± 5%. The brain regions that contributed the most to the classification were the right occipital, left frontal orbital, and medial frontal cortices and the lingual gyrus. The developed classifiers based on MRI data were able to distinguish multiple sclerosis patients and healthy subjects reliably. Moreover, the resulting classification models identified brain regions, and functional and structural connections, relevant for a better understanding of the disease.
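
The count of twelve classifiers follows from crossing four input feature sets with three group pairs. The short sketch below enumerates that design. The identifier strings, including the "lower" and "higher" disability labels, are naming assumptions made here, since the abstract only says that patients were split by degree of disability.

```python
from itertools import product

feature_sets = [
    "fractional_anisotropy",
    "structural_connectivity",
    "functional_connectivity",
    "structural_plus_functional",
]
group_pairs = [
    ("controls", "lower_disability"),
    ("controls", "higher_disability"),
    ("lower_disability", "higher_disability"),
]

# One classifier specification per (feature set, group pair) combination.
classifier_specs = [
    {"features": features, "groups": groups}
    for features, groups in product(feature_sets, group_pairs)
]
print(len(classifier_specs))
```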


Subjects
Brain Mapping/methods, Brain/diagnostic imaging, Diffusion Magnetic Resonance Imaging, Computer-Assisted Image Processing/methods, Relapsing-Remitting Multiple Sclerosis/diagnostic imaging, Adolescent, Adult, Brain/pathology, Brain/physiopathology, Female, Humans, Male, Middle Aged, Relapsing-Remitting Multiple Sclerosis/pathology, Relapsing-Remitting Multiple Sclerosis/physiopathology, Prospective Studies, Support Vector Machine, Young Adult
14.
Appl Spectrosc ; 72(12): 1774-1780, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30063378

ABSTRACT

Identification of different chicken parts using portable equipment could provide useful information for the processing industry and also for authentication purposes. Traditionally, physical-chemical analysis could deal with this task, but some disadvantages arise such as time constraints and requirements of chemicals. Recently, near-infrared (NIR) spectroscopy and machine learning (ML) techniques have been widely used to obtain a rapid, noninvasive, and precise characterization of biological samples. This study aims at classifying chicken parts (breasts, thighs, and drumstick) using portable NIR equipment combined with ML algorithms. Physical and chemical attributes (pH and L*a*b* color features) and chemical composition (protein, fat, moisture, and ash) were determined for each sample. Spectral information was acquired using a portable NIR spectrophotometer within the range 900-1700 nm and principal component analysis was used as screening approach. Support vector machine and random forest algorithms were compared for chicken meat classification. Results confirmed the possibility of differentiating breast samples from thighs and drumstick with 98.8% accuracy. The results showed the potential of using a NIR portable spectrophotometer combined with a ML approach for differentiation of chicken parts in the processing industry.


Subjects
Chickens/anatomy & histology, Machine Learning, Poultry Products/analysis, Poultry Products/classification, Algorithms, Animals, Fats/analysis, Poultry Proteins/analysis, Principal Component Analysis, Near-Infrared Spectroscopy/methods
15.
Data Brief ; 19: 264-270, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29892645

ABSTRACT

This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data we developed a script in Python to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by sigma SigA. To validate the data we found, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of parameters in a LibSVM library to predict promoter regions in the bacteria used in the simulation. All data gathered as well as the BacSVM+ software is available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.

16.
Rev. mex. ing. bioméd ; 39(1): 95-104, Jan.-Apr. 2018. tables, graphs
Article in English | LILACS | ID: biblio-902386

ABSTRACT

In this work, a brain-computer interface able to decode motor imagery tasks from EEG is presented. The method uses a time-frequency representation of the brain signals recorded in different regions of the brain to extract important features. Principal component analysis and sequential forward selection are compared in their ability to represent the feature set in a compact form while removing unnecessary information. Finally, two machine learning methods are implemented for the classification task. Results show that it is possible to decode the mental activity of the subjects with accuracy above 80%. Furthermore, visualization of the main components extracted from the brain signals allows for physiological insights into the activity that takes place in the sensorimotor cortex during imagined movement of different parts of the body.
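
Sequential forward selection, one of the two feature-reduction methods compared above, is a greedy search: starting from an empty set, it repeatedly adds the single feature that most improves a score. The sketch below shows the loop with a hypothetical scoring function. In practice the score would be something like cross-validated classification accuracy, and the usefulness values and redundancy penalty here are invented.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily pick k feature indices, each time adding the best-scoring candidate."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical score: features add usefulness, but 1 and 3 are redundant together.
USEFULNESS = [0.1, 0.9, 0.5, 0.8]


def toy_score(subset):
    total = sum(USEFULNESS[f] for f in subset)
    if 1 in subset and 3 in subset:
        total -= 0.7  # penalty standing in for redundant information
    return total


chosen = sequential_forward_selection(len(USEFULNESS), toy_score, k=2)
print(chosen)
```

Unlike principal component analysis, this keeps original features rather than building new combined ones, so the selected channels or frequency bands remain directly interpretable.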



17.
Environ Sci Pollut Res Int ; 25(22): 21362-21367, 2018 Aug.
Article in English | MEDLINE | ID: mdl-28424959

ABSTRACT

The concentrations of 17 non-essential elements (Al, As, Ba, Be, Cd, Ce, Cr, Hg, La, Li, Pb, Sb, Sn, Sr, Th, Ti, and Tl) were determined in brown grain rice samples of two varieties: Fortuna and Largo Fino. The samples were collected from the four main producing regions of Corrientes province (Argentina). Quantitative determinations were performed by inductively coupled plasma mass spectrometry (ICP-MS), using a validated method. The contents of As, Be, Cd, Ce, Cr, Hg, Pb, Sb, Sn, Th, and Tl were very low or not detected in most samples. The non-essential element levels detected were in line with studies conducted in rice from different parts of the world. In order to characterize the influence of geographical origin in the samples, the following classification methods were carried out: linear discriminant analysis (LDA), k-nearest neighbors (k-NN), partial least squares discriminant analysis (PLS-DA), support vector machine (SVM) and random forests (RF). The best performance was obtained by using RF (96%) and SVM (96%). The results reported here showed the variation in the non-essential element profiles in rice grain depending on the geographical origin.


Assuntos
Grão Comestível/química , Oryza/química , Oligoelementos/análise , Argentina , Mineração de Dados , Análise Discriminante , Geografia , Análise dos Mínimos Quadrados , Espectrometria de Massas
18.
BMC Genomics ; 18(1): 804, 2017 Oct 18.
Artigo em Inglês | MEDLINE | ID: mdl-29047334

RESUMO

BACKGROUND: In recent years, a rapidly increasing number of RNA transcripts has been generated by thousands of sequencing projects around the world, creating enormous volumes of transcript data to be analyzed. An important problem when analyzing these data is distinguishing between long non-coding RNAs (lncRNAs) and protein-coding transcripts (PCTs). We therefore present a Support Vector Machine (SVM) based method to distinguish lncRNAs from PCTs, using features based on the frequencies of nucleotide patterns and on ORF lengths in transcripts. METHODS: The proposed method is based on SVM and uses as features the relative length of the first ORF and the frequencies of nucleotide patterns selected by PCA. FASTA files were used as input to calculate all possible features. These features were divided into two sets: (i) 336 frequencies of nucleotide patterns; and (ii) 4 features derived from ORFs. PCA was applied to the first set to identify 6 groups of frequencies that could contribute most to the distinction. Twenty-four experiments using the 6 groups from the first set and the features from the second set were built to create the best model to distinguish lncRNAs from PCTs. RESULTS: This method was trained and tested with human (Homo sapiens), mouse (Mus musculus) and zebrafish (Danio rerio) data, achieving 98.21%, 98.03% and 96.09% accuracy, respectively. Our method was compared to other tools available in the literature (CPAT, CPC, iSeeRNA, lncRNApred, lncRScan-SVM and FEELnc), and showed an improvement in accuracy of ≈3.00%. In addition, to validate our model, the mouse data was classified with the human model, and vice versa, achieving ≈97.80% accuracy in both cases, showing that the model is not overfit. The SVM models were validated with data from rat (Rattus norvegicus), pig (Sus scrofa) and fruit fly (Drosophila melanogaster), and obtained more than 84.00% accuracy in all these organisms.
Our results also showed that 81.2% of human pseudogenes and 91.7% of mouse pseudogenes were classified as non-coding. Moreover, our method was able to re-annotate two uncharacterized sequences from the Swiss-Prot database as having a high probability of being lncRNAs. Finally, in order to use the method to annotate transcripts derived from RNA-seq, previously identified lncRNAs of human, gorilla (Gorilla gorilla) and rhesus macaque (Macaca mulatta) were analyzed, of which 98.62%, 80.8% and 91.9%, respectively, were correctly classified. CONCLUSIONS: The SVM method proposed in this work performs well in distinguishing lncRNAs from PCTs, as shown in the results. To build the model, besides using features known in the literature regarding ORFs, we used PCA to identify the features among nucleotide pattern frequencies that contribute the most to distinguishing lncRNAs from PCTs in reference data sets. Interestingly, models created with two evolutionarily distant species could distinguish lncRNAs of even more distant species.
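The two feature families described in METHODS can be sketched as follows. This is an illustration under stated assumptions, not the published tool. The abstract does not say which nucleotide patterns make up the 336 frequencies; one combination that happens to give 336 is all di-, tri- and tetranucleotides (16 + 64 + 256), which is what the sketch assumes. The ORF definition (first ATG with an in-frame stop codon) and both function names are likewise hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k):
    """Relative frequency of every DNA k-mer in seq (overlapping windows)."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            windows += 1
    return {kmer: (n / windows if windows else 0.0) for kmer, n in counts.items()}


def first_orf_relative_length(seq):
    """Length of the first ATG-to-stop ORF divided by the transcript length.

    Returns 0.0 when no complete ORF is found (assumed convention).
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from the start codon, looking for an in-frame stop.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return (i + 3 - start) / len(seq)
        start = seq.find("ATG", start + 1)
    return 0.0


# Toy transcript, invented for illustration only.
transcript = "CCATGAAATAGGG"
features = {}
for k in (2, 3, 4):  # assumed pattern lengths: 16 + 64 + 256 = 336 features
    features.update(kmer_frequencies(transcript, k))
orf_feature = first_orf_relative_length(transcript)
```

In a full pipeline these vectors would be reduced (the abstract uses PCA for this) and passed to an SVM; that step is omitted here.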


Assuntos
Biologia Computacional/métodos , Fases de Leitura Aberta/genética , RNA não Traduzido/genética , Máquina de Vetores de Suporte , Animais , Humanos , Camundongos , Anotação de Sequência Molecular , RNA Mensageiro/genética , Peixe-Zebra/genética
19.
BMC Bioinformatics ; 18(1): 81, 2017 Feb 02.
Artigo em Inglês | MEDLINE | ID: mdl-28152994

RESUMO

BACKGROUND: Correct identification of protein-coding regions is an important and latent problem in the field of molecular biology. It is challenging because of the lack of deep knowledge about biological systems and limited familiarity with conserved features in messenger RNA (mRNA). It is therefore essential to investigate computational methods that help discover patterns for identifying Translation Initiation Sites (TIS). In bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. In contrast, little attention has been given to machine learning methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM). Transductive inference performs well for problems in which the number of unlabeled sequences is considerably greater than the number of labeled ones. TIS prediction may likewise benefit from transductive methods, because the number of new sequences grows rapidly as genome projects progress and allow new organisms to be studied. Consequently, this work investigates transductive learning for TIS identification and compares the results with those obtained with the inductive method. RESULTS: Transductive inference gives better results in both F-measure and sensitivity than the inductive method for predicting the TIS. Additionally, it has the lowest failure rate in identifying the TIS, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated with molecules from the most representative organisms in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity above 90%, higher than the results obtained with ISVM.
The ISVM and TSVM approaches were implemented in the TransduTIS tool, as TransduTIS-I and TransduTIS-T respectively, available through a web interface. These approaches were compared with the TISHunter, TIS Miner and NetStart tools, with satisfactory results. CONCLUSIONS: In terms of precision, the results are similar for the ISVM and TSVM classifiers. However, the results show that the TSVM approach brought an improvement, especially in F-measure and sensitivity. Moreover, a potential application of TSVM was identified: organisms in the initial phase of study, with few identified sequences in the databases.
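The relationship between the measures reported above can be made concrete with a small sketch. The formulas are the standard definitions of precision, sensitivity (recall) and F-measure (F1). The function name and the confusion counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, sensitivity, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_measure


# Hypothetical counts: both classifiers have the same precision,
# but the second one misses fewer true sites (fewer false negatives).
many_fn = classification_metrics(tp=90, fp=10, fn=30)
few_fn = classification_metrics(tp=90, fp=10, fn=10)
```

With precision held equal, reducing false negatives raises sensitivity and therefore the F-measure, which is the pattern the conclusions describe for TSVM relative to ISVM.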


Assuntos
Iniciação Traducional da Cadeia Peptídica , Máquina de Vetores de Suporte , Animais , Arabidopsis/genética , Biologia Computacional/métodos , Drosophila melanogaster/genética , Humanos , Camundongos , Ratos , Software
20.
Noncoding RNA ; 3(1), 2017 Mar 04.
Artigo em Inglês | MEDLINE | ID: mdl-29657283

RESUMO

Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them is a large number of long ncRNAs of a particular class that is difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, biological knowledge about them is still lacking and, currently, the few computational methods available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques. In particular, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods or workflows specifically designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.
