Search | VHL Regional Portal

1.

Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance.

Mosquera, Candelaria; Ferrer, Luciana; Milone, Diego H; Luna, Daniel; Ferrante, Enzo.

Eur Radiol ; 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38861161

ABSTRACT

PURPOSE: This work aims to assess standard evaluation practices used by the research community for evaluating medical imaging classifiers, with a specific focus on the implications of class imbalance. The analysis is performed on chest X-rays as a case study and encompasses a comprehensive model performance definition, considering both discriminative capabilities and model calibration. MATERIALS AND METHODS: We conduct a concise literature review to examine prevailing scientific practices used when evaluating X-ray classifiers. Then, we perform a systematic experiment on two major chest X-ray datasets to showcase a didactic example of the behavior of several performance metrics under different class ratios and highlight how widely adopted metrics can conceal performance in the minority class. RESULTS: Our literature study confirms that: (1) even when dealing with highly imbalanced datasets, the community tends to use metrics that are dominated by the majority class; and (2) it is still uncommon to include calibration studies for chest X-ray classifiers, despite their importance in healthcare. Moreover, our systematic experiments confirm that current evaluation practices may not reflect model performance in real clinical scenarios, and we suggest complementary metrics that better reflect the performance of the system in such scenarios. CONCLUSION: Our analysis underscores the need for enhanced evaluation practices, particularly for class-imbalanced chest X-ray classifiers. We recommend including complementary metrics such as the area under the precision-recall curve (AUC-PR), adjusted AUC-PR, and balanced Brier score, to offer a more accurate depiction of system performance in real clinical scenarios, considering metrics that reflect both discrimination and calibration performance.
CLINICAL RELEVANCE STATEMENT: This study underscores the critical need for refined evaluation metrics in medical imaging classifiers, emphasizing that prevalent metrics may mask poor performance in minority classes, potentially impacting clinical diagnoses and healthcare outcomes. KEY POINTS: Common scientific practices in papers dealing with X-ray computer-assisted diagnosis (CAD) systems may be misleading. We highlight limitations in reporting of evaluation metrics for X-ray CAD systems in highly imbalanced scenarios. We propose adopting alternative metrics based on experimental evaluation on large-scale datasets.
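The metrics recommended above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: AUC-PR is estimated as average precision, the balanced Brier score is assumed to be the mean of the Brier scores computed separately on each class, and adjusted AUC-PR is omitted because its exact definition is not given in the abstract.

```python
def average_precision(labels, scores):
    """Step-wise AUC-PR estimate: mean precision at the rank of each positive.

    Ties in `scores` are broken by sort order; a production implementation
    would handle them explicitly.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


def brier(labels, probs):
    """Standard Brier score: mean squared error over all samples."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def balanced_brier(labels, probs):
    """Assumed definition: average of the Brier scores computed on each class separately."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    bs_pos = sum((p - 1) ** 2 for p in pos) / len(pos)
    bs_neg = sum(p ** 2 for p in neg) / len(neg)
    return (bs_pos + bs_neg) / 2


# Toy data: 5% prevalence and a model that always outputs probability 0.
labels = [0] * 95 + [1] * 5
probs = [0.0] * 100
print(brier(labels, probs))           # 0.05
print(balanced_brier(labels, probs))  # 0.5
```

The toy example reproduces the concern raised in the abstract: the standard Brier score of a model that ignores the minority class looks good (0.05), while the class-balanced version exposes it (0.5). Similarly, the expected AUC-PR of a random classifier equals the prevalence, so AUC-PR should be read against that baseline rather than against 0.5.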

2.

sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure.

Bugnon, Leandro A; Di Persia, Leandro; Gerard, Matias; Raad, Jonathan; Prochetto, Santiago; Fenoy, Emilio; Chorostecki, Uciel; Ariel, Federico; Stegmayer, Georgina; Milone, Diego H.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38855913

ABSTRACT

MOTIVATION: Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of secondary structure from a raw RNA sequence is a long-standing unsolved problem that, after decades of almost unchanged performance, has now re-emerged thanks to deep learning. Traditional RNA secondary structure prediction algorithms have been mostly based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown competitive performance compared with the classical ones, but there is still a wide margin for improvement. RESULTS: In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform the state-of-the-art methods.
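The contact matrix mentioned above can be illustrated by converting a known structure in dot-bracket notation into an L x L binary matrix, a common target representation for this kind of model. This sketch is not taken from the sincFold code and assumes nested structures only; the distance j - i of each pair is a simple way to separate short-range from long-range interactions.

```python
def dot_bracket_to_contacts(structure):
    """Convert nested dot-bracket notation into a symmetric L x L 0/1 contact matrix.

    Only '(' and ')' are handled; pseudoknot brackets such as '[' and ']'
    would need their own stacks.
    """
    n = len(structure)
    contacts = [[0] * n for _ in range(n)]
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            contacts[i][j] = contacts[j][i] = 1
    if stack:
        raise ValueError(f"unmatched '(' at positions {stack}")
    return contacts


def pair_spans(contacts):
    """List (i, j, j - i) for each base pair, i.e. the sequence distance it covers."""
    n = len(contacts)
    return [(i, j, j - i) for i in range(n) for j in range(i + 1, n) if contacts[i][j]]


print(pair_spans(dot_bracket_to_contacts("((...))..")))  # [(0, 6, 6), (1, 5, 4)]
```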

Subject(s)

Computational Biology , Deep Learning , Nucleic Acid Conformation , RNA , RNA/chemistry , RNA/genetics , Computational Biology/methods , Algorithms , Neural Networks, Computer , Thermodynamics

3.

CheXmask: a large-scale dataset of anatomical segmentation masks for multi-center chest x-ray images.

Gaggion, Nicolás; Mosquera, Candelaria; Mansilla, Lucas; Saidman, Julia Mariel; Aineseder, Martina; Milone, Diego H; Ferrante, Enzo.

Sci Data ; 11(1): 511, 2024 May 17.

Article in English | MEDLINE | ID: mdl-38760409

ABSTRACT

The development of successful artificial intelligence models for chest X-ray analysis relies on large, diverse datasets with high-quality annotations. While several databases of chest X-ray images have been released, most include disease diagnosis labels but lack detailed pixel-level anatomical segmentation labels. To address this gap, we introduce an extensive multi-center chest X-ray segmentation dataset with uniform and fine-grained anatomical annotations for images from five well-known publicly available databases: ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest, and VinDr-CXR, resulting in 657,566 segmentation masks. Our methodology uses the HybridGNet model to ensure consistent and high-quality segmentations across all datasets. Rigorous validation, including expert physician evaluation and automatic quality control, was conducted on the resulting masks. Additionally, we provide individualized quality indices per mask and an overall quality estimation per dataset. This dataset serves as a valuable resource for the broader scientific community, streamlining the development and assessment of innovative methodologies in chest X-ray analysis.
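Validating a segmentation mask against an expert delineation typically relies on an overlap score. The sketch below computes the Dice coefficient between two binary masks; it is a generic illustration, and the per-mask quality indices distributed with CheXmask are not assumed to be computed this way.

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    intersection = sum(
        1
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
        if a and b
    )
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(dice(predicted, reference))  # 2 * 2 / 6, about 0.667
```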

Subject(s)

Radiography, Thoracic , Humans , Databases, Factual , Artificial Intelligence , Lung/diagnostic imaging

4.

Evaluating large language models for annotating proteins.

Vitale, Rosario; Bugnon, Leandro A; Fenoy, Emilio Luis; Milone, Diego H; Stegmayer, Georgina.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38706315

ABSTRACT

UniProtKB currently contains more than 251 million proteins. However, only 0.25% have been annotated with one of the more than 15,000 possible Pfam family domains. The current annotation protocol integrates knowledge from manually curated family domains, obtained using sequence alignments and hidden Markov models. This approach has been successful for automatically growing the Pfam annotations, but at a low rate compared with protein discovery. Only a few years ago, deep learning models were proposed for automatic Pfam annotation. However, these models demand a considerable amount of training data, which can be a challenge for poorly populated families. To address this issue, we propose and evaluate a novel protocol based on transfer learning. This requires the use of protein large language models (LLMs), trained with self-supervision on large unannotated datasets, to obtain sequence embeddings. The embeddings can then be used for supervised learning on a small annotated dataset for a specialized task. With this protocol, we evaluated several cutting-edge protein LLMs together with machine learning architectures to improve the prediction of protein domain annotations. Results are significantly better than the state of the art for protein family classification, reducing the prediction error by 60% compared with standard methods. We explain how LLM embeddings can be used for protein annotation in a concrete and easy way, and provide the pipeline in a GitHub repository. Full source code and data are available at https://github.com/sinc-lab/llm4pfam.
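The two-stage protocol (fixed embeddings from a self-supervised protein LLM, then a supervised model trained on few labeled sequences) can be sketched as below. The embedding step is assumed to have already produced one fixed-length vector per sequence, and a nearest-centroid classifier stands in for the machine learning architectures evaluated in the paper; the family labels and 2-D vectors are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, families):
    """Average the embeddings of each family (the small labeled dataset)."""
    sums = {}
    counts = defaultdict(int)
    for vector, family in zip(embeddings, families):
        if family not in sums:
            sums[family] = [0.0] * len(vector)
        sums[family] = [s + v for s, v in zip(sums[family], vector)]
        counts[family] += 1
    return {f: [s / counts[f] for s in sums[f]] for f in sums}


def predict_family(centroids, vector):
    """Assign the family whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda f: math.dist(centroids[f], vector))


# Hypothetical 2-D "embeddings"; real protein LLM embeddings have hundreds of dimensions.
train_x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
train_y = ["family_A", "family_A", "family_B", "family_B"]
centroids = fit_centroids(train_x, train_y)
print(predict_family(centroids, [1.0, 0.0]))  # family_A
```

Because the LLM itself is not retrained, only the lightweight second stage needs labeled data, which is what makes the approach attractive for poorly populated families.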

Subject(s)

Databases, Protein , Proteins , Proteins/chemistry , Molecular Sequence Annotation/methods , Computational Biology/methods , Machine Learning

5.

Transfer learning: The key to functionally annotate the protein universe.

Bugnon, Leandro A; Fenoy, Emilio; Edera, Alejandro A; Raad, Jonathan; Stegmayer, Georgina; Milone, Diego H.

Patterns (N Y) ; 4(2): 100691, 2023 Feb 10.

Article in English | MEDLINE | ID: mdl-36873903

ABSTRACT

The automatic annotation of the protein universe is still an unresolved challenge. Today, there are 229,149,489 entries in the UniProtKB database, but only 0.25% of them have been functionally annotated. This manual process integrates knowledge from the protein families database Pfam, annotating family domains using sequence alignments and hidden Markov models. This approach has expanded Pfam annotations only slowly in recent years. Recently, deep learning models have appeared that can learn evolutionary patterns from unaligned protein sequences. However, this requires large-scale data, while many families contain just a few sequences. Here, we contend that this limitation can be overcome by transfer learning, exploiting the full potential of self-supervised learning on large unannotated data and then supervised learning on a small labeled dataset. We show results where errors in protein family prediction can be reduced by 55% with respect to standard methods.

6.

exp2GO: Improving Prediction of Functions in the Gene Ontology With Expression Data.

Di Persia, Leandro; Lopez, Tiago; Arce, Agustin; Milone, Diego H; Stegmayer, Georgina.

IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 999-1008, 2023.

Article in English | MEDLINE | ID: mdl-35417352

ABSTRACT

Computational methods for predicting gene function annotations aim to automatically find associations between a gene and a set of Gene Ontology (GO) terms describing its functions. Since the manual curation of novel annotations and the corresponding wet-lab experimental validation are very time-consuming and costly, there is a need for computational tools that can reliably predict likely annotations and boost the discovery of new gene functions. This work proposes a novel method for predicting annotations based on inferring GO similarities from expression similarities. The novel method was benchmarked against other methods on several public biological datasets, obtaining the best comparative results. exp2GO effectively improved the prediction of GO annotations in comparison to state-of-the-art methods. Furthermore, the proposal was validated with a full-genome case, where it was capable of predicting relevant and accurate biological functions. The repository of this project, with full data and code, is available at https://github.com/sinc-lab/exp2GO.
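As a rough illustration of turning expression similarity into GO predictions, the sketch scores each GO term for a query gene by the correlation-weighted vote of annotated genes. This is a generic guilt-by-association scheme under stated assumptions (Pearson correlation, negative correlations ignored), not the exp2GO algorithm; the gene names, profiles, and annotations are toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def score_go_terms(query_profile, profiles, annotations):
    """Score GO terms for a query gene by correlation-weighted votes of annotated genes."""
    weights = {g: max(0.0, pearson(query_profile, p)) for g, p in profiles.items()}
    total = sum(weights.values())
    if total == 0:
        return {}
    scores = {}
    for gene, weight in weights.items():
        for term in annotations.get(gene, ()):
            scores[term] = scores.get(term, 0.0) + weight / total
    return scores


profiles = {"gene_1": [1.0, 2.0, 3.0, 4.0], "gene_2": [4.0, 3.0, 2.0, 1.0]}
annotations = {"gene_1": {"GO:0006412"}, "gene_2": {"GO:0006811"}}
print(score_go_terms([2.0, 4.0, 6.0, 8.0], profiles, annotations))
```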

Subject(s)

Computational Biology , Gene Ontology , Computational Biology/methods , Molecular Sequence Annotation , Phenotype

7.

Improving Anatomical Plausibility in Medical Image Segmentation via Hybrid Graph Neural Networks: Applications to Chest X-Ray Analysis.

Gaggion, Nicolas; Mansilla, Lucas; Mosquera, Candelaria; Milone, Diego H; Ferrante, Enzo.

IEEE Trans Med Imaging ; 42(2): 546-556, 2023 02.

Article in English | MEDLINE | ID: mdl-36423313

ABSTRACT

Anatomical segmentation is a fundamental task in medical image computing, generally tackled with fully convolutional neural networks that produce dense segmentation masks. These models are often trained with loss functions such as cross-entropy or Dice, which assume pixels to be independent of each other, thus ignoring topological errors and anatomical inconsistencies. We address this limitation by moving from pixel-level to graph representations, which naturally incorporate anatomical constraints by construction. To this end, we introduce HybridGNet, an encoder-decoder neural architecture that leverages standard convolutions for image feature encoding and graph convolutional neural networks (GCNNs) to decode plausible representations of anatomical structures. We also propose a novel image-to-graph skip connection layer that allows localized features to flow from standard convolutional blocks to GCNN blocks, and show that it improves segmentation accuracy. The proposed architecture is extensively evaluated in a variety of domain shift and image occlusion scenarios, and audited considering different types of demographic domain shift. Our comprehensive experimental setup compares HybridGNet with other landmark-based and pixel-based models for anatomical segmentation in chest X-ray images, and shows that it produces anatomically plausible results in challenging scenarios where other models tend to fail.
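The idea of decoding an organ contour as a graph can be illustrated with a closed ring of landmarks, where each node is a boundary point connected to its two neighbors. The layer below is a first-order graph convolution with symmetric normalization (the Kipf and Welling formulation), written in plain Python for clarity; HybridGNet's own graph convolutions and skip connections may differ, so treat this as a sketch of the general mechanism.

```python
import math


def ring_adjacency(n):
    """Adjacency matrix of a closed contour: node i is linked to i - 1 and i + 1."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        adj[i][j] = adj[j][i] = 1.0
    return adj


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W): each node mixes its own and its neighbors' features."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Four contour nodes, each with two hypothetical feature channels.
adj = ring_adjacency(4)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, so only the neighborhood averaging shows
print(gcn_layer(adj, h, w))
```

Because each node only exchanges information with its contour neighbors, the output stays a connected contour by construction, which is the property the abstract contrasts with independent per-pixel predictions.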

Subject(s)

Image Processing, Computer-Assisted , Neural Networks, Computer , X-Rays , Image Processing, Computer-Assisted/methods , Radiography , Thorax/diagnostic imaging

8.

Hierarchical deep learning for predicting GO annotations by integrating protein knowledge.

Merino, Gabriela A; Saidi, Rabie; Milone, Diego H; Stegmayer, Georgina; Martin, Maria J.

Bioinformatics ; 38(19): 4488-4496, 2022 09 30.

Article in English | MEDLINE | ID: mdl-35929781

ABSTRACT

MOTIVATION: Experimental testing and manual curation are the most precise ways of assigning Gene Ontology (GO) terms describing protein functions. However, they are expensive, time-consuming and cannot cope with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems to help fill the gap with automatic function prediction. The results of the last Critical Assessment of Function Annotation challenge revealed that GO-term prediction remains a very challenging task. Recent developments in deep learning are pushing the frontiers of protein research, leading to new knowledge through the integration of data from multiple sources. However, the deep models developed so far for functional prediction focus mainly on sequence data and have not yet achieved breakthrough performance. RESULTS: We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained to solve 18 different prediction problems, defined by the three GO sub-ontologies, the type of proteins, and the taxonomic kingdom. Our experiments showed higher prediction quality when more protein knowledge is integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets, and showed that it can effectively improve the prediction of GO annotations. AVAILABILITY AND IMPLEMENTATION: DeeProtGO and a use case are available at https://github.com/gamerino/DeeProtGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
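Because GO is a hierarchy, a protein annotated with a term is implicitly annotated with all of that term's ancestors (the so-called true-path rule). A common post-processing step for hierarchical predictors is to make scores consistent with this rule, as sketched below; whether DeeProtGO applies exactly this step is not stated in the abstract, and the term names are placeholders.

```python
def propagate_to_ancestors(scores, parents):
    """Raise every ancestor's score to at least the highest score among its descendants.

    `parents` maps each term to its direct parent terms. The loop repeats
    until no score changes, so multi-level hierarchies are handled.
    """
    result = dict(scores)
    changed = True
    while changed:
        changed = False
        for term, score in list(result.items()):
            for parent in parents.get(term, ()):
                if result.get(parent, 0.0) < score:
                    result[parent] = score
                    changed = True
    return result


parents = {"leaf": ["middle"], "middle": ["root"], "root": []}
raw = {"root": 0.1, "middle": 0.2, "leaf": 0.9}
print(propagate_to_ancestors(raw, parents))  # every term ends at 0.9
```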

Subject(s)

Deep Learning , Gene Ontology , Computational Biology/methods , Molecular Sequence Annotation , Proteins/metabolism

9.

Anc2vec: embedding gene ontology terms by preserving ancestors relationships.

Edera, Alejandro A; Milone, Diego H; Stegmayer, Georgina.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35136916

ABSTRACT

The gene ontology (GO) provides a hierarchical structure with a controlled vocabulary composed of terms describing functions and localization of gene products. Recent works propose vector representations, also known as embeddings, of GO terms that capture meaningful information about them. Significant performance improvements have been observed when these representations are used in diverse downstream tasks, such as measuring semantic similarity between GO terms and functional similarity between proteins. Despite the success of these approaches, existing embeddings of GO terms still fail to capture crucial structural features of the GO. Here, we present anc2vec, a novel protocol based on neural networks for constructing vector representations of GO terms by preserving three important ontological features: ontological uniqueness, ancestor hierarchy, and sub-ontology membership. The advantages of using anc2vec are demonstrated by systematic experiments on diverse tasks: visualization, sub-ontology prediction, inference of structurally related terms, retrieval of terms from aggregated embeddings, and prediction of protein-protein interactions. In these tasks, experimental results show that the performance of anc2vec representations is better than that of recent approaches. This demonstrates that embeddings can achieve higher performance on diverse tasks when the structure of the GO is better represented. Full source code and data are available at https://github.com/sinc-lab/anc2vec.
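The ancestor relationships that anc2vec aims to preserve can be computed directly from the GO graph. The sketch below derives, for each term, the set of all its ancestors from a map of direct parents; the embedding training that anc2vec performs on top of such information is not shown, and the small ontology used here is hypothetical.

```python
def all_ancestors(parents):
    """Map each term to the set of all its ancestors in a DAG given as term -> direct parents."""
    memo = {}

    def visit(term):
        if term not in memo:
            found = set()
            for parent in parents.get(term, ()):
                found.add(parent)
                found |= visit(parent)
            memo[term] = found
        return memo[term]

    return {term: visit(term) for term in parents}


# Hypothetical fragment: a term with two parents that share the same root.
parents = {"term_c": ["term_a", "term_b"], "term_a": ["root"], "term_b": ["root"], "root": []}
print(sorted(all_ancestors(parents)["term_c"]))  # ['root', 'term_a', 'term_b']
```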

Subject(s)

Semantics , Software , Computational Biology/methods , Gene Ontology , Neural Networks, Computer , Proteins/genetics

10.

Transfer Learning Based on Optimal Transport for Motor Imagery Brain-Computer Interfaces.

Peterson, Victoria; Nieto, Nicolas; Wyser, Dominik; Lambercy, Olivier; Gassert, Roger; Milone, Diego H; Spies, Ruben D.

IEEE Trans Biomed Eng ; 69(2): 807-817, 2022 02.

Article in English | MEDLINE | ID: mdl-34406935

ABSTRACT

OBJECTIVE: This paper tackles the cross-session variability of electroencephalography-based brain-computer interfaces (BCIs) in order to avoid the lengthy recalibration of the decoding method before every use. METHODS: We develop a new domain adaptation approach based on optimal transport to tackle brain signal variability between sessions of motor imagery BCIs. We propose a backward method where, unlike the original formulation, the data from a new session are transported to a calibration session, thereby avoiding model retraining. Several domain adaptation approaches are evaluated and compared. We simulated two possible online scenarios: i) block-wise adaptation and ii) sample-wise adaptation. In this study, we collected a dataset of 10 subjects performing a hand motor imagery task in 2 sessions. A publicly available dataset is also used. RESULTS: For the first scenario, results indicate that classifier retraining can be avoided by means of our backward formulation, yielding classification performance equivalent to that of retraining solutions. In the second scenario, classification performance reaches up to 90.23% overall accuracy when the label of the indicated mental task is used to learn the transport. Adaptation is between 10 and 80 times faster than with the other methods. CONCLUSIONS: The proposed method is able to mitigate the cross-session variability in motor imagery BCIs. SIGNIFICANCE: The backward formulation is an efficient retraining-free approach built to avoid lengthy calibration times. Thus, the BCI can be actively used after just a few minutes of setup. This is important for practical applications such as BCI-based motor rehabilitation.
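The backward idea can be illustrated with entropy-regularized optimal transport computed by Sinkhorn iterations: new-session feature vectors are mapped onto the calibration session through the barycentric projection of the transport plan, so a classifier trained on calibration data can be applied without retraining. This plain-Python sketch rests on assumptions (uniform sample weights, squared Euclidean cost, regularization 1.0) and the paper's exact formulation and features may differ; practical implementations usually work in the log domain or use a library such as POT to avoid numerical underflow when costs are large.

```python
import math


def sinkhorn_plan(cost, reg=1.0, n_iter=200):
    """Entropy-regularized OT plan between two uniform empirical distributions."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_to_calibration(new_session, calibration, reg=1.0):
    """Backward mapping: express each new-session sample in the calibration domain."""
    cost = [[sum((x - y) ** 2 for x, y in zip(s, t)) for t in calibration] for s in new_session]
    plan = sinkhorn_plan(cost, reg)
    dim = len(calibration[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(w * t[k] for w, t in zip(row, calibration)) / mass for k in range(dim)])
    return mapped


# Toy 1-D features: the new session is the calibration session shifted by +1.
calibration = [[0.0], [10.0]]
new_session = [[1.0], [11.0]]
print(transport_to_calibration(new_session, calibration))  # approximately [[0.0], [10.0]]
```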

Subject(s)

Brain-Computer Interfaces , Brain , Electroencephalography/methods , Humans , Learning , Machine Learning

11.

Bridging physiological and perceptual views of autism by means of sampling-based Bayesian inference.

Echeveste, Rodrigo; Ferrante, Enzo; Milone, Diego H; Samengo, Inés.

Netw Neurosci ; 6(1): 196-212, 2022 Feb.

Article in English | MEDLINE | ID: mdl-36605888

ABSTRACT

Theories for autism spectrum disorder (ASD) have been formulated at different levels, ranging from physiological observations to perceptual and behavioral descriptions. Understanding the physiological underpinnings of perceptual traits in ASD remains a significant challenge in the field. Here we show how a recurrent neural circuit model that was optimized to perform sampling-based inference and displays characteristic features of cortical dynamics can help bridge this gap. The model was able to establish a mechanistic link between two descriptive levels for ASD: a physiological level, in terms of inhibitory dysfunction, neural variability, and oscillations, and a perceptual level, in terms of hypopriors in Bayesian computations. We took two parallel paths (inducing hypopriors in the probabilistic model, and inducing an inhibitory dysfunction in the network model), which led to consistent results in terms of the represented posteriors, providing support for the view that both descriptions might constitute two sides of the same coin.
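A hypoprior is commonly formalized as a weaker, broader prior. In a toy Gaussian model, assumed here for illustration and unrelated to the paper's recurrent network, the posterior over a stimulus s given an observation x shows how a broader prior (larger σ0) shifts perception toward the sensory evidence:

```latex
p(s \mid x) \propto \mathcal{N}(s;\, \mu_0, \sigma_0^2)\, \mathcal{N}(x;\, s, \sigma^2)
\quad\Longrightarrow\quad
\mu_{\text{post}} = \frac{\sigma^2 \mu_0 + \sigma_0^2 x}{\sigma^2 + \sigma_0^2},
\qquad
\sigma_{\text{post}}^2 = \frac{\sigma_0^2 \sigma^2}{\sigma_0^2 + \sigma^2}
```

For example, with μ0 = 0, x = 1 and σ = 1, a prior with σ0 = 1 gives a posterior mean of 0.5, while a broader prior with σ0 = 2 gives 0.8.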

12.

miRe2e: a full end-to-end deep model based on transformers for prediction of pre-miRNAs.

Raad, Jonathan; Bugnon, Leandro A; Milone, Diego H; Stegmayer, Georgina.

Bioinformatics ; 38(5): 1191-1197, 2022 02 07.

Article in English | MEDLINE | ID: mdl-34875006

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs) are small RNA sequences with key roles in the post-transcriptional regulation of gene expression in different species. Accurate prediction of novel miRNAs is needed due to their importance in many biological processes and their associations with complex diseases in humans. Many machine learning approaches were proposed in the last decade for this purpose, but they required handcrafted feature extraction to identify possible de novo miRNAs. More recently, the emergence of deep learning (DL) has enabled automatic feature extraction, with models learning relevant representations by themselves. However, state-of-the-art deep models require complex pre-processing of the input sequences and prediction of their secondary structure to reach acceptable performance. RESULTS: In this work, we present miRe2e, the first full end-to-end DL model for pre-miRNA prediction. This model is based on Transformers, a neural architecture that uses attention mechanisms to infer global dependencies between inputs and outputs. It is capable of receiving raw genome-wide data as input, without any pre-processing or feature engineering. After a training stage with known pre-miRNAs, hairpin and non-hairpin sequences, it can identify all the pre-miRNA sequences within a genome. The model has been validated through several experimental setups using the human genome and, compared with state-of-the-art algorithms, it obtained 10 times better performance. AVAILABILITY AND IMPLEMENTATION: Webdemo available at https://sinc.unl.edu.ar/web-demo/miRe2e/ and source code available for download at https://github.com/sinc-lab/miRe2e. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
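The attention mechanism mentioned above can be summarized as scaled dot-product attention: each position builds a weighted average of value vectors, with weights given by a softmax over query-key similarities. The sketch below shows a single head in plain Python; in a model such as miRe2e the queries, keys, and values would be learned projections of per-nucleotide embeddings, which are assumed rather than shown here.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return outputs


# When all keys are identical, every position is weighted equally,
# so the output is the mean of the value vectors.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]]))  # [[2.0, 4.0]]
```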

Subject(s)

MicroRNAs , Humans , MicroRNAs/genetics , MicroRNAs/chemistry , Algorithms , Machine Learning , Genome, Human , Computational Biology

13.

On the relationship between research parasites and fairness in machine learning: challenges and opportunities.

Nieto, Nicolás; Larrazabal, Agostina; Peterson, Victoria; Milone, Diego H; Ferrante, Enzo.

Gigascience ; 10(12)2021 12 20.

Article in English | MEDLINE | ID: mdl-34927190

ABSTRACT

Machine learning systems influence our daily lives in many different ways. Hence, it is crucial to ensure that the decisions and recommendations made by these systems are fair, equitable, and free of unintended biases. Over the past few years, the field of fairness in machine learning has grown rapidly, investigating how, when, and why these models capture, and even amplify, biases that are deeply rooted not only in the training data but also in our society. In this Commentary, we discuss challenges and opportunities for rigorous posterior analyses of publicly available data to build fair and equitable machine learning systems, focusing on the importance of training data, model construction, and diversity in the team of developers. The thoughts presented here grew out of the work that won us the annual Research Parasite Award sponsored by GigaScience.

Subject(s)

Parasites , Animals , Machine Learning

14.

Deepred-Mt: Deep representation learning for predicting C-to-U RNA editing in plant mitochondria.

Edera, Alejandro A; Small, Ian; Milone, Diego H; Sanchez-Puerta, M Virginia.

Comput Biol Med ; 136: 104682, 2021 09.

Article in English | MEDLINE | ID: mdl-34343887

ABSTRACT

In land plant mitochondria, C-to-U RNA editing converts cytidines into uridines at highly specific RNA positions called editing sites. This editing step is essential for the correct functioning of mitochondrial proteins. When using sequence homology information, edited positions can be computationally predicted with high precision. However, predictions based on the sequence contexts of such edited positions often result in lower precision, which limits further advances in novel genetic engineering techniques for RNA regulation. Here, a deep convolutional neural network called Deepred-Mt is proposed. It predicts C-to-U editing events based on the 40 nucleotides flanking a given cytidine. Unlike existing methods, Deepred-Mt was optimized by using editing extent information, novel data augmentation strategies, and a large-scale training dataset constructed with deep RNA sequencing data of 21 plant mitochondrial genomes. In comparison to predictive methods based on sequence homology, Deepred-Mt attains significantly better predictive performance, in terms of both average precision and F1 score. In addition, our approach is able to recognize well-known sequence motifs linked to RNA editing, and shows that the local RNA structure surrounding editing sites may be a relevant factor regulating their editing. These results demonstrate that Deepred-Mt is an effective tool for predicting C-to-U RNA editing in plant mitochondria. Source code, datasets, and detailed use cases are freely available at https://github.com/aedera/deepredmt.
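Building the model input can be sketched as extracting a fixed-length context around every cytidine and one-hot encoding it. The split of the 40 flanking nucleotides into 20 upstream and 20 downstream, the handling of sequence ends, and the A/C/G/U channel order are assumptions for illustration, not details taken from Deepred-Mt.

```python
NUCLEOTIDES = "ACGU"


def cytidine_windows(seq, flank=20):
    """Return (position, window) for each C with `flank` nt on both sides (C at the center)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    windows = []
    for i, base in enumerate(seq):
        if base == "C" and i - flank >= 0 and i + flank < len(seq):
            windows.append((i, seq[i - flank:i + flank + 1]))
    return windows


def one_hot(window):
    """Encode a window as a list of [A, C, G, U] indicator vectors."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in window]


seq = "A" * 20 + "C" + "G" * 20 + "C"  # the last C lacks downstream context and is skipped
windows = cytidine_windows(seq)
print(len(windows), len(windows[0][1]))  # 1 41
```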

Subject(s)

Mitochondria , RNA Editing , Mitochondria/genetics , RNA Editing/genetics

15.

ChronoRoot: High-throughput phenotyping by deep segmentation networks reveals novel temporal parameters of plant root system architecture.

Gaggion, Nicolás; Ariel, Federico; Daric, Vladimir; Lambert, Éric; Legendre, Simon; Roulé, Thomas; Camoirano, Alejandra; Milone, Diego H; Crespi, Martin; Blein, Thomas; Ferrante, Enzo.

Gigascience ; 10(7)2021 07 20.

Article in English | MEDLINE | ID: mdl-34282452

ABSTRACT

BACKGROUND: Deep learning methods have outperformed previous techniques in most computer vision tasks, including image-based plant phenotyping. However, massive data collection of root traits and the development of associated artificial intelligence approaches have been hampered by the inaccessibility of the rhizosphere. Here we present ChronoRoot, a system that combines 3D-printed open hardware with deep segmentation networks for high temporal resolution phenotyping of plant roots in agarized medium. RESULTS: We developed a novel deep learning-based root extraction method that leverages the latest advances in convolutional neural networks for image segmentation and incorporates temporal consistency into the root system architecture reconstruction process. Automatic extraction of phenotypic parameters from sequences of images allowed a comprehensive characterization of the root system growth dynamics. Furthermore, novel time-associated parameters emerged from the analysis of spectral features derived from temporal signals. CONCLUSIONS: Our work shows that the combination of machine intelligence methods and a 3D-printed device expands the possibilities of high-throughput root phenotyping for genetics and natural variation studies, as well as the screening of clock-related mutants, revealing novel root traits.
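One simple way to obtain a spectral feature from a temporal root signal (for example, growth speed sampled at regular intervals) is to find the frequency with the most power in its discrete Fourier transform. The sketch below returns the dominant period; the hourly sampling and the synthetic 24-hour rhythm are hypothetical, and ChronoRoot's actual spectral features may be defined differently.

```python
import math


def dominant_period(signal, sample_interval):
    """Period (in the units of `sample_interval`) of the strongest non-constant DFT component."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (k = 0) component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n * sample_interval / best_k


# Four days of hypothetical hourly measurements with a daily oscillation.
hours = range(96)
growth = [1.0 + 0.5 * math.sin(2 * math.pi * t / 24) for t in hours]
print(dominant_period(growth, sample_interval=1.0))  # 24.0
```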

Subject(s)

Artificial Intelligence , Neural Networks, Computer , Phenotype , Plant Roots , Plants

16.

Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning.

Bugnon, Leandro A; Yones, Cristian; Milone, Diego H; Stegmayer, Georgina.

Brief Bioinform ; 22(3)2021 05 20.

Article in English | MEDLINE | ID: mdl-34020552

ABSTRACT

MOTIVATION: The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is infeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data. RESULTS: In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, and Homo sapiens. The models have been designed for the pre-miRNA prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform similarly. However, for larger datasets such as the H. sapiens genome, deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives. AVAILABILITY: The source code to reproduce these results is available at http://sourceforge.net/projects/sourcesinc/files/gwmirna, and the datasets are freely available at https://sourceforge.net/projects/sourcesinc/files/mirdata.
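The difficulty of keeping false positives low in a genome-wide search follows from simple arithmetic: for a fixed true positive rate and false positive rate, precision collapses as the number of negatives grows. The counts below are hypothetical and chosen only to show the effect.

```python
def expected_precision(n_pos, n_neg, tpr, fpr):
    """Precision implied by a given sensitivity (TPR) and false positive rate."""
    true_pos = tpr * n_pos
    false_pos = fpr * n_neg
    return true_pos / (true_pos + false_pos)


# Same classifier (90% sensitivity, 1% false positive rate) in two settings.
balanced = expected_precision(n_pos=100, n_neg=100, tpr=0.9, fpr=0.01)
genome_wide = expected_precision(n_pos=100, n_neg=1_000_000, tpr=0.9, fpr=0.01)
print(f"{balanced:.3f} {genome_wide:.4f}")  # 0.989 0.0089
```

In the genome-wide setting, roughly 111 false positives are reported for each true pre-miRNA, even though the false positive rate is only 1%.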

Subject(s)

Genome , Machine Learning , MicroRNAs/genetics , RNA Precursors/genetics , Animals , Arabidopsis/genetics , Computational Biology/methods , Humans

17.

Novel SARS-CoV-2 encoded small RNAs in the passage to humans.

Merino, Gabriela A; Raad, Jonathan; Bugnon, Leandro A; Yones, Cristian; Kamenetzky, Laura; Claus, Juan; Ariel, Federico; Milone, Diego H; Stegmayer, Georgina.

Bioinformatics ; 36(24): 5571-5581, 2021 04 05.

Article in English | MEDLINE | ID: mdl-33244583

ABSTRACT

MOTIVATION: The Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) has recently emerged as the agent responsible for the pandemic outbreak of coronavirus disease 2019. This virus is closely related to coronaviruses infecting bats and Malayan pangolins, species suspected to be an intermediate host in the passage to humans. Several genomic mutations affecting viral proteins have been identified, contributing to the understanding of the recent animal-to-human transmission. However, the capacity of SARS-CoV-2 to encode functional putative microRNAs (miRNAs) remains largely unexplored. RESULTS: We have used deep learning to discover 12 candidate stem-loop structures hidden in the viral protein-coding genome. Among the precursors, the expression of eight mature miRNA-like sequences was confirmed in small RNA-seq data from SARS-CoV-2 infected human cells. Predicted miRNAs are likely to target a subset of human genes, of which 109 are transcriptionally deregulated upon infection. Remarkably, 28 of those genes potentially targeted by SARS-CoV-2 miRNAs are down-regulated in infected human cells. Interestingly, most of them have been related to respiratory diseases and viral infection, including several afflictions previously associated with SARS-CoV-1 and SARS-CoV-2. The comparison of SARS-CoV-2 pre-miRNA sequences with those from bat and pangolin coronaviruses suggests that single nucleotide mutations could have helped its progenitors jump inter-species boundaries, allowing the gain of novel mature miRNAs targeting human mRNAs. Our results suggest that the recent acquisition of novel miRNA-like sequences in the SARS-CoV-2 genome may have contributed to modulating the transcriptional reprogramming of the new host upon infection. AVAILABILITY AND IMPLEMENTATION: https://github.com/sinc-lab/sarscov2-mirna-discovery. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

COVID-19 , Coronavirus , Animals , Betacoronavirus , Coronavirus/genetics , Genome, Viral , Humans , Pandemics , SARS-CoV-2

18.

Audio recordings dataset of grazing jaw movements in dairy cattle.

Vanrell, Sebastián R; Chelotti, José O; Bugnon, Leandro A; Rufiner, H Leonardo; Milone, Diego H; Laca, Emilio A; Galli, Julio R.

Data Brief ; 30: 105623, 2020 Jun.

Article in English | MEDLINE | ID: mdl-32420421

ABSTRACT

This dataset is composed of correlated audio recordings and labels of ingestive jaw movements performed during grazing by dairy cattle. Using a wireless microphone, we recorded sounds of three Holstein dairy cows grazing short and tall alfalfa and short and tall fescue. Two experts in grazing behavior identified and labeled the start, end, and type of each jaw movement: bite, chew, and chew-bite (compound movement). For each segment of raw audio corresponding to a jaw movement we computed four well-known features: amplitude, duration, zero crossings, and envelope symmetry. These features are in the dataset and can be used as inputs to build automated methods for classification of ingestive jaw movements. Cattle grazing behavior can be monitored and characterized by identifying and analyzing these masticatory events.

19.

Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis.

Larrazabal, Agostina J; Nieto, Nicolás; Peterson, Victoria; Milone, Diego H; Ferrante, Enzo.

Proc Natl Acad Sci U S A ; 117(23): 12592-12594, 2020 06 09.

Article in English | MEDLINE | ID: mdl-32457147

ABSTRACT

Artificial intelligence (AI) systems for computer-aided diagnosis and image-based screening are being adopted worldwide by medical institutions. In such a context, generating fair and unbiased classifiers becomes of paramount importance. The research community of medical image computing is making great efforts in developing more accurate algorithms to assist medical doctors in the difficult task of disease diagnosis. However, little attention is paid to the way databases are collected and how this may influence the performance of AI systems. Our study sheds light on the importance of gender balance in medical imaging datasets used to train AI systems for computer-assisted diagnosis. We provide empirical evidence supported by a large-scale study, based on three deep neural network architectures and two well-known publicly available X-ray image datasets used to diagnose various thoracic diseases under different gender imbalance conditions. We found a consistent decrease in performance for underrepresented genders when a minimum balance is not fulfilled. This raises the alarm for national agencies in charge of regulating and approving computer-assisted diagnosis systems, which should include explicit gender balance and diversity recommendations. We also establish an open problem for the academic medical image computing community which needs to be addressed by novel algorithms endowed with robustness to gender imbalance.

Subject(s)

Datasets as Topic/standards , Deep Learning/standards , Radiographic Image Interpretation, Computer-Assisted/standards , Radiography, Thoracic/standards , Bias , Female , Humans , Male , Reference Standards , Sex Factors

20.

Learning deformable registration of medical images with anatomical constraints.

Mansilla, Lucas; Milone, Diego H; Ferrante, Enzo.

Neural Netw ; 124: 269-279, 2020 Apr.

Article in English | MEDLINE | ID: mdl-32035306

ABSTRACT

Deformable image registration is a fundamental problem in the field of medical image analysis. In recent years, we have witnessed the advent of deep learning-based image registration methods that achieve state-of-the-art performance and drastically reduce the required computational time. However, little work has been done on how we can encourage our models to produce results that are not only accurate but also anatomically plausible, which is still an open question in the field. In this work, we argue that incorporating anatomical priors in the form of global constraints into the learning process of these models will further improve their performance and boost the realism of the warped images after registration. We learn global non-linear representations of image anatomy using segmentation masks, and employ them to constrain the registration process. The proposed AC-RegNet architecture is evaluated in the context of chest X-ray image registration using three different datasets, where the high anatomical variability makes the task extremely challenging. Our experiments show that the proposed anatomically constrained registration model produces more realistic and accurate results than state-of-the-art methods, demonstrating the potential of this approach.

Subject(s)

Deep Learning , Radiographic Image Enhancement/methods , Humans , Radiographic Image Enhancement/standards
