Search | Portal Regional da BVS (Virtual Health Library Regional Portal) (test)

1.

FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation.

Martins, Yasmmin C; Bhawsar, Praphulla Ms; Balasubramanian, Jeya B; Russ, Daniel; Wong, Wendy Sw; Maass, Wolfgang; Almeida, Jonas S.

AMIA Jt Summits Transl Sci Proc ; 2024: 65-74, 2024.

Article in English | MEDLINE | ID: mdl-38827109

ABSTRACT

Motivation: The proliferation of genetic testing and consumer genomics represents a logistic challenge to the personalized use of GWAS data in VCF format. Specifically, the challenge is retrieving target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
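The combination of byte-range reads, chunk decompression, and binary search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' JavaScript module. It assumes a toy file made of independently gzip-compressed chunks of position-sorted rows, an in-memory `read_range` helper standing in for an HTTP Range request, and a hypothetical 64 KiB upper bound on chunk size; all function names are illustrative.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip member header: magic bytes + deflate method
MAX_CHUNK = 1 << 16           # assumed upper bound on a compressed chunk size


def build_blob(records, sizes):
    """Toy chunked file: independently gzipped runs of position-sorted rows."""
    blob, i, k = b"", 0, 0
    while i < len(records):
        n = sizes[k % len(sizes)]  # irregular chunk sizes
        rows = records[i:i + n]
        blob += gzip.compress("".join(f"{p}\t{v}\n" for p, v in rows).encode())
        i, k = i + n, k + 1
    return blob


def read_range(blob, start, end):
    """Stand-in for an HTTP byte-range request (bytes start .. end-1)."""
    return blob[start:end]


def chunk_at_or_after(blob, offset, window=4096):
    """Return (start, rows) for the first decodable chunk at or after offset."""
    pos = offset
    while pos < len(blob):
        data = read_range(blob, pos, pos + window)
        idx = data.find(GZIP_MAGIC)
        if idx < 0:
            pos += max(1, len(data) - 2)  # overlap 2 bytes for a split header
            continue
        start = pos + idx
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip wrapper
        try:
            text = decoder.decompress(read_range(blob, start, start + MAX_CHUNK))
        except zlib.error:
            pos = start + 1  # magic bytes appeared by chance in compressed data
            continue
        if not decoder.eof:
            pos = start + 1  # not a complete chunk, keep scanning
            continue
        rows = [line.split("\t") for line in text.decode().splitlines()]
        return start, [(int(p), v) for p, v in rows]
    return None


def lookup(blob, target):
    """Binary search over byte offsets for the chunk that may hold target."""
    lo, hi, best = 0, len(blob), None
    while lo < hi:
        mid = (lo + hi) // 2
        found = chunk_at_or_after(blob, mid)
        if found is None or found[1][0][0] > target:
            hi = mid             # target lies before this offset
        else:
            best = found         # chunk starts at or before target
            lo = found[0] + 1
    return None if best is None else dict(best[1]).get(target)


records = [(100 + 7 * i, f"v{i}") for i in range(1000)]
blob = build_blob(records, sizes=[13, 40, 5, 77])
print(lookup(blob, 100 + 7 * 500))
```

Each probe jumps to an arbitrary byte offset, scans forward for the next chunk header, and compares that chunk's first position with the target, so no companion index is needed. A real VCF would additionally key rows on chromosome as well as position.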

2.

mSigSDK - private, at scale, computation of mutation signatures.

Ge, Aaron; Zhang, Tongwu; Martins, Yasmmin Côrtes; Landi, Maria Teresa; Park, Brian; Chen, Kailing; Balasubramanian, Jeya; Almeida, Jonas S.

ArXiv ; 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38327678

ABSTRACT

In our previous work, we demonstrated that it is feasible to perform analysis on mutation signature data without the need for downloads or installations and analyze individual patient data at scale without compromising privacy. Building on this foundation, we developed an in-browser Software Development Kit (a JavaScript SDK), mSigSDK, to facilitate the orchestration of distributed data processing workflows and graphic visualization of mutational signature analysis results. We strictly adhered to modern web computing standards, particularly the modularization standards set by the ECMAScript ES6 framework (JavaScript modules). Our approach allows for the computation to be entirely performed by secure delegation to the computational resources of the user's own machine (in-browser), without any downloads or installations. The mSigSDK was developed primarily as a companion library to the mSig Portal resource of the National Cancer Institute Division of Cancer Epidemiology and Genetics (NIH/NCI/DCEG), with a focus on FAIR extensibility as components of other researchers' own data science constructs. Anticipated extensions include the programmatic operation of other mutation signature API ecosystems such as SIGNAL and COSMIC, advancing towards a data commons for mutational signature research (Grossman et al., 2016).

3.

PRScalc, a privacy-preserving calculation of raw polygenic risk scores from direct-to-consumer genomics data.

Sandoval, Lorena; Jafri, Saleet; Balasubramanian, Jeya Balaji; Bhawsar, Praphulla; Edelson, Jacob L; Martins, Yasmmin; Maass, Wolfgang; Chanock, Stephen J; Garcia-Closas, Montserrat; Almeida, Jonas S.

Bioinform Adv ; 3(1): vbad145, 2023.

Article in English | MEDLINE | ID: mdl-37868335

ABSTRACT

Motivation: Currently, the Polygenic Score (PGS) Catalog curates over 400 publications on over 500 traits corresponding to over 3000 polygenic risk scores (PRSs). To assess the feasibility of privately calculating the underlying multivariate relative risk for individuals with consumer genomics data, we developed an in-browser PRS calculator for genomic data that does not circulate any data or engage in any computation outside of the user's personal device. Results: A prototype personal risk score calculator, created for research purposes, was developed to demonstrate how the PGS Catalog can be privately and readily applied to widely available direct-to-consumer genetic testing services, such as 23andMe. No software download, installation, or configuration is needed. The PRS web calculator matches individual PGS Catalog entries with an individual's 23andMe genome data composed of 600k to 1.4 M single-nucleotide polymorphisms (SNPs). Beta coefficients provide researchers with a convenient assessment of risk associated with matched SNPs. This in-browser application was tested in a variety of personal devices, including smartphones, establishing the feasibility of privately calculating personal risk scores with up to a few thousand reference genetic variations and from the full 23andMe SNP data file (compressed or not). Availability and implementation: The PRScalc web application is developed in JavaScript, HTML, and CSS and is available from its GitHub repository (https://episphere.github.io/prs) under an MIT license. The datasets were derived from sources in the public domain: [PGS Catalog, Personal Genome Project].
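As a rough illustration of the matching step described here, a raw score can be computed as the sum of beta coefficients weighted by the number of effect alleles an individual carries at each matched SNP. The Python sketch below is not the PRScalc implementation. It assumes the tab-separated 23andMe raw-data layout (rsid, chromosome, position, genotype, with `#` comment lines) and a simplified weight list of (rsID, effect allele, beta) tuples; the rsIDs and betas are made up, and strand flips and missing-allele imputation are ignored.

```python
def parse_23andme(text):
    """Map rsID -> genotype string (e.g. 'AG') from 23andMe-style raw data."""
    genotypes = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, _chrom, _pos, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes


def raw_prs(genotypes, weights):
    """Sum beta * effect-allele dosage over SNPs present in both inputs."""
    score, matched = 0.0, 0
    for rsid, effect_allele, beta in weights:
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            continue  # SNP absent from the genome file or not called
        score += beta * genotype.count(effect_allele)  # dosage is 0, 1 or 2
        matched += 1
    return score, matched


# Hypothetical inputs for illustration only.
genome_text = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t1000\tAG\n"
    "rs2\t2\t2000\tTT\n"
    "rs3\t3\t3000\t--\n"
)
weights = [("rs1", "A", 0.25), ("rs2", "T", -0.5), ("rs3", "C", 0.5), ("rs4", "G", 0.3)]

score, matched = raw_prs(parse_23andme(genome_text), weights)
print(score, matched)
```

Reporting the number of matched SNPs alongside the score matters because a consumer genotyping file typically covers only part of the variants listed in a scoring file.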

4.

PPIntegrator: semantic integrative system for protein-protein interaction and application for host-pathogen datasets.

Martins, Yasmmin Côrtes; Ziviani, Artur; Cerqueira E Costa, Maiana de Oliveira; Cavalcanti, Maria Cláudia Reis; Nicolás, Marisa Fabiana; de Vasconcelos, Ana Tereza Ribeiro.

Bioinform Adv ; 3(1): vbad067, 2023.

Article in English | MEDLINE | ID: mdl-37359724

ABSTRACT

Summary: Semantic web standards have shown importance in the last 20 years in promoting data formalization and interlinking between the existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years for the biological area, such as the broadly used Gene Ontology that contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein-protein interactions (PPIs) which have applications like protein function inference. Current PPI databases have heterogeneous exportation methods that challenge their integration and analysis. Presently, several initiatives of ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, the efforts to stimulate guidelines for automatic semantic data integration and analysis for PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrated some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system. Availability and implementation: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.

5.

Inference of differentially expressed genes using generalized linear mixed models in a pairwise fashion.

Terra Machado, Douglas; Bernardes Brustolini, Otávio José; Côrtes Martins, Yasmmin; Grivet Mattoso Maia, Marco Antonio; Ribeiro de Vasconcelos, Ana Tereza.

PeerJ ; 11: e15145, 2023.

Article in English | MEDLINE | ID: mdl-37033732

ABSTRACT

Background: Technological advances involving RNA-Seq and Bioinformatics allow quantifying the transcriptional levels of genes in cells, tissues, and cell lines, permitting the identification of Differentially Expressed Genes (DEGs). DESeq2 and edgeR are well-established computational tools used for this purpose and they are based upon generalized linear models (GLMs) that consider only fixed effects in modeling. However, the inclusion of random effects reduces the risk of missing potential DEGs that may be essential in the context of the biological phenomenon under investigation. The generalized linear mixed models (GLMM) can be used to include both effects. Methods: We present DEGRE (Differentially Expressed Genes with Random Effects), a user-friendly tool capable of inferring DEGs where fixed and random effects on individuals are considered in the experimental design of RNA-Seq research. DEGRE preprocesses the raw matrices before fitting GLMMs on the genes and the derived regression coefficients are analyzed using the Wald statistical test. DEGRE offers the Benjamini-Hochberg or Bonferroni techniques for P-value adjustment. Results: The datasets used for DEGRE assessment were simulated with known identification of DEGs. These have fixed effects, and the random effects were estimated and inserted to measure the impact of experimental designs with high biological variability. For DEGs' inference, preprocessing effectively prepares the data and retains overdispersed genes. The biological coefficient of variation is inferred from the counting matrices to assess variability before and after the preprocessing. The DEGRE is computationally validated through its performance by the simulation of counting matrices, which have biological variability related to fixed and random effects. DEGRE also provides improved assessment measures for detecting DEGs in cases with higher biological variability. 
We show that the preprocessing established here effectively removes technical variation from those matrices. This tool also detects new potential candidate DEGs in the transcriptome data of patients with bipolar disorder, presenting a promising tool to detect more relevant genes. Conclusions: DEGRE provides data preprocessing and applies GLMMs for DEGs' inference. The preprocessing allows efficient removal of genes that could impact the inference. Also, the computational and biological validation of DEGRE has shown promise in identifying possible DEGs in experiments derived from complex experimental designs. This tool may help handle random effects on individuals in the inference of DEGs and presents a potential for discovering new interesting DEGs for further biological investigation.
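The two P-value adjustment options named in this abstract, Bonferroni and Benjamini-Hochberg, follow standard definitions and can be sketched in a few lines of Python. This is a generic illustration, not DEGRE's code, and the input P-values are invented.

```python
def bonferroni(pvals):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p * m / rank, made monotone from the largest rank down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene Wald test P-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate and usually retains more candidate genes.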

Subjects

Gene Expression Profiling, Transcriptome, Humans, Linear Models, Gene Expression Profiling/methods, Transcriptome/genetics, Computational Biology/methods

6.

Emergence of Within-Host SARS-CoV-2 Recombinant Genome After Coinfection by Gamma and Delta Variants: A Case Report.

Francisco Junior, Ronaldo da Silva; de Almeida, Luiz G P; Lamarca, Alessandra P; Cavalcante, Liliane; Martins, Yasmmin; Gerber, Alexandra L; Guimarães, Ana Paula de C; Salviano, Ricardo Barbosa; Dos Santos, Fernanda Leitão; de Oliveira, Thiago Henrique; de Souza, Isabelle Vasconcellos; de Carvalho, Erika Martins; Ribeiro, Mario Sergio; Carvalho, Silvia; da Silva, Flávio Dias; Garcia, Marcio Henrique de Oliveira; de Souza, Leandro Magalhães; da Silva, Cristiane Gomes; Ribeiro, Caio Luiz Pereira; Cavalcanti, Andréa Cony; de Mello, Claudia Maria Braga; Tanuri, Amilcar; Vasconcelos, Ana Tereza R.

Front Public Health ; 10: 849978, 2022.

Article in English | MEDLINE | ID: mdl-35273945

ABSTRACT

In this study, we report the first case of intra-host SARS-CoV-2 recombination during a coinfection by the variants of concern (VOC) AY.33 (Delta) and P.1 (Gamma) supported by sequencing reads harboring a mosaic of lineage-defining mutations. By using next-generation sequencing reads intersecting regions that simultaneously overlap lineage-defining mutations from Gamma and Delta, we were able to identify a total of six recombinant regions across the SARS-CoV-2 genome within a sample. Four of them mapped in the spike gene and two in the nucleocapsid gene. We detected mosaic reads harboring a combination of lineage-defining mutations from each VOC. To our knowledge, this is the first report of intra-host RNA-RNA recombination between two lineages of SARS-CoV-2, which can represent a threat to public health management during the COVID-19 pandemic due to the possibility of the emergence of viruses with recombinant phenotypes.

Subjects

COVID-19, Coinfection, Humans, Pandemics, Phylogeny, SARS-CoV-2/genetics

7.

Differential haplotype expression in class I MHC genes during SARS-CoV-2 infection of human lung cell lines.

Francisco Junior, Ronaldo da Silva; Temerozo, Jairo R; Ferreira, Cristina Dos Santos; Martins, Yasmmin; Souza, Thiago Moreno L; Medina-Acosta, Enrique; de Vasconcelos, Ana Tereza Ribeiro.

Front Immunol ; 13: 1101526, 2022.

Article in English | MEDLINE | ID: mdl-36818472

ABSTRACT

Introduction: Cell entry of SARS-CoV-2 causes genome-wide disruption of the transcriptional profiles of genes and biological pathways involved in the pathogenesis of COVID-19. Expression allelic imbalance is characterized by a deviation from the Mendelian expected 1:1 expression ratio and is an important source of allele-specific heterogeneity. Expression allelic imbalance can be measured by allele-specific expression analysis (ASE) across heterozygous informative expressed single nucleotide variants (eSNVs). ASE reflects many regulatory biological phenomena that can be assessed by combining genome and transcriptome information. ASE contributes to the interindividual variability associated with the disease. We aim to estimate the transcriptome-wide impact of SARS-CoV-2 infection by analyzing eSNVs. Methods: We compared ASE profiles in the human lung cell lines Calu-3, A459, and H522 before and after infection with SARS-CoV-2 using RNA-Seq experiments. Results: We identified 34 differential ASE (DASE) sites in 13 genes (HLA-A, HLA-B, HLA-C, BRD2, EHD2, GFM2, GSPT1, HAVCR1, MAT2A, NQO2, SUPT6H, TNFRSF11A, UMPS), all of which are enriched in protein binding functions and play a role in COVID-19. Most DASE sites were assigned to the MHC class I locus and were predominantly upregulated upon infection. DASE sites in the MHC class I locus also occur in iPSC-derived airway epithelium basal cells infected with SARS-CoV-2. Using an RNA-Seq haplotype reconstruction approach, we found DASE sites and adjacent eSNVs in phase (i.e., predicted on the same DNA strand), demonstrating differential haplotype expression upon infection. We found a bias towards the expression of the HLA alleles with a higher binding affinity to SARS-CoV-2 epitopes. Discussion: Independent of gene expression compensation, SARS-CoV-2 infection of human lung cell lines induces transcriptional allelic switching at the MHC loci. 
This suggests a response mechanism to SARS-CoV-2 infection that swaps HLA alleles with poor epitope binding affinity, an expectation supported by publicly available proteome data.
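The notion of allelic imbalance as a deviation from the expected 1:1 expression ratio can be made concrete with a small Python sketch. It is not the pipeline used in the study. It assumes per-site reference and alternate read counts at a heterozygous eSNV and applies an exact two-sided binomial test against a 0.5 expectation; the read counts are invented.

```python
from math import comb


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference allele fraction, two-sided exact binomial P-value vs 1:1)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    ref_fraction = ref_reads / n
    k = min(ref_reads, alt_reads)
    # Under a 1:1 ratio the distribution is symmetric, so double the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return ref_fraction, min(1.0, 2 * tail)


print(allelic_imbalance(8, 2))  # hypothetical read counts at one eSNV
```

Differential ASE between conditions would compare such ratios before and after infection at the same site, which requires a further test not shown here.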

Subjects

COVID-19, Humans, Alleles, Epitopes, Haplotypes, Lung, Methionine Adenosyltransferase, SARS-CoV-2, Histocompatibility Antigens Class I/genetics

8.

The gene regulatory network of Staphylococcus aureus ST239-SCCmecIII strain Bmb9393 and assessment of genes associated with the biofilm in diverse backgrounds.

Costa, Maiana de Oliveira Cerqueira E; do Nascimento, Ana Paula Barbosa; Martins, Yasmmin Cortes; Dos Santos, Marcelo Trindade; Figueiredo, Agnes Marie de Sá; Perez-Rueda, Ernesto; Nicolás, Marisa Fabiana.

Front Microbiol ; 13: 1049819, 2022.

Article in English | MEDLINE | ID: mdl-36704545

ABSTRACT

Introduction: Staphylococcus aureus is one of the most prevalent and relevant pathogens responsible for a wide spectrum of hospital-associated or community-acquired infections. In addition, methicillin-resistant Staphylococcus aureus may display multidrug resistance profiles that complicate treatment and increase the mortality rate. The ability to produce biofilm, particularly in device-associated infections, promotes chronic and potentially more severe infections originating from the primary site. Understanding the complex mechanisms involved in planktonic and biofilm growth is critical to identifying regulatory connections and ways to overcome the global health problem of multidrug-resistant bacteria. Methods: In this work, we apply literature-based and comparative genomics approaches to reconstruct the gene regulatory network of the high biofilm-producing strain Bmb9393, belonging to one of the highly disseminating successful clones, the Brazilian epidemic clone. To the best of our knowledge, we describe for the first time the topological properties and network motifs for the Staphylococcus aureus pathogen. We performed this analysis using the ST239-SCCmecIII Bmb9393 strain. In addition, we analyzed transcriptomes available in the literature to construct a set of genes differentially expressed in the biofilm, covering different stages of the biofilms and genetic backgrounds of the strains. Results and discussion: The Bmb9393 gene regulatory network comprises 1,803 regulatory interactions between 64 transcription factors and the non-redundant set of 1,151 target genes with the inclusion of 19 new regulons compared to the N315 transcriptional regulatory network published in 2011. In the Bmb9393 network, we found 54 feed-forward loop motifs, where the most prevalent were coherent type 2 and incoherent type 2. 
The non-redundant set of differentially expressed genes in the biofilm consisted of 1,794 genes with functional categories relevant for adaptation to the variable microenvironments established throughout the biofilm formation process. Finally, we mapped the set of genes with altered expression in the biofilm in the Bmb9393 gene regulatory network to depict how different growth modes can alter the regulatory systems. The data revealed 45 transcription factors and 876 shared target genes. Thus, the gene regulatory network model provided represents the most up-to-date model for Staphylococcus aureus, and the set of genes altered in the biofilm provides a global view of their influence on biofilm formation from distinct experimental perspectives and different strain backgrounds.
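A feed-forward loop is a three-node motif in which a regulator X controls Y, and both X and Y control a target Z. The loop is coherent when the sign of the direct path X to Z equals the product of the signs along the indirect path through Y, and incoherent otherwise. The Python sketch below enumerates such motifs in a signed regulatory network. It is a generic illustration under these definitions, not the analysis used in the paper; the gene names and signs are hypothetical, and the finer subtype numbering is not reproduced.

```python
from collections import defaultdict


def feed_forward_loops(edges):
    """edges maps (regulator, target) -> +1 (activation) or -1 (repression)."""
    targets = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore autoregulation
            targets[src].add(dst)
    loops = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            for z in sorted(targets.get(y, ())):
                if z != x and z in targets[x]:
                    indirect = edges[(x, y)] * edges[(y, z)]
                    kind = "coherent" if indirect == edges[(x, z)] else "incoherent"
                    loops.append((x, y, z, kind))
    return loops


# Hypothetical signed network for illustration only.
edges = {
    ("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1,
    ("A", "D"): -1, ("D", "C"): 1, ("C", "C"): -1,
}
print(feed_forward_loops(edges))
```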

9.

EpiCurator: an immunoinformatic workflow to predict and prioritize SARS-CoV-2 epitopes.

Ferreira, Cristina S; Martins, Yasmmin C; Souza, Rangel Celso; Vasconcelos, Ana Tereza R.

PeerJ ; 9: e12548, 2021.

Article in English | MEDLINE | ID: mdl-34909278

ABSTRACT

The ongoing coronavirus disease 2019 (COVID-19) pandemic, triggered by the emerging SARS-CoV-2 virus, represents a global public health challenge. Therefore, the development of effective vaccines is urgently needed to prevent and control virus spread. One of the vaccine production strategies uses in silico epitope prediction from the virus genome by immunoinformatic approaches, which assist in selecting candidate epitopes for in vitro and clinical trials research. This study introduces the EpiCurator workflow to predict and prioritize epitopes from SARS-CoV-2 genomes by combining a series of computational filtering tools. To validate the workflow's effectiveness, SARS-CoV-2 genomes retrieved from the GISAID database were analyzed. We identified 11 epitopes in the receptor-binding domain (RBD) of the Spike glycoprotein, an important antigenic determinant, not previously described in the literature or published in the Immune Epitope Database (IEDB). Interestingly, these epitopes have a combination of important properties: they are recognized in sequences of the current variants of concern and present high antigenicity, conservancy, and broad population coverage. The RBD epitopes were the source for a multi-epitope design for in silico validation of their immunogenic potential. The multi-epitope overall quality was computationally validated, endorsing its efficiency to trigger an effective immune response since it has stability, high antigenicity and strong interactions with Toll-Like Receptors (TLR). Taken together, the findings in the current study demonstrated the efficacy of the workflow for epitope discovery, providing target candidates for immunogen development.

10.

Turnover of SARS-CoV-2 Lineages Shaped the Pandemic and Enabled the Emergence of New Variants in the State of Rio de Janeiro, Brazil.

Francisco Junior, Ronaldo da Silva; Lamarca, Alessandra P; de Almeida, Luiz G P; Cavalcante, Liliane; Machado, Douglas Terra; Martins, Yasmmin; Brustolini, Otávio; Gerber, Alexandra L; Guimarães, Ana Paula de C; Gonçalves, Reinaldo Bellini; Alves, Cassia; Mariani, Diana; Cruz, Thais Felix; de Souza, Isabelle Vasconcellos; de Carvalho, Erika Martins; Ribeiro, Mario Sergio; Carvalho, Silvia; da Silva, Flávio Dias; Garcia, Márcio Henrique de Oliveira; de Souza, Leandro Magalhães; da Silva, Cristiane Gomes; Ribeiro, Caio Luiz Pereira; Cavalcanti, Andréa Cony; de Mello, Claudia Maria Braga; Struchiner, Cláudio J; Tanuri, Amilcar; de Vasconcelos, Ana Tereza R.

Viruses ; 13(10), 2021 Oct 07.

Article in English | MEDLINE | ID: mdl-34696443

ABSTRACT

In the present study, we provide a retrospective genomic epidemiology analysis of the SARS-CoV-2 pandemic in the state of Rio de Janeiro, Brazil. We gathered publicly available data from GISAID and sequenced 1927 new genomes sampled periodically from March 2021 to June 2021 from 91 out of the 92 cities of the state. Our results showed that the pandemic was characterized by three different phases driven by a successive replacement of lineages. Interestingly, we noticed that viral supercarriers accounted for the overwhelming majority of the circulating virus (>90%) among symptomatic individuals in the state. Moreover, SARS-CoV-2 genomic surveillance also revealed the emergence and spread of two new variants (P.5 and P.1.2), first reported in this study. Our findings provided important lessons learned from the different epidemiological aspects of the SARS-CoV-2 dynamic in Rio de Janeiro. Altogether, this might have a strong potential to shape future decisions aiming to improve public health management and the understanding of mechanisms underlying virus dispersion.

Subjects

COVID-19/epidemiology, Viral Genome/genetics, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, Adolescent, Adult, Aged, Aged 80 and over, Brazil/epidemiology, COVID-19/mortality, Child, Preschool Child, Disease Hotspot, Epidemiological Monitoring, Female, Gene Library, Humans, Infant, Newborn Infant, Male, Middle Aged, Phylogeny, Retrospective Studies, Young Adult

11.

Large-Scale Protein Interactions Prediction by Multiple Evidence Analysis Associated With an In-Silico Curation Strategy.

Martins, Yasmmin Côrtes; Ziviani, Artur; Nicolás, Marisa Fabiana; de Vasconcelos, Ana Tereza Ribeiro.

Front Bioinform ; 1: 731345, 2021.

Article in English | MEDLINE | ID: mdl-36303787

ABSTRACT

Predicting the physical or functional associations through protein-protein interactions (PPIs) represents an integral approach for inferring novel protein functions and discovering new drug targets during repositioning analysis. Recent advances in high-throughput data generation and multi-omics techniques have enabled large-scale PPI predictions, thus promoting several computational methods based on different levels of biological evidence. However, integrating multiple results and strategies to optimize, extract interaction features automatically and scale up the entire PPI prediction process is still challenging. Most procedures do not offer an in-silico validation process to evaluate the predicted PPIs. In this context, this paper presents the PredPrIn scientific workflow that enables PPI prediction based on multiple lines of evidence, including the structure, sequence, and functional annotation categories, by combining boosting and stacking machine learning techniques. We also present a pipeline (PPIVPro) for the validation process based on cellular co-localization filtering and a focused search of PPI evidence on scientific publications. Thus, our combined approach provides means to extensive scale training or prediction of new PPIs and a strategy to evaluate the prediction quality. PredPrIn and PPIVPro are publicly available at https://github.com/YasCoMa/predprin and https://github.com/YasCoMa/ppi_validation_process.

12.

Pervasive Inter-Individual Variation in Allele-Specific Expression in Monozygotic Twins.

da Silva Francisco Junior, Ronaldo; Dos Santos Ferreira, Cristina; Santos E Silva, Juan Carlo; Terra Machado, Douglas; Côrtes Martins, Yasmmin; Ramos, Victor; Simões Carnivali, Gustavo; Garcia, Ana Beatriz; Medina-Acosta, Enrique.

Front Genet ; 10: 1178, 2019.

Article in English | MEDLINE | ID: mdl-31850058

ABSTRACT

Despite being developed from one zygote, heterokaryotypic monozygotic (MZ) co-twins exhibit discordant karyotypes. Epigenomic studies in biological samples from heterokaryotypic MZ co-twins are of the most significant value for assessing the effects on gene- and allele-specific expression of an extranumerary chromosomal copy or structural chromosomal disparities in otherwise nearly identical germline genetic contributions. Here, we use RNA-Seq data from existing repositories to establish within-pair correlations for the breadth and magnitude of allele-specific expression (ASE) in heterokaryotypic MZ co-twins discordant for trisomy 21 and maternal 21q inheritance, as well as homokaryotypic co-twins. We show that there is a genome-wide disparity at ASE sites between the heterokaryotypic MZ co-twins. Although most of the disparity corresponds to changes in the magnitude of biallelic imbalance, ASE sites switching from either strictly monoallelic to biallelic imbalance or the reverse occur in few genes that are known or predicted to be imprinted, subject to X-chromosome inactivation or A-to-I(G) RNA edited. We also uncovered comparable ASE differences between homokaryotypic MZ twins. The extent of ASE discordance in MZ twins (2.7%) was about 10-fold lower than the expected between pairs of unrelated, non-twin males or females. The results indicate that the observed within-pair dissimilarities in breadth and magnitude of ASE sites in the heterokaryotypic MZ co-twins could not solely be attributable to the aneuploidy and the missing allelic heritability at 21q.
