Search | BVS Regional Portal

Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients.

Ribeiro-Dantas, Marcel da Câmara; Li, Honghao; Cabeli, Vincent; Dupuis, Louise; Simon, Franck; Hettal, Liza; Hamy, Anne-Sophie; Isambert, Hervé.

iScience ; 27(5): 109736, 2024 May 17.

Article in English | MEDLINE | ID: mdl-38711452

ABSTRACT

Discovering causal effects is at the core of scientific investigation but remains challenging when only observational data are available. In practice, causal networks are difficult to learn and interpret, and limited to relatively small datasets. We report a more reliable and scalable causal discovery method (iMIIC), based on a general mutual information supremum principle, which greatly improves the precision of inferred causal relations while distinguishing genuine causes from putative and latent causal effects. We showcase iMIIC on synthetic and real-world healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program. More than 90% of predicted causal effects appear correct, while the remaining unexpected direct and indirect causal effects can be interpreted in terms of diagnostic procedures, therapeutic timing, patient preference or socio-economic disparity. iMIIC's unique capabilities open up new avenues to discover reliable and interpretable causal networks across a range of research fields.

Discovering temporal scientometric knowledge in COVID-19 scholarly production.

Santos, Breno Santana; Silva, Ivanovitch; Lima, Luciana; Endo, Patricia Takako; Alves, Gisliany; Ribeiro-Dantas, Marcel da Câmara.

Scientometrics ; 127(3): 1609-1642, 2022.

Article in English | MEDLINE | ID: mdl-35068619

ABSTRACT

The mapping and analysis of scientific knowledge make it possible to identify the dynamics and/or growth of a particular field of research or to support strategic decisions related to different research entities, based on bibliometric and/or scientometric indicators. However, with the exponential growth of scientific production, a systematic and data-oriented approach to the analysis of this large set of productions becomes increasingly essential. Thus, in this work, a data-oriented methodology was proposed, combining Data Analysis, Machine Learning and Complex Network Analysis techniques with the Data Version Control (DVC) tool, for the extraction of implicit knowledge from scientific production databases. In addition, the approach was validated through a case study on a COVID-19 manuscripts dataset comprising 199,895 articles published on the arXiv, bioRxiv, medRxiv, PubMed and Scopus databases. The results suggest the feasibility of the proposed methodology, indicating the most active countries and the most explored themes in each period of the pandemic. Therefore, this study has the potential to inform and expand strategic decisions by the scientific community, aiming at extracting knowledge that supports the fight against the COVID-19 pandemic.

Reverse Engineering of Ewing Sarcoma Regulatory Network Uncovers PAX7 and RUNX3 as Master Regulators Associated with Good Prognosis.

Ribeiro-Dantas, Marcel da Câmara; Oliveira Imparato, Danilo; Dalmolin, Matheus Gibeke Siqueira; de Farias, Caroline Brunetto; Brunetto, André Tesainer; da Cunha Jaeger, Mariane; Roesler, Rafael; Sinigaglia, Marialva; Siqueira Dalmolin, Rodrigo Juliani.

Cancers (Basel) ; 13(8), 2021 Apr 13.

Article in English | MEDLINE | ID: mdl-33924679

ABSTRACT

Ewing Sarcoma (ES) is a rare malignant tumor occurring most frequently in adolescents and young adults. The ES hallmark is a chromosomal translocation between chromosomes 11 and 22 that results in an aberrant transcription factor (TF) through the fusion of genes from the FET and ETS families, commonly EWSR1 and FLI1. The regulatory mechanisms behind the ES transcriptional alterations remain poorly understood. Here, we reconstruct the ES regulatory network using publicly available transcriptional data. Seven TFs were identified as potential master regulators (MRs) and clustered into two groups: one composed of PAX7 and RUNX3, and another composed of ARNT2, CREB3L1, GLI3, MEF2C, and PBX3. The MRs within each cluster act as reciprocal agonists regarding the regulation of shared genes, regulon activity, and implications in clinical outcome, while the clusters counteract each other. The regulons of all seven MRs were differentially methylated. PAX7 and RUNX3 regulon activities were associated with good prognosis, while those of ARNT2, CREB3L1, GLI3, and PBX3 were associated with bad prognosis. PAX7 and RUNX3 appear as highly expressed in ES biopsies and ES cell lines. This work contributes to the understanding of the ES regulome, identifying candidate MRs, analyzing their methylome and pointing to potential prognostic factors.

COVID-19: A scholarly production dataset report for research analysis.

Santos, Breno Santana; Silva, Ivanovitch; Ribeiro-Dantas, Marcel da Câmara; Alves, Gisliany; Endo, Patricia Takako; Lima, Luciana.

Data Brief ; 32: 106178, 2020 Oct.

Article in English | MEDLINE | ID: mdl-32837978

ABSTRACT

COVID-19 has been recognized as a global threat, and several studies are being conducted in order to contribute to the fight against and prevention of this pandemic. This work presents a scholarly production dataset focused on COVID-19, providing an overview of scientific research activities and making it possible to identify the countries, scientists and research groups most active in this task force to combat the coronavirus disease. The dataset is composed of 40,212 records of articles' metadata collected from the Scopus, PubMed, arXiv and bioRxiv databases from January 2019 to July 2020. The data were extracted using web scraping in Python and preprocessed with Pandas data wrangling. In addition, the pipeline to preprocess and generate the dataset is versioned with the Data Version Control tool (DVC) and is thus easily reproducible and auditable.

#StayHome: Monitoring and benchmarking social isolation trends in Caruaru and the Região Metropolitana do Recife during the COVID-19 pandemic.

Endo, Patricia Takako; Silva, Ivanovitch; Lima, Luciana; Bezerra, Leonardo; Gomes, Rafael; Ribeiro-Dantas, Marcel; Alves, Gisliany; Monteiro, Kayo Henrique de Carvalho; Lynn, Theo; Sampaio, Vanderson de Souza.

Rev Soc Bras Med Trop ; 53: e20200271, 2020.

Article in English | MEDLINE | ID: mdl-32609249

ABSTRACT

This technical report presents information related to the Social Isolation Index (SII) of the city of Caruaru, Pernambuco, Brazil. The data were provided by In Loco, a technology startup that has collected movement data from around 60 million Brazilians through cell phone location.

Subjects

Benchmarking ; Cell Phone ; Coronavirus Infections/epidemiology ; Pandemics ; Pneumonia, Viral/epidemiology ; Population Surveillance/methods ; Social Isolation ; Brazil/epidemiology ; COVID-19 ; Cities/epidemiology ; Coronavirus Infections/prevention & control ; Decision Making ; Geographic Information Systems ; Humans ; Pandemics/prevention & control ; Personal Space ; Pneumonia, Viral/prevention & control ; Software ; Time Factors

Dataset for country profile and mobility analysis in the assessment of COVID-19 pandemic.

Ribeiro-Dantas, Marcel da Câmara; Alves, Gisliany; Gomes, Rafael B; Bezerra, Leonardo C T; Lima, Luciana; Silva, Ivanovitch.

Data Brief ; 31: 105698, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32405515

ABSTRACT

Understanding the COVID-19 pandemic is a multidisciplinary effort that requires a significant number of variables. This dataset comprises (i) sociodemographic characteristics, compiled from 35 datasets obtained at UN Data; (ii) mobility metrics that can assist the analysis of social distancing, from Google Community Mobility Reports; and (iii) daily counts of cases and deaths from COVID-19, from the European Centre for Disease Prevention and Control and the Johns Hopkins University Center for Systems Science and Engineering. This unified dataset ranges from February 15, 2020 to May 7, 2020, a total of 83 days, and is provided as a collection of time series for 131 countries with 192 variables. The pipeline to preprocess and generate the dataset, along with the dataset itself, is versioned with the Data Version Control tool (DVC) and is thus easily reproducible.
