Results 1 - 9 of 9
1.
Sci Rep ; 13(1): 1022, 2023 01 19.
Article in English | MEDLINE | ID: mdl-36658181

ABSTRACT

Machine learning algorithms are being increasingly used in healthcare settings but their generalizability between different regions is still unknown. This study aims to identify the strategy that maximizes the predictive performance of identifying the risk of death by COVID-19 in different regions of a large and unequal country. This is a multicenter cohort study with data collected from patients with a positive RT-PCR test for COVID-19 from March to August 2020 (n = 8477) in 18 hospitals, covering all five Brazilian regions. Of all patients with a positive RT-PCR test during the period, 2356 (28%) died. Eight different strategies were used for training and evaluating the performance of three popular machine learning algorithms (extreme gradient boosting, lightGBM, and catboost). The strategies ranged from only using training data from a single hospital, up to aggregating patients by their geographic regions. The predictive performance of the algorithms was evaluated by the area under the ROC curve (AUROC) on the test set of each hospital. We found that the best overall predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals. In this study, the use of more patient data from other regions slightly decreased predictive performance. However, models trained in other hospitals still had acceptable performances and could be a solution while data for a specific hospital is being collected.


Subjects
COVID-19 , Humans , COVID-19/diagnosis , COVID-19/epidemiology , Cohort Studies , Algorithms , Machine Learning , Health Care Outcome Assessment , Retrospective Studies
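
The comparison described in the abstract above (several training strategies, each scored by AUROC on one hospital's test set) can be illustrated with a minimal sketch. It is not the authors' code: the strategy names, labels, and scores below are hypothetical, and the AUROC is computed with the plain pairwise (Mann-Whitney) definition rather than with any particular library.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def winning_strategy(y_true: Sequence[int],
                     scores_by_strategy: Dict[str, Sequence[float]]) -> str:
    """Return the name of the strategy with the highest AUROC on this test set."""
    return max(scores_by_strategy,
               key=lambda name: auroc(y_true, scores_by_strategy[name]))


# Hypothetical test set for one hospital (1 = died, 0 = survived).
y_test = [1, 0, 1, 0, 0, 1]
# Hypothetical predicted risks from models trained under two strategies.
scores = {
    "same_hospital": [0.9, 0.2, 0.8, 0.3, 0.4, 0.7],
    "same_region": [0.6, 0.5, 0.8, 0.3, 0.7, 0.4],
}
best = winning_strategy(y_test, scores)
```

Repeating `winning_strategy` for each hospital's test set and counting the winners would give a tally of the kind the abstract reports (a winning strategy for 11 of 18 hospitals).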
3.
Article in English | MEDLINE | ID: mdl-35329341

ABSTRACT

The aim of this study is to compare mortality rates for typical asbestos-related diseases (ARD-T: mesothelioma, asbestosis, and pleural plaques) and for lung and ovarian cancer between Brazilian municipalities where asbestos mines and asbestos-cement plants had been operating (areas with high asbestos consumption, H-ASB) and all other municipalities. Death records for adults aged 30+ years were retrieved from multiple health information systems. For the 2000-2017 period, age-standardized mortality rates (standard: Brazil 2010) and standardized rate ratios (SRR; H-ASB vs. others) were estimated. The SRRs for ARD-T were 2.56 for men (257 deaths in H-ASB municipalities) and 1.19 for women (136 deaths). For lung cancer, the SRRs were 1.33 for men (32,604 deaths) and 1.19 for women (20,735 deaths). The SRR for ovarian cancer was 1.34 (8446 deaths). Except for ARD-T and lung cancer in women, the SRRs were higher in municipalities that began using asbestos before 1970 than in municipalities that began using it from 1970 onwards. In conclusion, the mortality rates for ARD-T and for lung and ovarian cancer in municipalities with a history of asbestos mining and asbestos-cement production exceed those of the whole country. Caution is needed when interpreting the results of this ecological study. Analytical studies are necessary to document the impact of asbestos exposure on health, particularly in the future given the long latency of asbestos-related cancers.


Subjects
Asbestos , Asbestosis , Lung Neoplasms , Mesothelioma , Occupational Exposure , Ovarian Neoplasms , Adult , Asbestos/toxicity , Brazil/epidemiology , Ovarian Epithelial Carcinoma , Cities , Female , Humans , Italy , Lung , Male
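
The two quantities named in the abstract above, the age-standardized mortality rate and the standardized rate ratio (SRR), can be sketched as follows. This assumes direct standardization, which the abstract does not state explicitly, and every number below is hypothetical rather than taken from the study.

```python
from typing import Sequence


def age_standardized_rate(deaths: Sequence[float],
                          population: Sequence[float],
                          standard_population: Sequence[float],
                          per: float = 100_000) -> float:
    """Direct standardization: weight each age-specific rate by the standard population share."""
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("all inputs need one value per age band")
    total_std = sum(standard_population)
    rate = sum((d / p) * (w / total_std)
               for d, p, w in zip(deaths, population, standard_population))
    return rate * per


def standardized_rate_ratio(rate_exposed: float, rate_reference: float) -> float:
    """SRR: standardized rate in the exposed group divided by the reference rate."""
    return rate_exposed / rate_reference


# Hypothetical data with two age bands; the standard stands in for "Brazil 2010".
standard = [600, 400]
rate_h_asb = age_standardized_rate(deaths=[2, 8], population=[1000, 1000],
                                   standard_population=standard)
rate_other = age_standardized_rate(deaths=[1, 4], population=[1000, 1000],
                                   standard_population=standard)
srr = standardized_rate_ratio(rate_h_asb, rate_other)
```

An SRR above 1 means the standardized rate is higher in the H-ASB group than in the comparison group, which is how the reported values of 1.19 to 2.56 are read.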
4.
Rev Saude Publica ; 55: 23, 2021.
Article in English, Portuguese | MEDLINE | ID: mdl-34133618

ABSTRACT

OBJECTIVE: To predict the risk of absence from work due to morbidity among early childhood education teachers in municipal public schools, using machine learning algorithms. METHODS: This is a cross-sectional study using secondary, public and anonymous data from the Relação Anual de Informações Sociais, selecting early childhood education teachers who worked in the municipal public schools of the state of São Paulo between 2014 and 2018 (n = 174,294). Data on the average number of students per class and the number of inhabitants in the municipality were also linked. The data were separated into training and testing sets, using records from 2014 to 2016 (n = 103,357) to train five predictive models, and data from 2017 to 2018 (n = 70,937) to test their performance on new data. The predictive performance of the algorithms was evaluated using the area under the ROC curve (AUROC). RESULTS: All five algorithms tested showed an area under the curve above 0.76. The algorithm with the best predictive performance (artificial neural networks) achieved an area under the curve of 0.79, with accuracy of 71.52%, sensitivity of 72.86%, specificity of 70.52%, and kappa of 0.427 in the test data. CONCLUSION: It is possible to predict cases of sickness absence in public school teachers with machine learning using public data. The best algorithm showed a higher area under the curve than the reference model (logistic regression). The algorithms can contribute to more accurate predictions in public health and worker health, making it possible to monitor and help prevent the absence of these workers due to morbidity.


Subjects
Absenteeism , Machine Learning , Brazil , Preschool Child , Cross-Sectional Studies , Humans , ROC Curve , Schools
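
The abstract above reports a split by calendar year (2014 to 2016 for training, 2017 to 2018 for testing) and four threshold-based metrics. The sketch below shows how such a split and such metrics are commonly computed; the record layout (a `year` field) and the confusion-matrix counts are hypothetical, not values from the study.

```python
from typing import Dict, List, Tuple


def temporal_split(records: List[dict],
                   last_train_year: int = 2016) -> Tuple[List[dict], List[dict]]:
    """Train on records up to last_train_year, test on later ones (assumes a 'year' field)."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # share of true absences that were predicted
    specificity = tn / (tn + fp)   # share of non-absences that were predicted
    # Agreement expected by chance, from the marginal totals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


# Hypothetical records and counts, for illustration only.
records = [{"year": 2014}, {"year": 2016}, {"year": 2017}, {"year": 2018}]
train, test = temporal_split(records)
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Kappa corrects accuracy for agreement expected by chance, which is why it can sit well below the accuracy figure, as it does in the abstract (0.427 against 71.52%).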
5.
Rev Bras Epidemiol ; 24: e210011, 2021.
Article in English | MEDLINE | ID: mdl-33825773

ABSTRACT

OBJECTIVE: To develop a linkage algorithm to match anonymous death records of cancer of the larynx (ICD-10 C32X), retrieved from the Mortality Information System (SIM) and the Hospital Information System of the Brazilian Unified National Health System (SIH-SUS) in Brazil. METHODOLOGY: Death records containing ICD-10 C32X codes were retrieved from SIM and SIH-SUS, limited to individuals aged 30 years and over, between 2002 and 2012, in the state of São Paulo. The databases were linked using a unique key identifier developed with sociodemographic data shared by both systems. Linkage performance was ascertained by applying the same procedure to similar non-anonymous databases. True pairs were those having the same identification variables. RESULTS: A total of 14,311 eligible death records were found. Most records, 10,674 (74.6%), were exclusive to SIM. Only 1,853 (12.9%) deaths were registered in both systems, representing true pairs. A total of 1,784 (12.5%) cases of laryngeal cancer in the SIH-SUS database were tracked in SIM with different causes of death. The linkage failed to match 167 (9.4%) records due to inconsistencies in the key identifier. CONCLUSION: The authors found that linking anonymous data from mortality and hospital records is a feasible measure to track missing records and may improve cancer statistics.


Subjects
Information Storage and Retrieval , Laryngeal Neoplasms , Adult , Algorithms , Brazil/epidemiology , Factual Databases , Death Certificates , Feasibility Studies , Hospital Information Systems , Humans , Information Storage and Retrieval/methods , Information Systems , Laryngeal Neoplasms/mortality
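
The linkage idea in the abstract above, a key identifier built from sociodemographic variables present in both systems, can be sketched as a simple deterministic join. The abstract does not list the variables used, so the field names below (`birth_date`, `sex`, `municipality`, `death_date`) are assumptions chosen for illustration, and the records are invented.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed key fields; the actual variables are not given in the abstract.
KEY_FIELDS = ("birth_date", "sex", "municipality", "death_date")


def linkage_key(record: dict, fields: Sequence[str] = KEY_FIELDS) -> str:
    """Concatenate normalized field values into one key string."""
    return "|".join(str(record[f]).strip().upper() for f in fields)


def link(sim_records: List[dict],
         sih_records: List[dict]) -> Tuple[List[Tuple[dict, dict]], List[dict]]:
    """Return (matched pairs, SIM-only records) using exact key equality."""
    sih_index: Dict[str, List[dict]] = {}
    for r in sih_records:
        sih_index.setdefault(linkage_key(r), []).append(r)
    pairs, sim_only = [], []
    for r in sim_records:
        matches = sih_index.get(linkage_key(r))
        if matches:
            pairs.extend((r, m) for m in matches)
        else:
            sim_only.append(r)
    return pairs, sim_only


# Hypothetical records; the second SIM record has no hospital counterpart.
sim = [
    {"birth_date": "1950-01-02", "sex": "m", "municipality": "355030", "death_date": "2010-05-01"},
    {"birth_date": "1945-07-09", "sex": "F", "municipality": "350950", "death_date": "2011-03-15"},
]
sih = [
    {"birth_date": "1950-01-02", "sex": "M ", "municipality": "355030", "death_date": "2010-05-01"},
]
pairs, sim_only = link(sim, sih)
```

Exact matching of this kind fails whenever a key field is recorded inconsistently in one system, which is the failure mode the abstract reports for 167 records.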
6.
Sci Rep ; 11(1): 3343, 2021 02 08.
Article in English | MEDLINE | ID: mdl-33558602

ABSTRACT

The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve survival of patients, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. We followed a total of 1040 patients with a positive RT-PCR diagnosis for COVID-19 at a large hospital in São Paulo, Brazil, from March to June 2020, of whom 288 (28%) presented a severe prognosis, i.e. Intensive Care Unit (ICU) admission, use of mechanical ventilation or death. We used routinely collected laboratory, clinical and demographic data to train five machine learning algorithms (artificial neural networks, extra trees, random forests, catboost, and extreme gradient boosting). We used a random sample of 70% of patients to train the algorithms, and the remaining 30% were left for performance assessment, simulating new unseen data. In order to assess whether the algorithms could capture general severe prognostic patterns, each model was trained by combining two out of three outcomes to predict the other. All algorithms presented very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82). The three most important variables for the multipurpose algorithms were the lymphocyte-to-C-reactive protein ratio, C-reactive protein and the Braden Scale. The results highlight the possibility that machine learning algorithms are able to predict unspecific negative COVID-19 outcomes from routinely collected data.


Subjects
COVID-19/diagnosis , COVID-19/epidemiology , Computational Biology/methods , Machine Learning , SARS-CoV-2/genetics , Adult , Aged , Aged 80 and over , Algorithms , Brazil/epidemiology , C-Reactive Protein/analysis , COVID-19/mortality , COVID-19/virology , Cohort Studies , Female , Humans , Intensive Care Units , Length of Stay , Lymphocyte Count , Male , Middle Aged , Prognosis , Artificial Respiration , Reverse Transcriptase Polymerase Chain Reaction
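
One way to read the design in the abstract above ("trained by combining two out of three outcomes to predict the other") is sketched below: for each held-out outcome, the training label is 1 if either of the other two outcomes occurred, and the evaluation label is the held-out outcome itself. This reading, the field names, and the patient records are assumptions made for illustration and are not confirmed by the abstract.

```python
from typing import Dict, List, Tuple

OUTCOMES = ("icu", "ventilation", "death")  # assumed field names


def leave_one_outcome_out(patients: List[Dict[str, int]]
                          ) -> List[Tuple[str, List[int], List[int]]]:
    """For each outcome, build (held_out, combined training labels, held-out target labels)."""
    designs = []
    for held_out in OUTCOMES:
        others = [o for o in OUTCOMES if o != held_out]
        # Training label: any of the two remaining outcomes occurred.
        train_labels = [int(any(p[o] for o in others)) for p in patients]
        # Evaluation label: the outcome the model never saw as a label.
        target_labels = [int(p[held_out]) for p in patients]
        designs.append((held_out, train_labels, target_labels))
    return designs


# Hypothetical patients (1 = outcome occurred).
patients = [
    {"icu": 1, "ventilation": 1, "death": 0},
    {"icu": 0, "ventilation": 0, "death": 0},
    {"icu": 1, "ventilation": 0, "death": 1},
]
designs = leave_one_outcome_out(patients)
```

In a full pipeline the training labels would be used only on the 70% training sample and the target labels only on the 30% test sample.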
7.
São Paulo; s.n; 2021. 117 p.
Thesis in Portuguese | LILACS | ID: biblio-1354317

ABSTRACT

Machine learning algorithms have gained prominence in the health area in recent years. Much of this popularity is due to predictive performance gains when compared to traditional statistical models, as these algorithms are able to capture non-linear relationships and to handle different types of data. This research aims to describe the different machine learning techniques and how these techniques can be applied in occupational safety and health (OSH). The results are organized into three scientific articles. In the first article, a literature review was carried out to understand the panorama of machine learning use in public health and OSH. Supervised and unsupervised learning applications were identified and categorized, and the main research problems were listed. In the second article, supervised learning algorithms were developed to predict absenteeism due to illness and work-related illness in teachers from all public municipal schools in the State of São Paulo between 2014 and 2018 (n=174,294), using data from the Relação Anual de Informações Sociais (RAIS). Five algorithms were compared according to the value of the area under the receiver operating characteristic curve (AUROC). All algorithms obtained an AUROC greater than 0.76. The best algorithm (artificial neural networks) obtained an AUROC of 0.79, with an accuracy of 71.52%, sensitivity of 72.86% and specificity of 70.52%. It was possible to make predictions that provide risk estimates to support the prevention of sick leave among teachers, using public and anonymous data. In the third and last article, predictive models were developed to identify workers at risk of a positive diagnosis for chronic obstructive pulmonary disease (COPD). The study used data from the UK Biobank prospective cohort from individuals followed since 2006, filtering those who completed the occupational history questionnaire (n=120,294). Of these, 1731 (1.4%) had a positive diagnosis of COPD. In all, 26 variables were selected, including demographic data, laboratory tests, habits and symptoms, for the development of generalist models for the prediction of COPD. In addition, a subset of individuals (n=7628) with an occupational background in the construction and mining industries, with possible exposure to mineral dusts, was selected to develop specialized models. Of these, 237 (3.11%) were diagnosed with COPD. The generalist model obtained an AUROC of 0.845, and the specialist model an AUROC of 0.841. The five main predictive variables were age, chronic cough, smoking, earlier diagnosis of asthma and chronic sputum. The results show that it is possible to predict individual risk of COPD diagnosis in the general population and in workers exposed to silica dust using variables commonly collected in primary care. In this research, we showed the feasibility of using predictive models in worker health for both prognosis and diagnosis, with good predictive performance. We believe that this study can contribute to a greater adoption of predictive models by OSH researchers, allowing the early identification of workers exposed to risks and the adoption of preventive measures that inhibit or minimize risks.


Subjects
Prognosis , Occupational Health , Machine Learning
8.
Rev. bras. epidemiol ; 24: e210011, 2021. tab, graf
Article in English | LILACS | ID: biblio-1156024

ABSTRACT

ABSTRACT: Objective: To develop a linkage algorithm to match anonymous death records of cancer of the larynx (ICD-10 C32X), retrieved from the Mortality Information System (SIM) and the Hospital Information System of the Brazilian Unified National Health System (SIH-SUS) in Brazil. Methodology: Death records containing ICD-10 C32X codes were retrieved from SIM and SIH-SUS, limited to individuals aged 30 years and over, between 2002 and 2012, in the state of São Paulo. The databases were linked using a unique key identifier developed with sociodemographic data shared by both systems. Linkage performance was ascertained by applying the same procedure to similar non-anonymous databases. True pairs were those having the same identification variables. Results: A total of 14,311 eligible death records were found. Most records, 10,674 (74.6%), were exclusive to SIM. Only 1,853 (12.9%) deaths were registered in both systems, representing true pairs. A total of 1,784 (12.5%) cases of laryngeal cancer in the SIH-SUS database were tracked in SIM with different causes of death. The linkage failed to match 167 (9.4%) records due to inconsistencies in the key identifier. Conclusion: The authors found that linking anonymous data from mortality and hospital records is a feasible measure to track missing records and may improve cancer statistics.


Subjects
Humans , Adult , Laryngeal Neoplasms/mortality , Information Storage and Retrieval/methods , Algorithms , Brazil/epidemiology , Information Systems , Death Certificates , Feasibility Studies , Factual Databases , Hospital Information Systems
9.
Rev. bras. saúde ocup ; 44: e13, 2019. tab, graf
Article in Portuguese | LILACS | ID: biblio-1042555

ABSTRACT

Abstract Introduction: the variety, volume and velocity of data generation (big data) allow for new and more complex analyses. Objective: to discuss and present data mining and machine learning techniques to help occupational safety and health (OSH) researchers choose a suitable technique when dealing with large volumes of data. Methods: literature review focused on data mining and on predictive applications of machine learning for aiding diagnosis and risk prediction in OSH. Results: the literature shows that data mining with machine learning algorithms for predictive purposes in OSH and public health performs better than traditional analysis. Different techniques are recommended according to the research purpose. Discussion: data mining has become an increasingly common alternative when dealing with large databases in public health, making it possible to analyze large volumes of morbidity and mortality data. These techniques are not meant to replace the human factor, but rather to assist in decision-making processes, to work as a tool for statistical analysis and to build up knowledge to support actions that may improve workers' quality of life.
