ABSTRACT
We conducted a multi-year project to collect discharge summaries from multiple hospitals, assemble them into a large text database, build a common document vector space, and develop various applications. We extracted 243,907 discharge summaries from seven hospitals. Term structure and the number of terms differed between hospitals; however, the differences by disease were similar across hospitals. We built the vector space using the TF-IDF method and performed a cross-match analysis of DPC (Diagnosis Procedure Combination) selection among the seven hospitals. About 80% of cases were correctly matched. Using model data from other hospitals reduced the selection rate to around 10%; however, integrated model data from all hospitals restored the selection rate.
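The abstract does not specify the tokenizer, the TF-IDF weighting variant, or the matching rule, so the following is only a minimal sketch of one way such a cross-match could be set up. It assumes whitespace tokenization, a smoothed IDF, and nearest-centroid matching by cosine similarity. The function names, the toy summaries, and the labels `pneumonia` and `hip_fracture` are hypothetical stand-ins, not real DPC codes or study data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization. Real Japanese discharge summaries
    # would need a morphological analyzer, which the abstract does not name.
    return text.lower().split()


def fit_idf(texts):
    # Smoothed inverse document frequency over the model documents.
    n = len(texts)
    df = Counter(term for text in texts for term in set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # Terms outside the model vocabulary are dropped; result is L2-normalized.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_model(labelled_docs):
    # Model data = IDF table plus one summed TF-IDF vector (centroid) per label.
    idf = fit_idf([text for text, _ in labelled_docs])
    centroids = defaultdict(lambda: defaultdict(float))
    for text, label in labelled_docs:
        for term, weight in tfidf(text, idf).items():
            centroids[label][term] += weight
    return idf, {label: dict(vec) for label, vec in centroids.items()}


def select_label(text, model):
    # Returns the label of the most similar centroid, or None when the summary
    # shares no terms with the model vocabulary.
    idf, centroids = model
    vec = tfidf(text, idf)
    if not vec:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def selection_rate(labelled_docs, model):
    # Fraction of summaries whose selected label equals the recorded label.
    hits = sum(select_label(text, model) == label for text, label in labelled_docs)
    return hits / len(labelled_docs)


# Hypothetical toy data: the two hospitals describe the same diseases with
# largely different vocabularies.
hospital_a = [
    ("pneumonia fever cough antibiotics", "pneumonia"),
    ("cough fever chest infiltrate pneumonia", "pneumonia"),
    ("femur fracture fall surgery fixation", "hip_fracture"),
    ("hip fracture fall rehabilitation", "hip_fracture"),
]
hospital_b = [
    ("pna pyrexia sputum abx", "pneumonia"),
    ("pna sputum pyrexia oxygen", "pneumonia"),
    ("fx femoral orif fall", "hip_fracture"),
    ("fx orif physio", "hip_fracture"),
]

model_a = build_model(hospital_a)
model_all = build_model(hospital_a + hospital_b)

print("A model on A:", selection_rate(hospital_a, model_a))
print("A model on B:", selection_rate(hospital_b, model_a))
print("Integrated model on B:", selection_rate(hospital_b, model_all))
```

In this toy setup the model built from one hospital scores poorly on the other hospital's summaries because few terms are shared, while the integrated model recovers the match. This mirrors the qualitative pattern the abstract reports, not its actual figures.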
Subjects
Data Mining/methods , Factual Databases , Electronic Health Records/organization & administration , Medical Record Linkage/methods , Patient Discharge Summaries/statistics & numerical data , Controlled Vocabulary , Data Accuracy , Database Management Systems , Japan , Multicenter Studies as Topic , Natural Language Processing , Systems Integration
ABSTRACT
We started a multi-year project to collect discharge summaries from multiple hospitals, create a large text database, build a common document vector space, and develop various applications such as automatic disease selection. As the first step, we extracted discharge summaries from two hospitals and carried out DPC (Diagnosis Procedure Combination) selection using a text mining method. Term structure and the number of terms differed between the discharge summaries of the two hospitals. Nevertheless, the disease selection rates of the two hospitals closely resembled each other.