ABSTRACT
Introduction: This paper aims to develop a data warehouse by integrating the files of three Brazilian health information systems that record the production of ambulatory and hospital procedures for cancer care, and cancer mortality. These systems do not have a unique patient identifier, which makes integration difficult even within a single system.

Methods: Data from the Brazilian Public Hospital Information System (SIH-SUS), the Oncology Module for the Outpatient Information System (APAC-ONCO), and the Mortality Information System (SIM) for the State of Rio de Janeiro, covering January 2000 to December 2004, were used. Each system compiles its monthly data production in dBase (dbf) files. All the files pertaining to the same system were read into a corresponding table in MySQL Server 5.1. The SIH-SUS and APAC-ONCO tables were linked internally and with one another through record linkage methods. The APAC-ONCO table was also linked to the SIM table. Afterwards, a data warehouse was built using Pentaho and the MySQL database management system.

Results: The sensitivities and specificities of the linkage processes were above 95% and close to 100%, respectively. The data warehouse provided several analytical views that are accessed through the Pentaho Schema Workbench.

Conclusion: This study presented a proposal for integrating Brazilian health information systems to support the building of data warehouses and to provide information beyond what is currently available from the individual systems.
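The abstract does not say which record linkage method was used or how the reference standard was built. As a minimal sketch only, the Python below pairs records that agree exactly on a normalized name and birth date, then computes sensitivity and specificity over all candidate pairs. The field names (`id`, `name`, `birth`), the sample records, and the deterministic matching rule are all assumptions made for illustration, not the study's procedure.

```python
from itertools import product


def normalize(name):
    # Upper-case and collapse repeated whitespace so trivial variants agree.
    return " ".join(name.upper().split())


def link(records_a, records_b):
    """Deterministic linkage: pair records that agree on name and birth date."""
    index = {}
    for rec in records_b:
        key = (normalize(rec["name"]), rec["birth"])
        index.setdefault(key, []).append(rec["id"])
    return {
        (a["id"], b_id)
        for a in records_a
        for b_id in index.get((normalize(a["name"]), a["birth"]), [])
    }


def linkage_accuracy(predicted, truth, all_pairs):
    """Return (sensitivity, specificity) of predicted links against true links."""
    tp = len(predicted & truth)
    fn = len(truth - predicted)
    fp = len(predicted - truth)
    tn = len(all_pairs - predicted - truth)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical records; real files carry no shared patient identifier.
apac = [
    {"id": "A1", "name": "Maria  Silva", "birth": "1950-03-02"},
    {"id": "A2", "name": "Joao Souza", "birth": "1948-11-20"},
    {"id": "A3", "name": "Ana Costa", "birth": "1962-07-15"},
]
sim = [
    {"id": "S1", "name": "maria silva", "birth": "1950-03-02"},
    {"id": "S2", "name": "Joao Sousa", "birth": "1948-11-20"},  # spelling variant
    {"id": "S3", "name": "Ana Costa", "birth": "1962-07-15"},
]

# Assumed reference standard, e.g. from manual review.
truth = {("A1", "S1"), ("A2", "S2"), ("A3", "S3")}
all_pairs = set(product([r["id"] for r in apac], [r["id"] for r in sim]))

predicted = link(apac, sim)
sensitivity, specificity = linkage_accuracy(predicted, truth, all_pairs)
```

In this toy data the exact-match rule misses the pair with the spelling variant, which lowers sensitivity while leaving specificity untouched. Probabilistic or approximate matching is a common way to recover such pairs.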
ABSTRACT
Biological data and analysis tools are accumulating in distributed databases and web servers. Biologists who want to find information on the web must therefore know the various kinds of resources where it is located and how to retrieve it. Integrating data from heterogeneous biological resources will enable biologists to discover new knowledge across domain boundaries, from sequences to expression, structure, and pathways. Biological databases also inevitably contain noisy data, so consensus among databases helps confirm the reliability of their contents. We have developed WeSAT, which integrates distributed and heterogeneous biological databases and analysis tools and provides them through Web Services protocols. In WeSAT, biologists can retrieve specific entries from SWISS-PROT/EMBL, PDB, and KEGG, which hold annotated information about sequence, structure, and pathway. Further analysis is carried out by integrated services, for example homology search and multiple alignment. WeSAT makes it possible to retrieve real-time updated data and analyses from the scattered databases on a single platform through Web Services.
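The abstract does not describe how consensus among databases is computed. One simple reading, sketched below in Python, is to keep only the annotations reported by at least a minimum number of sources. The function name `consensus`, the `min_sources` threshold, and the annotation values are hypothetical and chosen only for illustration; they are not actual database entries or part of WeSAT.

```python
from collections import Counter


def consensus(annotations, min_sources=2):
    """Return the annotation terms reported by at least `min_sources` sources."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))  # count each source at most once per term
    return {term for term, n in counts.items() if n >= min_sources}


# Made-up annotations for a single protein entry, keyed by source database.
annotations = {
    "SWISS-PROT": {"kinase", "ATP-binding"},
    "PDB": {"kinase"},
    "KEGG": {"kinase", "MAPK signaling pathway"},
}

agreed = consensus(annotations)
```

Terms reported by a single source are not necessarily wrong. Under this rule they are simply left unconfirmed, so the threshold trades coverage for reliability.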