1.
Stud Health Technol Inform ; 316: 1233-1237, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176604

ABSTRACT

Generative machine learning models such as Generative Adversarial Networks (GANs) have been shown to be especially successful in generating realistic synthetic data in image and tabular domains. However, it has been shown that such generative models, as well as the generated synthetic data, can reveal information contained in their privacy-sensitive training data, and therefore must be carefully evaluated before being used. The gold standard method through which such privacy leakage can be estimated is simulating membership inference attacks (MIAs), in which an attacker attempts to learn whether a given sample was part of the training data of a generative model. The state-of-the-art MIAs against generative models, however, rely on strong assumptions (knowledge of the exact training dataset size), or require a lot of computational power (to retrain many "surrogate" generative models), which make them hard to use in practice. In this work, we propose a technique for evaluating privacy risks in GANs which exploits the outputs of the discriminator part of the standard GAN architecture. We evaluate our attacks in terms of performance in two synthetic image generation applications in radiology and ophthalmology, showing that our technique provides a more complete picture of the threats by performing worst-case privacy risk estimation and by identifying attacks with higher precision than the prior work.
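The abstract does not describe how the attack works. The sketch below is therefore not the authors' method. It shows only the generic idea behind a discriminator-score membership inference attack, under two assumptions: the attacker can read a trained discriminator's real-valued output for each candidate record, and a higher score means "looks more like training data". All function names are hypothetical. Flagging only the top-k most confident records trades recall for precision, which is one way to probe worst-case leakage.

```python
from typing import List, Sequence


def threshold_mia(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag a record as a training member when the discriminator score exceeds the threshold."""
    return [s > threshold for s in scores]


def top_k_mia(scores: Sequence[float], k: int) -> List[bool]:
    """Flag only the k records the discriminator rates as most 'real' (high-precision regime)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(len(scores))]


def attack_precision(predicted: Sequence[bool], is_member: Sequence[bool]) -> float:
    """Share of flagged records that really were in the training set."""
    hits = [member for flag, member in zip(predicted, is_member) if flag]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical discriminator outputs for six candidate records and their true membership.
scores = [0.95, 0.90, 0.40, 0.85, 0.30, 0.20]
is_member = [True, True, False, False, False, True]

print(attack_precision(threshold_mia(scores, 0.5), is_member))  # 2 of 3 flagged are members
print(attack_precision(top_k_mia(scores, 2), is_member))        # both flagged are members
```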


Subjects
Computer Security , Humans , Neural Networks, Computer , Machine Learning , Confidentiality , Privacy
2.
Stud Health Technol Inform ; 316: 929-933, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176944

ABSTRACT

Predictive modeling holds great potential in clinical decision-making, yet its effectiveness can be hindered by inherent data imbalances in clinical datasets. This study investigates the utility of synthetic data for improving the performance of predictive modeling on realistic small imbalanced clinical datasets. We compared various synthetic data generation methods including Generative Adversarial Networks, Normalizing Flows, and Variational Autoencoders to the standard baselines for correcting for class underrepresentation on four clinical datasets. Although results show improvement in F1 scores in some cases, even over multiple repetitions, we do not obtain statistically significant evidence that synthetic data generation outperforms standard baselines for correcting for class imbalance. This study challenges common beliefs about the efficacy of synthetic data for data augmentation and highlights the importance of evaluating new complex methods against simple baselines.
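The abstract does not name the "standard baselines". Random oversampling of the minority class is one widely used simple baseline of the kind that deep generative augmentation is typically compared against. The sketch below assumes that baseline, integer class labels, and rows that can be any Python objects. The function name is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(X: Sequence[T], y: Sequence[int], seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]  # rows of this class
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data: four negatives, one positive.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.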


Subjects
Clinical Decision-Making , Humans
3.
Stud Health Technol Inform ; 316: 853-857, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176927

ABSTRACT

Clinical notes contain valuable information for research and for monitoring the quality of care. Named Entity Recognition (NER) is the process of identifying relevant pieces of information, such as diagnoses, treatments, and side effects, and bringing them into a more structured form. Although recent advances in deep learning have facilitated automated recognition, particularly in English, NER can still be challenging due to limited specialized training data. This is exacerbated in hospital settings, where annotations are costly to obtain without appropriate incentives and often depend on local specificities. In this work, we study whether this annotation process can be effectively accelerated by combining two practical strategies. First, we convert usually passive annotation tasks into a proactive contest to motivate human annotators to perform a task often considered tedious and time-consuming. Second, we provide pre-annotations to the participants to evaluate how the recall and precision of the pre-annotations can boost or deteriorate annotation performance. We applied both strategies to a text de-identification task on French clinical notes and discharge summaries at a large Swiss university hospital. Our results show that a proactive contest and average-quality pre-annotations can significantly reduce annotation time and increase annotation quality, enabling us to develop a text de-identification model for French clinical notes with high performance (F1 score 0.94).
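The abstract does not state how precision and recall were matched. Exact-match comparison of entity spans is one common convention for scoring pre-annotations, or a de-identification model, against a gold standard. The sketch below assumes that convention and represents each entity as a (start, end, label) tuple of character offsets. The labels and the function name are illustrative only.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def span_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted entity spans against gold spans."""
    tp = len(predicted & gold)  # spans that match the gold standard exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and pre-annotations for one note.
gold = {(0, 5, "NAME"), (10, 20, "DATE"), (30, 35, "CITY")}
pre = {(0, 5, "NAME"), (10, 20, "DATE"), (40, 45, "NAME")}
print(span_prf(pre, gold))
```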


Subjects
Electronic Health Records , Natural Language Processing , Humans , Data Anonymization , Switzerland
5.
J Med Internet Res ; 25: e47254, 2023 10 18.
Article in English | MEDLINE | ID: mdl-37851984

ABSTRACT

BACKGROUND: Reference intervals (RIs) for patient test results are in standard use across many medical disciplines, allowing physicians to identify measurements indicating potentially pathological states with relative ease. The process of inferring cohort-specific RIs is, however, often ignored because of the high costs and cumbersome efforts associated with it. Sophisticated analysis tools are required to automatically infer relevant and locally specific RIs directly from routine laboratory data. These tools would effectively connect clinical laboratory databases to physicians and provide personalized target ranges for the respective cohort population. OBJECTIVE: This study aims to describe the BioRef infrastructure, a multicentric governance and IT framework for the estimation and assessment of patient group-specific RIs from routine clinical laboratory data using an innovative decentralized data-sharing approach and a sophisticated, clinically oriented graphical user interface for data analysis. METHODS: A common governance agreement and interoperability standards have been established, allowing the harmonization of multidimensional laboratory measurements from multiple clinical databases into a unified "big data" resource. International coding systems, such as the International Classification of Diseases, Tenth Revision (ICD-10); unique identifiers for medical devices from the Global Unique Device Identification Database; type identifiers from the Global Medical Device Nomenclature; and a universal transfer logic, such as the Resource Description Framework (RDF), are used to align the routine laboratory data of each data provider for use within the BioRef framework. With a decentralized data-sharing approach, the BioRef data can be evaluated by end users from each cohort site following a strict "no copy, no move" principle, that is, only data aggregates for the intercohort analysis of target ranges are exchanged. 
RESULTS: The TI4Health distributed and secure analytics system was used to implement the proposed federated and privacy-preserving approach and comply with the limitations applied to sensitive patient data. Under the BioRef interoperability consensus, clinical partners enable the computation of RIs via the TI4Health graphical user interface for query without exposing the underlying raw data. The interface was developed for use by physicians and clinical laboratory specialists and allows intuitive and interactive data stratification by patient factors (age, sex, and personal medical history) as well as laboratory analysis determinants (device, analyzer, and test kit identifier). This consolidated effort enables the creation of extremely detailed and patient group-specific queries, allowing the generation of individualized, covariate-adjusted RIs on the fly. CONCLUSIONS: With the BioRef-TI4Health infrastructure, a framework for clinical physicians and researchers to define precise RIs immediately in a convenient, privacy-preserving, and reproducible manner has been implemented, promoting a vital part of practicing precision medicine while streamlining compliance and avoiding transfers of raw patient data. This new approach can provide a crucial update on RIs and improve patient care for personalized medicine.
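The abstract says only that aggregates, not raw values, are exchanged between sites. It does not say how TI4Health computes RIs. To illustrate why aggregates can suffice, the sketch below makes three assumptions:

- A simple parametric RI of mean ± 1.96 SD, which is only appropriate for roughly normally distributed analytes.
- Each site shares just its count, sum, and sum of squares.
- Hypothetical function names.

It ignores the encryption and covariate adjustment that a real deployment would add.

```python
import math
from typing import Iterable, Sequence, Tuple

Aggregate = Tuple[int, float, float]  # (count, sum, sum of squares)


def local_aggregates(values: Iterable[float]) -> Aggregate:
    """Computed at each site: only these three numbers leave the site, never the raw values."""
    vals = list(values)
    return len(vals), sum(vals), sum(v * v for v in vals)


def pooled_reference_interval(aggs: Sequence[Aggregate], z: float = 1.96) -> Tuple[float, float]:
    """Combine per-site aggregates into a mean +/- z*SD interval (assumes normality)."""
    n = sum(a[0] for a in aggs)
    s = sum(a[1] for a in aggs)
    ss = sum(a[2] for a in aggs)
    mean = s / n
    var = (ss - s * s / n) / (n - 1)  # pooled sample variance recovered from the aggregates
    sd = math.sqrt(var)
    return mean - z * sd, mean + z * sd


site_a = local_aggregates([4.0, 5.0, 6.0])
site_b = local_aggregates([5.0, 5.0])
low, high = pooled_reference_interval([site_a, site_b])
print(low, high)
```

Many RI guidelines prefer nonparametric 2.5th and 97.5th percentiles. These cannot be pooled exactly from these three aggregates, so this sketch is a simplification.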


Subjects
Big Data , Privacy , Humans , Data Collection , Laboratories , Information Dissemination
6.
Comput Inform Nurs ; 41(11): 884-891, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37279051

ABSTRACT

Hospital-acquired pressure injuries are a challenge for healthcare systems, and the nurse's role is essential in their prevention. The first step is risk assessment. The development of advanced data-driven methods based on machine learning techniques can improve risk assessment through the use of routinely collected data. We studied 24 227 records from 15 937 distinct patients admitted to medical and surgical units between April 1, 2019, and March 31, 2020. Two predictive models were developed: random forest and long short-term memory neural network. Model performance was then evaluated and compared with the Braden score. The areas under the receiver operating characteristic curve, the specificity, and the accuracy of the long short-term memory neural network model (0.87, 0.82, and 0.82, respectively) were higher than those of the random forest model (0.80, 0.72, and 0.72, respectively) and the Braden score (0.72, 0.61, and 0.61, respectively). The sensitivity of the Braden score (0.88) was higher than that of long short-term memory neural network model (0.74) and the random forest model (0.73). The long short-term memory neural network model has the potential to support nurses in clinical decision-making. Implementation of this model in the electronic health record could improve assessment and allow nurses to focus on higher-priority interventions.
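In every model above, the reported specificity and accuracy coincide to two decimals. This is expected when negative cases vastly outnumber positive ones, because accuracy is a prevalence-weighted average: accuracy = p × sensitivity + (1 − p) × specificity, where p is the share of positive cases. The sketch below computes the three metrics from a confusion matrix. The counts are hypothetical and were chosen only to reproduce this effect; they are not the study's data.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from the four cells of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of patients with an injury who were flagged
        "specificity": tn / (tn + fp),  # share of patients without an injury left unflagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts with a low prevalence of pressure injuries (100 of 10100 records).
m = binary_metrics(tp=74, fp=1800, tn=8200, fn=26)
print({k: round(v, 2) for k, v in m.items()})
```

With p ≈ 0.01, accuracy (about 0.819) is pulled almost entirely toward specificity (0.82).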


Subjects
Pressure Ulcer , Humans , Pressure Ulcer/prevention & control , Risk Assessment/methods , Hospitalization , ROC Curve , Hospitals , Retrospective Studies
7.
Sci Data ; 10(1): 127, 2023 03 10.
Article in English | MEDLINE | ID: mdl-36899064

ABSTRACT

The Swiss Personalized Health Network (SPHN) is a government-funded initiative developing federated infrastructures for a responsible and efficient secondary use of health data for research purposes in compliance with the FAIR principles (Findable, Accessible, Interoperable and Reusable). We built a common standard infrastructure with a fit-for-purpose strategy to bring together health-related data and to ease the work of both data providers, by helping them supply data in a standard manner, and researchers, by enhancing the quality of the collected data. As a result, the SPHN Resource Description Framework (RDF) schema was implemented together with a data ecosystem that encompasses data integration, validation tools, analysis helpers, training and documentation for representing health metadata and data in a consistent manner and reaching nationwide data interoperability goals. Data providers can now efficiently deliver several types of health data in a standardised and interoperable way while a high degree of flexibility is granted for the various demands of individual research projects. Researchers in Switzerland have access to FAIR health data for further use in RDF triplestores.


Subjects
Health Services Research , Semantic Web , Metadata , Switzerland , Data Collection
8.
JMIR Med Inform ; 11: e38150, 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36656627

ABSTRACT

BACKGROUND: Medical coding is the process that converts clinical documentation into standard medical codes. Codes are used for several key purposes in a hospital (eg, insurance reimbursement and performance analysis); therefore, their optimization is crucial. With the rapid growth of natural language processing technologies, several solutions based on artificial intelligence have been proposed to aid in medical coding by automatically suggesting relevant codes for clinical documents. However, their effectiveness is still limited to simple cases, and it is not yet clear how much value they can bring in improving coding efficiency and accuracy. OBJECTIVE: This study aimed to bring more efficiency to the coding process to improve the selection of codes by medical coders. To achieve this, we developed an innovative multimodal machine learning-based solution that, instead of predicting codes, detects the degree of coding complexity before coding is performed. The notion of coding complexity was used to better dispatch work among medical coders to eventually minimize errors and improve throughput. METHODS: To train and evaluate our approach, we collected 2060 cases rated by coders in terms of coding complexity from 1 (simplest) to 4 (most complex). We asked 2 expert coders to rate 3.01% (62/2060) of the cases as the gold standard. The agreements between experts were used as benchmarks for model evaluation. A case contains both clinical text and patient metadata from the hospital electronic health record. We extracted both text features and metadata features, then concatenated and fed them into several machine learning models. Finally, we selected 2 models. The first used cross-validated training on 1751 cases and testing on 309 cases aiming to assess the predictive power of the proposed approach and its generalizability. The second model was trained on 1998 cases and tested on the gold standard to validate the best model performance against human benchmarks. 
RESULTS: Our first model achieved a macro-F1-score of 0.51 and an accuracy of 0.59 on classifying the 4-scale complexity. The model distinguished well between the simple (combined complexity 1-2) and complex (combined complexity 3-4) cases with a macro-F1-score of 0.65 and an accuracy of 0.71. Our second model achieved 61% agreement with experts' ratings and a macro-F1-score of 0.62 on the gold standard, whereas the 2 experts had a 66% (41/62) agreement ratio with a macro-F1-score of 0.67. CONCLUSIONS: We propose a multimodal machine learning approach that leverages information from both clinical text and patient metadata to predict the complexity of coding a case in the precoding phase. By integrating this model into the hospital coding system, distribution of cases among coders can be done automatically with performance comparable with that of human expert coders, thus improving coding efficiency and accuracy at scale.
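Macro-F1 is the unweighted mean of per-class F1 scores, so rare complexity levels count as much as common ones. The sketch below uses that standard definition. It also shows the collapse of the 1-4 scale into "simple" (1-2) and "complex" (3-4) that the abstract describes. The abstract does not say how the authors computed the metric, so this code, its function names, and the eight ratings are illustrative assumptions.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores over classes seen in either sequence."""
    classes = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def collapse(level: int) -> int:
    """Map the 1-4 complexity scale to 0 = simple (1-2) and 1 = complex (3-4)."""
    return 0 if level <= 2 else 1


# Hypothetical ratings for eight cases.
y_true = [1, 2, 3, 4, 1, 2, 3, 4]
y_pred = [1, 2, 4, 4, 2, 2, 3, 3]
print(macro_f1(y_true, y_pred))  # 4-class score
print(macro_f1([collapse(v) for v in y_true], [collapse(v) for v in y_pred]))  # simple vs complex
```

In this toy example every error confuses neighbouring levels, so merging them removes all errors. This is one reason a binary score can exceed a 4-class one.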

9.
Stud Health Technol Inform ; 294: 141-142, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612040

ABSTRACT

In this study, we propose a unified evaluation framework for systematically assessing the utility-privacy trade-off of synthetic data generation (SDG) models. These SDG models are adapted to deal with longitudinal or tabular data stemming from electronic health records (EHR) containing both discrete and numeric features. Our evaluation framework considers different data sharing scenarios and attacker models.


Subjects
Electronic Health Records , Privacy , Hospitals, University , Humans
11.
Nat Commun ; 12(1): 5910, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34635645

ABSTRACT

Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics. Using our system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone. Our work represents a necessary key step towards overcoming the privacy hurdle in enabling multi-centric scientific collaborations.
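The abstract does not detail FAMHE's encryption scheme. To illustrate only the core idea that an aggregate can be computed over encrypted inputs, the sketch below swaps in Paillier encryption. Paillier is a simpler, additively homomorphic scheme; here it runs with a single key holder and toy-sized, insecure primes.

This is not multiparty homomorphic encryption and not FAMHE's scheme. In MHE the secret key is shared among the institutions, so no single party can decrypt on its own, whereas the single key holder here could. All function names are hypothetical. The code requires Python 3.9+ (`math.lcm`).

```python
import math
import random
from typing import List, Tuple

PublicKey = Tuple[int, int]        # (n, g)
PrivateKey = Tuple[int, int, int]  # (n, lambda, mu)


def keygen(p: int = 1009, q: int = 1013) -> Tuple[PublicKey, PrivateKey]:
    """Paillier key pair from two primes. Toy-sized and INSECURE: real keys use ~2048-bit moduli."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lambda^-1 mod n
    return (n, n + 1), (n, lam, mu)


def encrypt(pub: PublicKey, m: int, rng: random.Random) -> int:
    n, g = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # the randomizer r must be invertible mod n
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv: PrivateKey, c: int) -> int:
    n, lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n  # Paillier's L(x) = (x - 1) / n
    return (l_value * mu) % n


def add_encrypted(pub: PublicKey, c1: int, c2: int) -> int:
    n, _ = pub
    return (c1 * c2) % (n * n)  # multiplying ciphertexts adds the underlying plaintexts


def encrypted_total(site_counts: List[int], seed: int = 0) -> int:
    """Each site encrypts its count; the aggregator combines ciphertexts without seeing any count."""
    rng = random.Random(seed)
    pub, priv = keygen()
    ciphertexts = [encrypt(pub, count, rng) for count in site_counts]
    total = ciphertexts[0]
    for c in ciphertexts[1:]:
        total = add_encrypted(pub, total, c)
    return decrypt(priv, total)  # decrypting the combined ciphertext yields the sum


print(encrypted_total([120, 45, 310]))
```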


Subjects
Precision Medicine , Privacy , Algorithms , Computer Security , Delivery of Health Care , Genome-Wide Association Study , Humans , Kaplan-Meier Estimate , Survival Analysis
12.
JMIR Med Inform ; 9(6): e27591, 2021 Jun 24.
Article in English | MEDLINE | ID: mdl-34185008

ABSTRACT

BACKGROUND: Interoperability is a well-known challenge in medical informatics. Current trends in interoperability have moved from a data model technocentric approach to sustainable semantics, formal descriptive languages, and processes. Despite decades of initiatives and investment, the interoperability challenge remains crucial. Data sharing for purposes ranging from patient care to secondary uses, such as public health, research, and quality assessment, still faces unsolved problems. OBJECTIVE: This work was performed in the context of a large Swiss federal initiative, the Swiss Personalized Health Network (SPHN), which aims to build a national infrastructure for reusing consented data acquired in the health care and research system to enable research in the field of personalized medicine in Switzerland. This initiative is providing funding to foster the use and exchange of health-related data for research. As part of the initiative, a national strategy to enable a semantically interoperable clinical data landscape was developed and implemented. METHODS: An in-depth analysis of various approaches to address interoperability was performed at the start, including large frameworks in health care, such as Health Level Seven (HL7) and Integrating Healthcare Enterprise (IHE), and in several domains, such as regulatory agencies (eg, Clinical Data Interchange Standards Consortium [CDISC]) and research communities (eg, Observational Medical Outcome Partnership [OMOP]), to identify bottlenecks and assess sustainability. Based on this research, a strategy composed of three pillars was designed. It has strong multidimensional semantics, descriptive formal language for exchanges, and as many data models as needed to comply with the needs of various communities. RESULTS: This strategy has been implemented stepwise in Switzerland since the middle of 2019 and has been adopted by all university hospitals and high research organizations.
The initiative is coordinated by a central organization, the SPHN Data Coordination Center of the SIB Swiss Institute of Bioinformatics. The semantics is mapped by domain experts on various existing standards, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), Logical Observation Identifiers Names and Codes (LOINC), and International Classification of Diseases (ICD). The resource description framework (RDF) is used for storing and transporting data, and to integrate information from different sources and standards. Data transformers based on SPARQL query language are implemented to convert RDF representations to the numerous data models required by the research community or bridge with other systems, such as electronic case report forms. CONCLUSIONS: The SPHN strategy successfully implemented existing standards in a pragmatic and applicable way. It did not try to build any new standards but used existing ones in a nondogmatic way. It has now been funded for another 4 years, bringing the Swiss landscape into a new dimension to support research in the field of personalized medicine and large interoperable clinical data.

13.
J Med Internet Res ; 23(2): e25120, 2021 02 25.
Article in English | MEDLINE | ID: mdl-33629963

ABSTRACT

Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and administration induced by these contracts increase the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between 2 novel advanced privacy-enhancing technologies: homomorphic encryption and secure multiparty computation (defined together as multiparty homomorphic encryption). These privacy-enhancing technologies provide a mathematical guarantee of privacy, with multiparty homomorphic encryption providing a performance advantage over separately using homomorphic encryption or secure multiparty computation. We argue that multiparty homomorphic encryption fulfills legal requirements for medical data sharing under the European Union's General Data Protection Regulation, which has set a global benchmark for data protection. Specifically, the data processed and shared using multiparty homomorphic encryption can be considered anonymized data. We explain how multiparty homomorphic encryption can reduce the reliance upon customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research while offering additional incentives for health care and research institutes to employ common data interoperability standards.


Subjects
Computer Security/ethics , Information Dissemination/ethics , Privacy/legislation & jurisprudence , Technology/methods , Humans
14.
Nat Comput Sci ; 1(3): 192-198, 2021 Mar.
Article in English | MEDLINE | ID: mdl-38183193

ABSTRACT

The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a citizen-centric approach that combines cryptographic privacy-preserving technologies, such as homomorphic encryption and secure multi-party computation, with the auditability of blockchains. Our open-source implementation supports queries on the encrypted genomic data of hundreds of thousands of individuals, with minimal overhead. We show that real-world adoption of our system alleviates widespread privacy concerns and encourages data access sharing with researchers.

15.
Rev Med Suisse ; 16(704): 1574-1578, 2020 Sep 02.
Article in French | MEDLINE | ID: mdl-32880115

ABSTRACT

Precision medicine aims to tailor prevention and treatment to individual data. Although different markers can be used (e.g. transcriptome or proteome), its rise is closely linked to that of genomics, owing to the henceforth reasonable cost of DNA sequencing. The enormous datasets thus generated can be exploited due to remarkable advances in bioinformatics and information sciences. However, beyond the technological endeavor, humanities and social sciences also play a central role to redefine health and illness. The precision medicine unit at CHUV gathers stakeholders from these various domains in order to demonstrate the utility of precision medicine and catalyze its integration into healthcare, to the benefit of the patient.




Subjects
Genomics , Humanities , Medical Informatics , Molecular Biology , Precision Medicine/trends , Humans
16.
J Law Biosci ; 7(1): lsaa010, 2020.
Article in English | MEDLINE | ID: mdl-32733683

ABSTRACT

Personalised medicine can improve both public and individual health by providing targeted preventative and therapeutic healthcare. However, patient health data must be shared between institutions and across jurisdictions for the benefits of personalised medicine to be realised. Whilst data protection, privacy, and research ethics laws protect patient confidentiality and safety, they may also impede multisite research, particularly across jurisdictions. Accordingly, we compare the concept of data accessibility in data protection and research ethics laws across seven jurisdictions. These jurisdictions include Switzerland, Italy, Spain, the United Kingdom (which have implemented the General Data Protection Regulation), the United States, Canada, and Australia. Our paper identifies the requirements for consent, the standards for anonymisation or pseudonymisation, and adequacy of protection between jurisdictions as barriers to sharing. We also identify differences between the European Union and other jurisdictions as a significant barrier to data accessibility in cross-jurisdictional multisite research. Our paper concludes by considering solutions to overcome these legislative differences. These solutions include data transfer agreements and organisational collaborations designed to 'front load' the process of ethics approval, so that subsequent research protocols are standardised. We also allude to technical solutions, such as distributed computing, secure multiparty computation and homomorphic encryption.

17.
Stud Health Technol Inform ; 270: 238-241, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570382

ABSTRACT

One major obstacle to developing precision medicine to its full potential is the privacy concerns related to genomic-data sharing. Even though the academic community has proposed many solutions to protect genomic privacy, these so far have not been adopted in practice, mainly due to their impact on the data utility. We introduce GenoShare, a framework that enables individual citizens to understand and quantify the risks of revealing genome-related privacy-sensitive attributes (e.g., health status, kinship, physical traits) from sharing their genomic data with (potentially untrusted) third parties. GenoShare enables informed decision-making about sharing exact genomic data, by jointly simulating genome-based inference attacks and quantifying the risk stemming from a potential data disclosure.


Subjects
Databases, Genetic/ethics , Genetic Privacy , Genomics/ethics , Information Dissemination/ethics , Informed Consent , Confidentiality , Disclosure , Genome , Humans , Medical Record Linkage
18.
Stud Health Technol Inform ; 270: 317-321, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570398

ABSTRACT

Medical studies are usually time-consuming, cumbersome and extremely costly to perform, and for exploratory research, their results are also difficult to predict a priori. This is particularly the case for rare diseases, for which finding enough patients is difficult and usually requires international-scale research. In this case, the process can be even more difficult due to the heterogeneity of data-protection regulations, which makes data sharing particularly hard. In this short paper, we propose MedCo2 (pronounced MedCo square), a distributed system that streamlines the process of a medical study by bridging and enabling both data discovery and data analysis among multiple databases, while protecting data confidentiality and patients' privacy. MedCo2 relies on interactive protocols, homomorphic encryption and differential privacy. It enables the privacy-preserving computation of multiple statistics, such as cosine similarity and variance, and the training of machine learning models, on patients who are obliviously selected according to specific criteria across multiple databases.


Subjects
Privacy , Cohort Studies , Computer Security , Confidentiality , Humans , Machine Learning
19.
Stud Health Technol Inform ; 270: 1161-1162, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570563

ABSTRACT

MedCo is the first operational system that makes sensitive medical data available for research in a simple, privacy-conscious and secure way. It enables a consortium of clinical sites to collectively protect their data and to securely share them with investigators, without single points of failure. In this short paper, we report on our ongoing effort for the operational deployment of MedCo within the context of the Swiss Personalized Health Network (SPHN) for the Swiss Molecular Tumor Board.


Subjects
Neoplasms , Privacy , Computer Security , Confidentiality , Electronic Health Records , Humans , Power, Psychological , Switzerland
20.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1328-1341, 2019.
Article in English | MEDLINE | ID: mdl-30010584

ABSTRACT

The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo's practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to that of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee.
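The abstract names differential privacy as a goal of MedCo's obfuscation but does not say which mechanism is used. The Laplace mechanism below is the textbook way to make a patient-count query differentially private, and it is shown only as an assumed illustration. Adding or removing one patient changes a count by at most 1, so adding Laplace noise with scale 1/ε gives ε-differential privacy. A smaller ε means stronger privacy and noisier answers. The function names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (a count query has sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
releases = [dp_count(100, 1.0, rng) for _ in range(20000)]
print(sum(releases) / len(releases))  # individual answers are noisy, but the noise has mean zero
```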


Subjects
Computer Security , Electronic Health Records , Genomics , Medical Informatics/methods , Algorithms , Confidentiality , Genome, Human , Hospitals , Humans , Internet , Mutation , Neoplasms/genetics , Proto-Oncogene Proteins B-raf/genetics , Software