Pesquisa | Portal Regional da BVS (teste)

Privacy-Preserving Federated Survival Support Vector Machines for Cross-Institutional Time-To-Event Analysis: Algorithm Development and Validation.

Späth, Julian; Sewald, Zeno; Probul, Niklas; Berland, Magali; Almeida, Mathieu; Pons, Nicolas; Le Chatelier, Emmanuelle; Ginès, Pere; Solé, Cristina; Juanola, Adrià; Pauling, Josch; Baumbach, Jan.

JMIR AI ; 3: e47652, 2024 Mar 29.

Artigo em Inglês | MEDLINE | ID: mdl-38875678

RESUMO

BACKGROUND: Central collection of distributed medical patient data is problematic due to strict privacy regulations. Especially in clinical environments, such as clinical time-to-event studies, large sample sizes are critical but usually not available at a single institution. It has been shown recently that federated learning, combined with privacy-enhancing technologies, is an excellent and privacy-preserving alternative to data sharing. OBJECTIVE: This study aims to develop and validate a privacy-preserving, federated survival support vector machine (SVM) and make it accessible for researchers to perform cross-institutional time-to-event analyses. METHODS: We extended the survival SVM algorithm to be applicable in federated environments. We further implemented it as a FeatureCloud app, enabling it to run in the federated infrastructure provided by the FeatureCloud platform. Finally, we evaluated our algorithm on 3 benchmark data sets, a large sample size synthetic data set, and a real-world microbiome data set and compared the results to the corresponding central method. RESULTS: Our federated survival SVM produces highly similar results to the centralized model on all data sets. The maximal difference between the model weights of the central model and the federated model was only 0.001, and the mean difference over all data sets was 0.0002. We further show that by including more data in the analysis through federated learning, predictions are more accurate even in the presence of site-dependent batch effects. CONCLUSIONS: The federated survival SVM extends the palette of federated time-to-event analysis methods by a robust machine learning approach. To our knowledge, the implemented FeatureCloud app is the first publicly available implementation of a federated survival SVM, is freely accessible for all kinds of researchers, and can be directly used within the FeatureCloud platform.

Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing.

Abellan-Schneyder, Isabel; Matchado, Monica S; Reitmeier, Sandra; Sommer, Alina; Sewald, Zeno; Baumbach, Jan; List, Markus; Neuhaus, Klaus.

mSphere ; 6(1)2021 02 24.

Artigo em Inglês | MEDLINE | ID: mdl-33627512

RESUMO

Short-amplicon 16S rRNA gene sequencing is currently the method of choice for studies investigating microbiomes. However, comparative studies on differences in procedures are scarce. We sequenced human stool samples and mock communities with increasing complexity using a variety of commonly used protocols. Short amplicons targeting different variable regions (V-regions) or ranges thereof (V1-V2, V1-V3, V3-V4, V4, V4-V5, V6-V8, and V7-V9) were investigated for differences in the composition outcome due to primer choices. Next, the influence of clustering (operational taxonomic units [OTUs], zero-radius OTUs [zOTUs], and amplicon sequence variants [ASVs]), different databases (GreenGenes, the Ribosomal Database Project, Silva, the genomic-based 16S rRNA Database, and The All-Species Living Tree), and bioinformatic settings on taxonomic assignment were also investigated. We present a systematic comparison across all typically used V-regions using well-established primers. While it is known that the primer choice has a significant influence on the resulting microbial composition, we show that microbial profiles generated using different primer pairs need independent validation of performance. Further, comparing data sets across V-regions using different databases might be misleading due to differences in nomenclature (e.g., Enterorhabdus versus Adlercreutzia) and varying precisions in classification down to genus level. Overall, specific but important taxa are not picked up by certain primer pairs (e.g., Bacteroidetes is missed using primers 515F-944R) or due to the database used (e.g., Acetatifactor in GreenGenes and the genomic-based 16S rRNA Database). We found that appropriate truncation of amplicons is essential and different truncated-length combinations should be tested for each study. Finally, specific mock communities of sufficient and adequate complexity are highly recommended.IMPORTANCE In 16S rRNA gene sequencing, certain bacterial genera were found to be underrepresented or even missing in taxonomic profiles when using unsuitable primer combinations, outdated reference databases, or inadequate pipeline settings. Concerning the last, quality thresholds as well as bioinformatic settings (i.e., clustering approach, analysis pipeline, and specific adjustments such as truncation) are responsible for a number of observed differences between studies. Conclusions drawn by comparing one data set to another (e.g., between publications) appear to be problematic and require independent cross-validation using matching V-regions and uniform data processing. Therefore, we highlight the importance of a thought-out study design including sufficiently complex mock standards and appropriate V-region choice for the sample of interest. The use of processing pipelines and parameters must be tested beforehand.

Assuntos

Primers do DNA/genética , DNA Bacteriano/genética , Microbioma Gastrointestinal/genética , Sequenciamento de Nucleotídeos em Larga Escala/normas , RNA Ribossômico 16S/genética , Biologia Computacional , Fezes/microbiologia , Variação Genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Humanos , Filogenia , Análise de Sequência de DNA

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA