Results 1 - 5 of 5
1.
Nature ; 630(8015): 181-188, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38778098

ABSTRACT

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles [1-3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context [4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet [5] method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data [6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision-language pretraining for pathology [7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.
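The scale quoted in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: the tile size and the corpus totals come from the abstract, while the slide dimensions (50,000 × 40,000 pixels) and the function name are hypothetical, and no tissue or background filtering is modelled.

```python
import math

TILE_EDGE_PX = 256  # tile edge length stated in the abstract


def tiles_per_slide(width_px: int, height_px: int, tile: int = TILE_EDGE_PX) -> int:
    """Count the non-overlapping tiles needed to cover a slide of the given size."""
    return math.ceil(width_px / tile) * math.ceil(height_px / tile)


# Hypothetical slide dimensions, chosen only to illustrate the order of magnitude.
grid_tiles = tiles_per_slide(50_000, 40_000)

# Average tiles per slide implied by the totals in the abstract.
total_tiles = 1_300_000_000
total_slides = 171_189
avg_tiles = total_tiles / total_slides

print(grid_tiles)        # full grid for the hypothetical slide
print(round(avg_tiles))  # corpus-wide average
```

Under these assumptions the full grid holds about 30,000 tiles, consistent with "tens of thousands", while the corpus-wide average is roughly 7,600 tiles per slide.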


Subject(s)
Datasets as Topic; Image Processing, Computer-Assisted; Machine Learning; Pathology, Clinical; Humans; Benchmarking; Image Processing, Computer-Assisted/methods; Neoplasms/classification; Neoplasms/diagnosis; Neoplasms/pathology; Pathology, Clinical/methods; Male; Female
2.
Patterns (N Y) ; 4(4): 100726, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37123439

ABSTRACT

Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep-learning methods attain test area under the receiver operating characteristic curve (AUROC) values of 94%-99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Ablation results demonstrate the superiority of these advanced deep-learning methods. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels.
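For readers unfamiliar with the metric reported above, AUROC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name and the toy labels and scores are made up for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = attribute present, 0 = absent.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
print(toy_auroc)
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value and scales better to large test sets.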

3.
Patterns (N Y) ; 4(4): 100729, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37123444

ABSTRACT

Large neural language models have transformed modern natural language processing (NLP) applications. However, fine-tuning such models for specific tasks remains challenging as model size increases, especially with small labeled datasets, which are common in biomedical NLP. We conduct a systematic study on fine-tuning stability in biomedical NLP. We show that fine-tuning performance may be sensitive to pretraining settings and conduct an exploration of techniques for addressing fine-tuning instability. We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications. Specifically, freezing lower layers is helpful for standard BERT-BASE models, while layerwise decay is more effective for BERT-LARGE and ELECTRA models. For low-resource text similarity tasks, such as BIOSSES, reinitializing the top layers is the optimal strategy. Overall, domain-specific vocabulary and pretraining facilitate robust models for fine-tuning. Based on these findings, we establish a new state of the art on a wide range of biomedical NLP applications.
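Two of the stabilization techniques named in this abstract, layerwise decay and freezing lower layers, can be sketched as a per-layer learning-rate schedule. This is a framework-free illustration under stated assumptions, not the paper's implementation: the function names, the 12-layer depth, the base learning rate, the decay factor, and the number of frozen layers are all hypothetical, and a learning rate of 0.0 stands in for excluding a layer from the optimizer.

```python
def layerwise_learning_rates(num_layers: int, base_lr: float, decay: float) -> list:
    """Give the top layer base_lr and shrink it by `decay` for each layer below.

    Index 0 is the lowest (closest to the input) layer.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def freeze_lower_layers(rates: list, num_frozen: int) -> list:
    """Set the learning rate of the lowest `num_frozen` layers to zero."""
    return [0.0 if i < num_frozen else lr for i, lr in enumerate(rates)]


# Hypothetical settings for a 12-layer encoder.
decayed = layerwise_learning_rates(num_layers=12, base_lr=1e-4, decay=0.9)
frozen = freeze_lower_layers([1e-4] * 12, num_frozen=4)

for i, (d, f) in enumerate(zip(decayed, frozen)):
    print(f"layer {i:2d}  layerwise decay: {d:.2e}  frozen-lower: {f:.2e}")
```

Both schedules limit how far the lower layers drift from their pretrained weights; decay does so gradually, freezing does so absolutely.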

4.
NPJ Digit Med ; 2: 10, 2019.
Article in English | MEDLINE | ID: mdl-31304359

ABSTRACT

Much of the AI work in healthcare is focused around disease prediction in clinical settings, which is an important application that has yet to deliver in earnest. However, there are other fundamental aspects like helping patients and care teams interact and communicate in efficient and meaningful ways, which could deliver quadruple-aim improvements. After heart disease and cancer, preventable medical errors are the third leading cause of death in the United States. The largest subset of medical errors is medication error. Providing the right treatment plan for patients includes knowledge about their current medications and drug allergies, an often challenging task. The widespread growth of prescribing and consuming medications has increased the need for applications that support medication reconciliation. We show a deep-learning application that can help reduce avoidable errors with their attendant risk, i.e., correctly identifying prescription medication, which is currently a tedious and error-prone task. We demonstrate prescription-pill identification from mobile images in the NIH NLM Pill Image Recognition Challenge dataset. Our application recognizes the correct pill within the top-5 results at 94% accuracy, which compares favorably to the original competition winner at 83.3% for top-5 under comparable, though not identical, configurations. The Institute of Medicine claims that better use of information technology can be an important step in reducing medication errors. Therefore, we believe that a more immediate impact of AI in healthcare will occur with a seamless integration of AI into clinical workflows, readily addressing the quadruple aim of healthcare.
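The top-5 figure quoted above counts a query as correct when the true pill appears anywhere among the five highest-ranked candidates. A minimal sketch of that metric follows; the function name and the pill identifiers are invented for illustration and do not come from the challenge dataset.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=5):
    """Fraction of queries whose true label is among the first k ranked candidates."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked candidate lists (best guess first) for four query images.
ranked = [
    ["pill_a", "pill_b", "pill_c", "pill_d", "pill_e", "pill_f"],
    ["pill_c", "pill_a", "pill_b", "pill_e", "pill_d", "pill_f"],
    ["pill_f", "pill_e", "pill_d", "pill_c", "pill_b", "pill_a"],
    ["pill_b", "pill_d", "pill_a", "pill_f", "pill_e", "pill_c"],
]
truth = ["pill_a", "pill_b", "pill_a", "pill_c"]

top1 = top_k_accuracy(ranked, truth, k=1)
top5 = top_k_accuracy(ranked, truth, k=5)
print(top1, top5)
```

Top-5 is never lower than top-1, which is why the two figures should not be compared directly across systems.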

5.
Bioinformatics ; 30(23): 3302-9, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25123903

ABSTRACT

MOTIVATION: Identifying somatic changes from tumor and matched normal sequences has become a standard approach in cancer research. More specifically, this requires accurate detection of somatic point mutations with low allele frequencies in impure and heterogeneous cancer samples. Although haplotype phasing information derived by using heterozygous germ line variants near candidate mutations would improve accuracy, no somatic mutation caller that uses such information is currently available.

RESULTS: We propose a Bayesian hierarchical method, termed HapMuC, in which power is increased by using available information on heterozygous germ line variants located near candidate mutations. We first constructed two generative models (the mutation model and the error model). In the generative models, we prepared candidate haplotypes, considering a heterozygous germ line variant if available, and the observed reads were realigned to the haplotypes. We then inferred the haplotype frequencies and computed the marginal likelihoods using a variational Bayesian algorithm. Finally, we derived a Bayes factor for evaluating the possibility of the existence of somatic mutations. We also demonstrated that our algorithm has superior specificity and sensitivity compared with existing methods, as determined based on a simulation, the TCGA Mutation Calling Benchmark 4 datasets and data from the COLO-829 cell line.

AVAILABILITY AND IMPLEMENTATION: The HapMuC source code is available from http://github.com/usuyama/hapmuc.
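The idea of comparing a mutation model against an error model with a Bayes factor can be illustrated with a deliberately simplified toy. This is not HapMuC: it ignores haplotypes, read realignment, and variational inference. It assumes, purely for illustration, that the mutation model places a uniform prior on the variant allele fraction and that the error model fixes the variant read probability at a hypothetical sequencing error rate; all names and numbers are made up.

```python
import math


def marginal_likelihood_mutation(alt_reads: int, depth: int) -> float:
    """Binomial likelihood integrated over a uniform prior on the allele fraction.

    For a Beta(1, 1) prior this integral equals 1 / (depth + 1) for any alt_reads.
    """
    return 1.0 / (depth + 1)


def likelihood_error(alt_reads: int, depth: int, error_rate: float) -> float:
    """Binomial likelihood when variant reads arise only from sequencing error."""
    return (
        math.comb(depth, alt_reads)
        * error_rate ** alt_reads
        * (1.0 - error_rate) ** (depth - alt_reads)
    )


def bayes_factor(alt_reads: int, depth: int, error_rate: float = 0.01) -> float:
    """Ratio of evidence for the toy mutation model over the toy error model."""
    return marginal_likelihood_mutation(alt_reads, depth) / likelihood_error(
        alt_reads, depth, error_rate
    )


# Hypothetical pileups at depth 30.
bf_six_alt = bayes_factor(alt_reads=6, depth=30)   # several variant reads
bf_zero_alt = bayes_factor(alt_reads=0, depth=30)  # no variant reads
print(bf_six_alt > 1, bf_zero_alt < 1)
```

A Bayes factor well above 1 favors the mutation model and one below 1 favors the error model; the method described in the abstract derives its factor from much richer generative models.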


Subject(s)
DNA Mutational Analysis/methods; Heterozygote; Neoplasms/genetics; Algorithms; Bayes Theorem; Cell Line, Tumor; Gene Frequency; Haplotypes; Humans; Mutation; Polymorphism, Single Nucleotide