
SillyPutty: Improved clustering by optimizing the silhouette width.

Bombina, Polina; Tally, Dwayne; Abrams, Zachary B; Coombes, Kevin R.

PLoS One ; 19(6): e0300358, 2024.

Article in English | MEDLINE | ID: mdl-38848330

ABSTRACT

Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed. Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
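
The idea of using per-element silhouette widths to improve a clustering can be illustrated with a minimal Python sketch. This is not the SillyPutty R package and is not taken from the paper: the function names `silhouette_widths` and `reassign_by_silhouette`, the Euclidean distance, the `max_steps` cap, and the rule "move the element with the most negative width to its nearest other cluster" are all assumptions made for illustration. The silhouette formula itself, s(i) = (b - a) / max(a, b), is the standard definition, where a is the mean distance from element i to its own cluster and b is the mean distance to the nearest other cluster.

```python
import math


def silhouette_widths(points, labels):
    """Return per-element silhouette widths and each element's nearest other cluster."""
    clusters = sorted(set(labels))
    widths, nearest = [], []
    for i, p in enumerate(points):
        # Mean distance from element i to every cluster, excluding i itself.
        mean_dist = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                total = sum(math.dist(p, points[j]) for j in members)
                mean_dist[c] = total / len(members)
        own = labels[i]
        others = {c: d for c, d in mean_dist.items() if c != own}
        if own not in mean_dist or not others:
            # Singleton cluster (or only one cluster): width is 0 by convention.
            widths.append(0.0)
            nearest.append(own)
            continue
        a = mean_dist[own]
        best = min(others, key=others.get)
        b = others[best]
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        nearest.append(best)
    return widths, nearest


def reassign_by_silhouette(points, labels, max_steps=100):
    """Hypothetical refinement loop: repeatedly move the worst-placed element."""
    labels = list(labels)
    for _ in range(max_steps):
        widths, nearest = silhouette_widths(points, labels)
        worst = min(range(len(points)), key=widths.__getitem__)
        if widths[worst] >= 0:
            break  # No element is closer to another cluster than to its own.
        labels[worst] = nearest[worst]
    return labels
```

Starting labels could come from any method, for example a cut of a hierarchical clustering tree, which loosely mirrors the "hierarchical clustering followed by SillyPutty" combination the abstract describes. Calling `reassign_by_silhouette(points, initial_labels)` returns the adjusted label list.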

Subject(s)

Algorithms , Cluster Analysis , Humans , Neoplasms/pathology , Software

SillyPutty: Improved clustering by optimizing the silhouette width.

Bombina, Polina; Tally, Dwayne; Abrams, Zachary B; Coombes, Kevin R.

bioRxiv ; 2023 Nov 11.

Article in English | MEDLINE | ID: mdl-37986817

ABSTRACT

Unsupervised clustering is an important task in biomedical science. We developed a new clustering method, called SillyPutty, for unsupervised clustering. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.

RCytoGPS: an R package for reading and visualizing cytogenetics data.

Abrams, Zachary B; Tally, Dwayne G; Abruzzo, Lynne V; Coombes, Kevin R.

Bioinformatics ; 37(23): 4589-4590, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34601554

ABSTRACT

SUMMARY: Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated from CytoGPS.org and converts them into objects in R. This conversion facilitates the analysis and visualizations of karyotype data. In effect this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology. AVAILABILITY AND IMPLEMENTATION: Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS.

Subject(s)

Reading , Software , Humans , Karyotyping , Karyotype

Pattern recognition in lymphoid malignancies using CytoGPS and Mercator.

Abrams, Zachary B; Tally, Dwayne G; Zhang, Lin; Coombes, Caitlin E; Payne, Philip R O; Abruzzo, Lynne V; Coombes, Kevin R.

BMC Bioinformatics ; 22(1): 100, 2021 Mar 01.

Article in English | MEDLINE | ID: mdl-33648439

ABSTRACT

BACKGROUND: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers. RESULTS: In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database. CONCLUSION: Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.

Subject(s)

Hematologic Diseases , Karyotyping , Neoplasms , Chromosome Aberrations , Humans , Karyotype
