Search | VHL Regional Portal

ilastik: interactive machine learning for (bio)image analysis.

Berg, Stuart; Kutra, Dominik; Kroeger, Thorben; Straehle, Christoph N; Kausler, Bernhard X; Haubold, Carsten; Schiegg, Martin; Ales, Janez; Beier, Thorsten; Rudy, Markus; Eren, Kemal; Cervantes, Jaime I; Xu, Buote; Beuttenmueller, Fynn; Wolny, Adrian; Zhang, Chong; Koethe, Ullrich; Hamprecht, Fred A; Kreshuk, Anna.

Nat Methods ; 16(12): 1226-1232, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31570887

ABSTRACT

We present ilastik, an easy-to-use interactive tool that brings machine-learning-based (bio)image analysis to end users without substantial computational expertise. It contains pre-defined workflows for image segmentation, object classification, counting and tracking. Users adapt the workflows to the problem at hand by interactively providing sparse training annotations for a nonlinear classifier. ilastik can process data in up to five dimensions (3D, time and number of channels). Its computational back end runs operations on-demand wherever possible, allowing for interactive prediction on data larger than RAM. Once the classifiers are trained, ilastik workflows can be applied to new data from the command line without further user interaction. We describe all ilastik workflows in detail, including three case studies and a discussion on the expected performance.
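The command-line use mentioned at the end of the abstract could look roughly like the sketch below. It is only an illustration: the launcher path `./run_ilastik.sh`, the project file `MyProject.ilp`, and the input file name are placeholders, and the `--headless` and `--project` options come from ilastik's documented headless mode rather than from this abstract. The helper functions are hypothetical wrappers, not part of ilastik.

```python
import subprocess
from typing import List


def headless_command(launcher: str, project: str, inputs: List[str]) -> List[str]:
    """Build the argument list for a headless ilastik run (hypothetical helper)."""
    # --headless disables the GUI; --project points at a trained .ilp project file.
    return [launcher, "--headless", f"--project={project}", *inputs]


def run_headless(launcher: str, project: str, inputs: List[str]) -> None:
    """Run the trained workflow on new data; requires a local ilastik install."""
    subprocess.run(headless_command(launcher, project, inputs), check=True)


# Placeholder paths, assumed for illustration only.
cmd = headless_command("./run_ilastik.sh", "MyProject.ilp", ["new_image.png"])
print(" ".join(cmd))
```

Building the argument list separately from executing it makes it possible to inspect or log the command before starting a long batch run.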

Subject(s)

Image Processing, Computer-Assisted/methods , Machine Learning , Aryl Hydrocarbon Receptor Nuclear Translocator/physiology , Cell Proliferation , Collagen/metabolism , Endoplasmic Reticulum/ultrastructure , Humans

Overcoming species boundaries in peptide identification with Bayesian information criterion-driven error-tolerant peptide search (BICEPS).

Renard, Bernhard Y; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W; Tzur, Amit; Hamprecht, Fred A; Steen, Hanno.

Mol Cell Proteomics ; 11(7): M111.014167, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22493179

ABSTRACT

Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695-716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis.

Subject(s)

Chickens/genetics , Filarioidea/genetics , Peptides/chemistry , Proteomics/methods , Software , Algorithms , Amino Acid Sequence , Animals , Bayes Theorem , Biological Evolution , Databases, Protein , Humans , Internet , Mass Spectrometry , Molecular Sequence Data , Reproducibility of Results , Sequence Analysis, Protein

libfbi: a C++ implementation for fast box intersection and application to sparse mass spectrometry data.

Kirchner, Marc; Xu, Buote; Steen, Hanno; Steen, Judith A J.

Bioinformatics ; 27(8): 1166-7, 2011 Apr 15.

Article in English | MEDLINE | ID: mdl-21330291

ABSTRACT

MOTIVATION: Algorithms for sparse data require fast search and subset selection capabilities for the determination of point neighborhoods. A natural data representation for such cases is a space-partitioning data structure. However, the associated range queries assume noise-free observations and cannot take into account the observation-specific uncertainty estimates that are present in, e.g., modern mass spectrometry data. To accommodate the inhomogeneous noise characteristics of sparse real-world datasets, point queries need to be reformulated as box intersection queries, where box sizes correspond to uncertainty regions for each observation.

RESULTS: This contribution introduces libfbi, a standard C++, header-only template implementation for fast box intersection in an arbitrary number of dimensions, with arbitrary data types in each dimension. The implementation is applied to a data aggregation task on state-of-the-art liquid chromatography/mass spectrometry data, where it shows excellent run time properties.

AVAILABILITY: The library is available under an MIT license and can be downloaded from http://software.steenlab.org/libfbi.

CONTACT: marc.kirchner@childrens.harvard.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
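The reformulation of point queries as box intersection queries can be illustrated with a small sketch. This is not libfbi's API or algorithm: it is a plain sweep along the first dimension, written in Python for two dimensions only. The names `make_box`, `boxes_overlap`, and `intersecting_pairs`, the (m/z, retention time) interpretation, and the tolerance values are all assumptions made for the example.

```python
from typing import List, Tuple

# A box is (lo_x, hi_x, lo_y, hi_y): one observation widened by its own uncertainty.
Box = Tuple[float, float, float, float]


def make_box(mz: float, rt: float, mz_tol: float, rt_tol: float) -> Box:
    """Turn a point observation into its uncertainty box."""
    return (mz - mz_tol, mz + mz_tol, rt - rt_tol, rt + rt_tol)


def boxes_overlap(a: Box, b: Box) -> bool:
    """Two axis-aligned boxes intersect if they overlap in every dimension."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def intersecting_pairs(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return sorted index pairs of overlapping boxes, sweeping along x."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active: List[int] = []  # boxes whose x-interval may still reach the sweep position
    pairs: List[Tuple[int, int]] = []
    for i in order:
        lo = boxes[i][0]
        # Drop boxes that end before the current box starts in x.
        active = [j for j in active if boxes[j][1] >= lo]
        for j in active:
            if boxes_overlap(boxes[i], boxes[j]):
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)


# Four hypothetical observations; each carries its own tolerances.
boxes = [
    make_box(500.00, 10.0, 0.02, 0.5),
    make_box(500.03, 10.2, 0.02, 0.5),  # close to the first in both dimensions
    make_box(500.03, 20.0, 0.02, 0.5),  # close in m/z, far in retention time
    make_box(600.00, 10.0, 0.50, 0.5),  # far in m/z despite a wide box
]
pairs = intersecting_pairs(boxes)
print(pairs)  # [(0, 1)]
```

Because every observation brings its own box size, a pair is reported only when the two uncertainty regions overlap in all dimensions, which a fixed-radius range query around each point cannot express.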

Subject(s)

Algorithms , Mass Spectrometry/methods , Chromatography, Liquid , Software
