Search | VHL Regional Portal

Learning meaningful representations of protein sequences.

Detlefsen, Nicki Skafte; Hauberg, Søren; Boomsma, Wouter.

Nat Commun ; 13(1): 1914, 2022 04 08.

Article in English | MEDLINE | ID: mdl-35395843

ABSTRACT

How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This begs the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured.

Subject(s)

Machine Learning , Amino Acid Sequence

Automated Quantification of sTIL Density with H&E-Based Digital Image Analysis Has Prognostic Potential in Triple-Negative Breast Cancers.

Thagaard, Jeppe; Stovgaard, Elisabeth Specht; Vognsen, Line Grove; Hauberg, Søren; Dahl, Anders; Ebstrup, Thomas; Doré, Johan; Vincentz, Rikke Egede; Jepsen, Rikke Karlin; Roslind, Anne; Kümler, Iben; Nielsen, Dorte; Balslev, Eva.

Cancers (Basel) ; 13(12), 2021 Jun 18.

Article in English | MEDLINE | ID: mdl-34207414

ABSTRACT

Triple-negative breast cancer (TNBC) is an aggressive and difficult-to-treat cancer type that represents approximately 15% of all breast cancers. Recently, stromal tumor-infiltrating lymphocytes (sTIL) resurfaced as a strong prognostic biomarker for overall survival (OS) for TNBC patients. Manual assessment has innate limitations that hinder clinical adoption, and the International Immuno-Oncology Biomarker Working Group (TIL-WG) has therefore envisioned that computational assessment of sTIL could overcome these limitations and recommended that any algorithm should follow the manual guidelines where appropriate. However, no existing studies capture all the concepts of the guideline or have shown the same prognostic evidence as manual assessment. In this study, we present a fully automated digital image analysis pipeline and demonstrate that our hematoxylin and eosin (H&E)-based pipeline can provide a quantitative and interpretable score that correlates with the manual pathologist-derived sTIL status and, importantly, can stratify a retrospective cohort into two significantly distinct prognostic groups. We found our score to be prognostic for OS (HR: 0.81, CI: 0.72-0.92, p = 0.001) independent of age, tumor size, nodal status, and tumor type in statistical modeling. While prior studies have followed fragments of the TIL-WG guideline, our approach is the first to follow all complex aspects, where appropriate, supporting the TIL-WG vision of computational assessment of sTIL in the future clinical setting.

Intrinsic Grassmann Averages for Online Linear, Robust and Nonlinear Subspace Learning.

Chakraborty, Rudrasis; Yang, Liu; Hauberg, Soren; Vemuri, Baba C.

IEEE Trans Pattern Anal Mach Intell ; 43(11): 3904-3917, 2021 Nov.

Article in English | MEDLINE | ID: mdl-32386140

ABSTRACT

Principal component analysis (PCA) and kernel principal component analysis (KPCA) are fundamental methods in machine learning for dimensionality reduction. The former is a technique for finding a low-dimensional linear approximation in finite dimensions, and the latter often operates in an infinite-dimensional reproducing kernel Hilbert space (RKHS). In this paper, we present a geometric framework for computing the principal linear subspaces in both (finite and infinite) situations as well as for the robust PCA case, which amounts to computing the intrinsic average on the space of all subspaces: the Grassmann manifold. Points on this manifold are defined as the subspaces spanned by K-tuples of observations. The intrinsic Grassmann average of these subspaces is shown to coincide with the principal components of the observations when they are drawn from a Gaussian distribution. We show similar results in the RKHS case and provide an efficient algorithm for computing the projection onto this average subspace. The result is a method akin to KPCA which is substantially faster. Further, we present a novel online version of KPCA using our geometric framework. Competitive performance of all our algorithms is demonstrated on a variety of real and synthetic data sets.
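To make the idea of an intrinsic average of subspaces concrete, the following is a minimal sketch for the simplest case only, where each observation spans a one-dimensional subspace (K = 1). It uses a generic Karcher-mean iteration on the unit sphere with sign alignment and is not the authors' algorithm; the function name `intrinsic_line_mean`, the iteration count, and the synthetic Gaussian data are assumptions made for illustration.

```python
import numpy as np

def intrinsic_line_mean(X, iters=100, tol=1e-10):
    """Karcher (intrinsic) mean of the lines spanned by the nonzero rows of X.

    Illustrative sketch for 1-D subspaces only; assumes no row of X is zero.
    """
    U = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit spanning vectors
    q = U[0].copy()                                   # initial guess
    for _ in range(iters):
        # u and -u span the same line, so flip each vector toward q.
        s = np.sign(U @ q)
        s[s == 0] = 1.0
        V = U * s[:, None]
        # Log map at q: tangent vectors whose lengths are the angles to q.
        c = np.clip(V @ q, -1.0, 1.0)
        theta = np.arccos(c)
        T = V - c[:, None] * q
        nT = np.linalg.norm(T, axis=1)
        scale = np.where(nT > 1e-12, theta / np.maximum(nT, 1e-12), 0.0)
        g = (T * scale[:, None]).mean(axis=0)         # mean tangent vector
        ng = np.linalg.norm(g)
        if ng < tol:
            break
        # Exp map: move along the geodesic in direction g, then renormalize.
        q = np.cos(ng) * q + np.sin(ng) * g / ng
        q /= np.linalg.norm(q)
    return q

# Synthetic zero-mean Gaussian data with one dominant axis (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3)) * np.array([5.0, 1.0, 0.5])
q = intrinsic_line_mean(X)
v1 = np.linalg.svd(X, full_matrices=False)[2][0]      # leading PCA direction
alignment = abs(q @ v1)                               # 1.0 means same line
```

On Gaussian data such as this, `alignment` should be close to 1, which is consistent with the abstract's statement that the average subspace coincides with the principal components in the Gaussian case.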

Transformations Based on Continuous Piecewise-Affine Velocity Fields.

Freifeld, Oren; Hauberg, Soren; Batmanghelich, Kayhan; Fisher, Jonn W.

IEEE Trans Pattern Anal Mach Intell ; 39(12): 2496-2509, 2017 12.

Article in English | MEDLINE | ID: mdl-28092517

ABSTRACT

We propose novel finite-dimensional spaces of well-behaved transformations. The latter are obtained by (fast and highly-accurate) integration of continuous piecewise-affine velocity fields. The proposed method is simple yet highly expressive, effortlessly handles optional constraints (e.g., volume preservation and/or boundary conditions), and supports convenient modeling choices such as smoothing priors and coarse-to-fine analysis. Importantly, the proposed approach, partly due to its rapid likelihood evaluations and partly due to its other properties, facilitates tractable inference over rich transformation spaces, including using Markov-Chain Monte-Carlo methods. Its applications include, but are not limited to: monotonic regression (more generally, optimization over monotonic functions); modeling cumulative distribution functions or histograms; time-warping; image warping; image registration; real-time diffeomorphic image editing; data augmentation for image classifiers. Our GPU-based code is publicly available.
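As a rough illustration of how integrating a continuous piecewise-affine velocity field yields a well-behaved transformation, here is a one-dimensional sketch. It is not the authors' GPU implementation: it substitutes plain fixed-step Euler integration for the fast, highly accurate integration the abstract mentions, and the function name `cpa_transform`, the knot positions, and the velocity values are all hypothetical.

```python
import numpy as np

def cpa_transform(x, knots, values, n_steps=200):
    """Warp points x by integrating dx/dt = v(x) over t in [0, 1].

    v is the continuous piecewise-affine function that linearly
    interpolates `values` at `knots`. Fixed-step Euler is used here
    purely for simplicity (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float).copy()
    h = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + h * np.interp(x, knots, values)
    return x

# Hypothetical field on [0, 1] with five cells; zero velocity at both
# ends acts as a boundary condition that keeps the endpoints fixed.
knots = np.linspace(0.0, 1.0, 6)
values = np.array([0.0, 0.3, -0.2, 0.25, -0.1, 0.0])

grid = np.linspace(0.0, 1.0, 101)
warped = cpa_transform(grid, knots, values)
```

Because each small Euler step is itself an increasing map when the step size is small relative to the slopes of the field, the resulting `warped` grid stays strictly increasing. This is what makes such transformations usable for the monotonic regression and time-warping applications listed in the abstract.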

Data-driven forward model inference for EEG brain imaging.

Hansen, Sofie Therese; Hauberg, Søren; Hansen, Lars Kai.

Neuroimage ; 139: 249-258, 2016 Oct 01.

Article in English | MEDLINE | ID: mdl-27307192

ABSTRACT

Electroencephalography (EEG) is a flexible and accessible tool with excellent temporal resolution but with a spatial resolution hampered by volume conduction. Reconstruction of the cortical sources of measured EEG activity partly alleviates this problem and effectively turns EEG into a brain imaging device. The quality of the source reconstruction depends on the forward model which details head geometry and conductivities of different head compartments. These person-specific factors are complex to determine, requiring detailed knowledge of the subject's anatomy and physiology. In this proof-of-concept study, we show that, even when anatomical knowledge is unavailable, a suitable forward model can be estimated directly from the EEG. We propose a data-driven approach that provides a low-dimensional parametrization of head geometry and compartment conductivities, built using a corpus of forward models. Combined with only a recorded EEG signal, we are able to estimate both the brain sources and a person-specific forward model by optimizing this parametrization. We thus not only solve an inverse problem, but also optimize over its specification. Our work demonstrates that personalized EEG brain imaging is possible, even when the head geometry and conductivities are unknown.

Subject(s)

Brain Mapping/methods , Cerebral Cortex/physiology , Electroencephalography , Models, Neurological , Adult , Female , Humans , Male , Signal Processing, Computer-Assisted , Young Adult

Principal Curves on Riemannian Manifolds.

Hauberg, Soren.

IEEE Trans Pattern Anal Mach Intell ; 38(9): 1915-21, 2016 09.

Article in English | MEDLINE | ID: mdl-26540674

ABSTRACT

Euclidean statistics are often generalized to Riemannian manifolds by replacing straight-line interpolations with geodesic ones. While these Riemannian models are familiar-looking, they are restricted by the inflexibility of geodesics, and they rely on constructions which are optimal only in Euclidean domains. We consider extensions of Principal Component Analysis (PCA) to Riemannian manifolds. Classic Riemannian approaches seek a geodesic curve passing through the mean that optimizes a criterion of interest. The requirements that the solution both is geodesic and must pass through the mean tend to imply that the methods only work well when the manifold is mostly flat within the support of the generating distribution. We argue that instead of generalizing linear Euclidean models, it is more fruitful to generalize non-linear Euclidean models. Specifically, we extend the classic Principal Curves from Hastie & Stuetzle to data residing on a complete Riemannian manifold. We show that for elliptical distributions in the tangent of spaces of constant curvature, the standard principal geodesic is a principal curve. The proposed model is simple to compute and avoids many of the pitfalls of traditional geodesic approaches. We empirically demonstrate the effectiveness of the Riemannian principal curves on several manifolds and datasets.

Probabilistic shortest path tractography in DTI using Gaussian Process ODE solvers.

Schober, Michael; Kasenburg, Niklas; Feragen, Aasa; Hennig, Philipp; Hauberg, Soren.

Med Image Comput Comput Assist Interv ; 17(Pt 3): 265-72, 2014.

Article in English | MEDLINE | ID: mdl-25320808

ABSTRACT

Tractography in diffusion tensor imaging estimates connectivity in the brain through observations of local diffusivity. These observations are noisy and of low resolution and, as a consequence, connections cannot be found with high precision. We use probabilistic numerics to estimate connectivity between regions of interest and contribute a Gaussian Process tractography algorithm which allows for both quantification and visualization of its posterior uncertainty. We use the uncertainty both in visualization of individual tracts as well as in heat maps of tract locations. Finally, we provide a quantitative evaluation of different metrics and algorithms showing that the adjoint metric [8] combined with our algorithm produces paths which agree most often with experts.

Subject(s)

Algorithms , Brain/cytology , Connectome/methods , Diffusion Tensor Imaging/methods , Image Interpretation, Computer-Assisted/methods , Nerve Fibers, Myelinated/ultrastructure , Pattern Recognition, Automated/methods , Data Interpretation, Statistical , Humans , Image Enhancement/methods , Normal Distribution , Reproducibility of Results , Sensitivity and Specificity
