Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 11 de 11
Filter
Add more filters










Publication year range
1.
Article in English | MEDLINE | ID: mdl-38261858

ABSTRACT

Gabor phase retrieval is the problem of reconstructing a signal from only the magnitudes of its Gabor transform. Previous findings suggest a possible link between unique solvability of the discrete problem (recovery from measurements on a lattice) and stability of the continuous problem (recovery from measurements on an open subset of R2). In this paper, we close this gap by proving that such a link cannot be made. More precisely, we establish the existence of functions which break uniqueness from samples without affecting stability of the continuous problem. Furthermore, we prove the novel result that counterexamples to unique recovery from samples are dense in L2(R). Finally, we develop an intuitive argument on the connection between directions of instability in phase retrieval and certain Laplacian eigenfunctions associated to small eigenvalues.

2.
Mach Learn Knowl Discov Databases ; 11906: 124-139, 2020.
Article in English | MEDLINE | ID: mdl-33103160

ABSTRACT

T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.

3.
J Appl Probab ; 57(2): 458-476, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32913373

ABSTRACT

If we pick n random points uniformly in [0, 1] d and connect each point to its c d log n-nearest neighbors, where d ≥ 2 is the dimension and c d is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to c d,1 log log n points chosen randomly among its c d,2 log n-nearest neighbors to ensure a giant component of size n - o(n) with high probability. This construction yields a much sparser random graph with ~ n log log n instead of ~ n log n edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its k nearest neighbors, one can often pick k' ≪ k random points out of the k nearest neighbors and only connect to those without sacrificing quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results in spectral clustering of large-scale datasets.

4.
Article in English | MEDLINE | ID: mdl-34504892

ABSTRACT

Word2vec introduced by Mikolov et al. is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.

5.
Math Comput ; 89(324): 1933-1952, 2020.
Article in English | MEDLINE | ID: mdl-33927452

ABSTRACT

Let G = (V,E,w) be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset W ⊂ V of vertices and weights aw such that 1 | V | ∑ v ∈ V f ( v ) ∼ ∑ w ∈ W a w f ( w ) for functions f : V → ℝ that are 'smooth' with respect to the geometry of the graph; here ~ indicates that we want the right-hand side to be as close to the left-hand side as possible. The main application are problems where f is known to vary smoothly over the underlying graph but is expensive to evaluate on even a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem ('the optimal packing of heat balls'). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method.

6.
J Fourier Anal Appl ; 25(5): 2690-2696, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31772490

ABSTRACT

It is known that if (p n ) n ∈ℕ is a sequence of orthogonal polynomials in L 2([-1,1],w(x)dx), then the roots are distributed according to an arcsine distribution π -1(1 - x 2)-1 dx for a wide variety of weights w(x). We connect this to a result of the Hilbert transform due to Tricomi: if f(x)(1 - x 2)1/4 ∈ L 2(-1,1) and its Hilbert transform Hf vanishes on (-1,1), then the function f is a multiple of the arcsine distribution f ( x ) = c 1 - x 2 χ ( - 1 , 1 ) where c ∈ ℝ . We also prove a localized Parseval-type identity that seems to be new: if f(x)(1-x 2)1/4 ∈ L2(-1, 1) and f ( x ) 1 - x 2 has mean value 0 on (-1, 1), then ∫ - 1 1 ( H f ) ( x ) 2 1 - x 2 d x = ∫ - 1 1 f ( x ) 2 1 - x 2 d x . .

7.
Cognition ; 189: 242-259, 2019 08.
Article in English | MEDLINE | ID: mdl-31015078

ABSTRACT

Can an idea be beautiful? Mathematicians often describe arguments as "beautiful" or "dull," and famous scientists have claimed that mathematical beauty is a guide toward the truth. Do laypeople, like mathematicians and scientists, experience mathematics aesthetically? Three studies suggest that they do. When people rated the similarity of simple mathematical arguments to landscape paintings (Study 1) or pieces of classical piano music (Study 2), their similarity rankings were internally consistent across participants. Moreover, when participants rated beauty and various other potentially aesthetic dimensions for artworks and mathematical arguments, they relied mainly on the same three dimensions for judging beauty-elegance, profundity, and clarity (Study 3). These aesthetic judgments, made separately for artworks and arguments, could be used to predict similarity judgments out-of-sample. These studies also suggest a role for expertise in sharpening aesthetic intuitions about mathematics. We argue that these results shed light on broader issues in how and why humans have aesthetic experiences of abstract ideas.


Subject(s)
Beauty , Esthetics , Intuition , Judgment , Mathematical Concepts , Adult , Female , Humans , Male
8.
Nat Methods ; 16(3): 243-245, 2019 03.
Article in English | MEDLINE | ID: mdl-30742040

ABSTRACT

t-distributed stochastic neighbor embedding (t-SNE) is widely used for visualizing single-cell RNA-sequencing (scRNA-seq) data, but it scales poorly to large datasets. We dramatically accelerate t-SNE, obviating the need for data downsampling, and hence allowing visualization of rare cell populations. Furthermore, we implement a heatmap-style visualization for scRNA-seq based on one-dimensional t-SNE for simultaneously visualizing the expression patterns of thousands of genes. Software is available at https://github.com/KlugerLab/FIt-SNE and https://github.com/KlugerLab/t-SNE-Heatmaps .


Subject(s)
Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Animals , Base Sequence , Computational Biology/methods , Gene Expression Profiling/methods , Genetic Markers , Mice , RNA/genetics , Stochastic Processes
9.
SIAM J Math Data Sci ; 1(2): 313-332, 2019.
Article in English | MEDLINE | ID: mdl-33073204

ABSTRACT

t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the 'early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter α and step size h. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.

10.
Sci Context ; 32(4): 361-380, 2019 12.
Article in English | MEDLINE | ID: mdl-32202238

ABSTRACT

In this paper we comparatively explore three claims concerning the disciplinary character of economics by means of citation analysis. The three claims under study are: (1) economics exhibits strong forms of institutional stratification and, as a byproduct, a rather pronounced internal hierarchy; (2) economists strongly conform to institutional incentives; and (3) modern mainstream economics is a largely self-referential intellectual project mostly inaccessible to disciplinary or paradigmatic outsiders. The validity of these claims is assessed by means of an interdisciplinary comparison of citation patterns aiming to identify peculiar characteristics of economic discourse. In doing so, we emphasize that citation data can always be interpreted in different ways, thereby focusing on the contrast between a "cognitive" and an "evaluative" approach towards citation data.

11.
Proc Math Phys Eng Sci ; 473(2204): 20160877, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28878551

ABSTRACT

The purpose of this short paper is to give a variation on the classical Donsker-Varadhan inequality, which bounds the first eigenvalue of a second-order elliptic operator on a bounded domain Ω by the largest mean first exit time of the associated drift-diffusion process via [Formula: see text]Instead of looking at the mean of the first exit time, we study quantiles: let [Formula: see text] be the smallest time t such that the likelihood of exiting within that time is p, then [Formula: see text]Moreover, as [Formula: see text], this lower bound converges to λ1.

SELECTION OF CITATIONS
SEARCH DETAIL
...