Search | BVS Regional Portal

Predicting consumer behavior with Web search.

Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J.

Proc Natl Acad Sci U S A ; 107(41): 17486-90, 2010 Oct 12.

Article in English | MEDLINE | ID: mdl-20876140

ABSTRACT

Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.

Subject(s)

Behavior/physiology , Consumer Behavior , Forecasting/methods , Search Engine/statistics & numerical data , Humans , Theoretical Models , Search Engine/economics
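The comparison described in the abstract above, a baseline model versus the same model augmented with search volume, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the names `fit_ols` and `mse` are hypothetical helpers, and the one-step autoregressive baseline stands in for whatever baselines the authors actually fit.

```python
import random

def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def mse(rows, targets, coef):
    """Mean squared error of the linear predictions."""
    preds = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Synthetic series (an assumption, not real data): the outcome depends on its
# own previous value and on the previous period's search volume.
rng = random.Random(0)
n = 200
search = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.0]
for t in range(1, n):
    outcome.append(0.6 * outcome[t - 1] + 0.8 * search[t - 1] + rng.gauss(0, 0.5))

targets = outcome[1:]
base_rows = [[1.0, outcome[t - 1]] for t in range(1, n)]                  # autoregressive baseline
full_rows = [[1.0, outcome[t - 1], search[t - 1]] for t in range(1, n)]   # baseline + search

mse_base = mse(base_rows, targets, fit_ols(base_rows, targets))
mse_full = mse(full_rows, targets, fit_ols(full_rows, targets))
print(f"baseline MSE: {mse_base:.3f}, with search: {mse_full:.3f}")
```

Because the two models are nested, the in-sample error of the augmented model cannot exceed the baseline's. A fair assessment of the "boost" would compare out-of-sample errors, which this sketch omits for brevity.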

Using internet searches for influenza surveillance.

Polgreen, Philip M; Chen, Yiling; Pennock, David M; Nelson, Forrest D.

Clin Infect Dis ; 47(11): 1443-8, 2008 Dec 01.

Article in English | MEDLINE | ID: mdl-18954267

ABSTRACT

The Internet is an important source of health information. Thus, the frequency of Internet searches may provide information regarding infectious disease activity. As an example, we examined the relationship between searches for influenza and actual influenza occurrence. Using search queries from the Yahoo! search engine ( http://search.yahoo.com ) from March 2004 through May 2008, we counted daily unique queries originating in the United States that contained influenza-related search terms. Counts were divided by the total number of searches, and the resulting daily fraction of searches was averaged over the week. We estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the United States. With use of the frequency of searches, our models predicted an increase in cultures positive for influenza 1-3 weeks in advance of when they occurred (P < .001), and similar models predicted an increase in mortality attributable to pneumonia and influenza up to 5 weeks in advance (P < .001). Search-term surveillance may provide an additional tool for disease surveillance.

Subject(s)

Human Influenza/epidemiology , Internet , Population Surveillance/methods , Sentinel Surveillance , Forecasting , Humans , Orthomyxoviridae/isolation & purification , Pneumonia/mortality , Statistics as Topic , United States/epidemiology
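The preprocessing and lead-time modelling summarized in the abstract above (daily query counts divided by total searches, averaged over the week, then used with a lead to predict a later outcome) might be sketched as follows. The function names and toy numbers are hypothetical, and a single-predictor least-squares line is assumed here in place of the authors' full linear models.

```python
def weekly_fraction(daily_hits, daily_totals):
    """Daily share of matching queries, averaged over each complete 7-day week."""
    fractions = [h / t for h, t in zip(daily_hits, daily_totals)]
    return [sum(fractions[i:i + 7]) / 7 for i in range(0, len(fractions) - 6, 7)]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lead_model(search, outcome, lead):
    """Fit outcome[t] against search[t - lead]; both series are weekly and equal length."""
    return fit_line(search[:len(search) - lead], outcome[lead:])

# Toy data (assumed): two weeks of daily counts out of 100 searches per day.
hits = [2] * 7 + [4] * 7
totals = [100] * 14
print(weekly_fraction(hits, totals))

# Toy weekly series in which the outcome follows search activity two weeks later.
search = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
outcome = [0.0, 0.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = lead_model(search, outcome, lead=2)
print(slope, intercept)
```

Looping `lead` over 1 to 10 and comparing the fits would mirror the range of lead times mentioned in the abstract; significance testing is left out of this sketch.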

Winners don't take all: Characterizing the competition for links on the web.

Pennock, David M; Flake, Gary W; Lawrence, Steve; Glover, Eric J; Giles, C Lee.

Proc Natl Acad Sci U S A ; 99(8): 5207-11, 2002 Apr 16.

Article in English | MEDLINE | ID: mdl-16578867

ABSTRACT

As a whole, the World Wide Web displays a striking "rich get richer" behavior, with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic. However, hidden in this skewed global distribution, we discover a qualitatively different and considerably less biased link distribution among subcategories of pages, for example among all university homepages or all newspaper homepages. Although the connectivity distribution over the entire web is close to a pure power law, we find that the distribution within specific categories is typically unimodal on a log scale, with the location of the mode, and thus the extent of the rich get richer phenomenon, varying across different categories. Similar distributions occur in many other naturally occurring networks, including research paper citations, movie actor collaborations, and United States power grid connections. A simple generative model, incorporating a mixture of preferential and uniform attachment, quantifies the degree to which the rich nodes grow richer, and how new (and poorly connected) nodes can compete. The model accurately accounts for the true connectivity distributions of category-specific web pages, the web as a whole, and other social networks.
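A mixture of preferential and uniform attachment, as mentioned in the abstract above, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the paper's exact formulation: each new node adds a single link, the mixing weight `alpha` and the function name `grow_network` are hypothetical, and the network starts from two linked nodes.

```python
import random

def grow_network(n_nodes, alpha, seed=0):
    """Grow a network one node at a time and return the list of node degrees.

    With probability alpha the new node links to an existing node chosen in
    proportion to its degree (preferential attachment); otherwise it links to
    an existing node chosen uniformly at random.
    """
    rng = random.Random(seed)
    degree = [1, 1]        # start from two nodes joined by one link
    endpoints = [0, 1]     # each node appears once per link end it owns
    for new in range(2, n_nodes):
        if rng.random() < alpha:
            target = rng.choice(endpoints)   # chance proportional to degree
        else:
            target = rng.randrange(new)      # uniform over existing nodes
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new])
    return degree

mostly_preferential = grow_network(5000, alpha=1.0)
mostly_uniform = grow_network(5000, alpha=0.0)
print(max(mostly_preferential), max(mostly_uniform))
```

With `alpha` near 1 one would expect a few heavily linked hubs, and with `alpha` near 0 a much flatter degree distribution; intermediate values interpolate between the two regimes.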
