Comments on Redirecting to http://thenoisychannel.com...: Thinking about IR Evaluation

Indeed, I think diversity of results is important ...

2008-05-14T22:39:00.000-04:00

Indeed, I think diversity of results is important and underemphasized by today's search engines, at least based on what I infer from personal experience. Interfaces that promote query refinement (e.g., faceted search, clustering) may offer a more diverse or risk-mitigating experience. And formal models like MMR or Zhai/Lafferty are certainly aiming for the benefits of diversity.

But Chen and Karger, who restrict themselves to returning a list of results rather than suggesting query refinements, aren't just talking about diversity. The measure they propose, k-call at n, is binary: it returns 1 if at least k of the top n results are relevant, and 0 otherwise. Hence, at k=1, an algorithm does well to return a diverse set of results, in the hope that at least one will be relevant. But at k=n, the algorithm does better to return a homogeneous set of results, at least if Van Rijsbergen's cluster hypothesis holds.
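
To make the definition concrete, here is a minimal sketch of k-call at n as described above. The function name and the two relevance lists are hypothetical, invented only for illustration; they are not taken from Chen and Karger's paper.

```python
def k_call_at_n(relevance, k, n):
    """Return 1 if at least k of the top n results are relevant, else 0.

    `relevance` is a ranked list of booleans (True = relevant).
    """
    top = relevance[:n]
    return 1 if sum(1 for r in top if r) >= k else 0


# Hypothetical judgments for two ranked lists of five results.
diverse = [True, False, False, False, False]   # one hit among varied guesses
homogeneous = [True, True, True, True, True]   # all results alike and relevant

print(k_call_at_n(diverse, k=1, n=5))      # 1: one relevant result is enough
print(k_call_at_n(diverse, k=5, n=5))      # 0: not all five are relevant
print(k_call_at_n(homogeneous, k=5, n=5))  # 1: every result is relevant
```

At k=1 the diverse list already scores 1, while at k=n only a list whose results are all relevant does, which is the tension between diversity and homogeneity described above.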

those of you interested in formal models justifyin...

2008-05-14T09:49:00.000-04:00

those of you interested in formal models justifying diversity should look at Hal Varian's "Economics of Search" talk from SIGIR or Zhai/Lafferty's risk minimization framework for IR.

sorry - correction - gary's student had a poster a...

2008-05-14T04:30:00.000-04:00

Sorry - correction - Gary's student had a poster at SIGIR07.

This concept of diversity in search results, i bel...

2008-05-14T04:29:00.000-04:00

This concept of diversity in search results, I believe, is especially important in faceted search, as every result item is usually equally weighted to a facet value. So when you make a selection in a facet, every result is equally relevant. So - how do we organise the results?

There was a paper at SIGIR07 by Gary Marchionini's student that showed how they had created a matrix of how similar or different each result was to every other result in the list. Sadly, it didn't show WHY it was different.

But let's take that WHY for a second. If the results list showed what made each result novel relative to the rest, then could we more accurately choose what was important to show in the per-result summaries? 'If you choose this one - you find this information that is not in the rest of the results'. It could affect not only which text snippet to show, but also which other facet values (in other facets) it belongs to that make it unique, etc.
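
As a rough illustration of both ideas (the pairwise matrix and the WHY), here is a minimal sketch. It assumes results are short text snippets, uses Jaccard overlap of word sets as a stand-in similarity, and treats "terms that appear in no other result" as a crude notion of novelty. None of this is the method from the SIGIR07 work; the function names and example snippets are hypothetical.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(docs):
    """Pairwise similarity of every result to every other result."""
    sets = [tokens(d) for d in docs]
    return [[jaccard(a, b) for b in sets] for a in sets]


def novel_terms(docs):
    """For each result, the terms found in no other result (the WHY)."""
    sets = [tokens(d) for d in docs]
    novel = []
    for i, s in enumerate(sets):
        others = set().union(*(t for j, t in enumerate(sets) if j != i))
        novel.append(s - others)
    return novel


# Hypothetical result snippets for an ambiguous query.
docs = ["jaguar car dealer", "jaguar car review", "jaguar cat habitat"]

for row in similarity_matrix(docs):
    print([round(x, 2) for x in row])
for doc, terms in zip(docs, novel_terms(docs)):
    print(doc, "->", sorted(terms))
```

The same set difference could be applied to facet values instead of words, to show which facet values make a result unique within the list.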

I'd love to see a decent representation of novelty in a results set - I'd be doing it if I had the time/resources. There are some people working on novelty - David Losada, in this IP&M article, appears to be talking about this sort of topic, for example. I haven't read the paper yet, though.

Diversity seems to be an increasingly important me...

2008-05-13T23:46:00.000-04:00

Diversity seems to be an increasingly important metric for many techniques that return a ranked list of results: given that we cannot have perfect personalization and that we cannot figure out exactly what the user wants to see, let's offer a variety of different results, and let the user pick. We may not get all the results right, but we will have something for everyone.

It is very close to the idea of faceted search (or guided summarization that you mentioned in an earlier post). But instead of exposing the diversity of the results using multiple orthogonal browsing components, you try to "embed" diversity in the ranked list. (Or, even better, you expose both the facets and generate diverse results.)
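
One common way to "embed" diversity in a ranked list is a greedy re-ranking in the style of maximal marginal relevance (MMR): each step picks the result that best trades off relevance against similarity to what has already been selected. The sketch below is only an illustration under assumed inputs; the relevance scores, the similarity matrix, and the `lam` trade-off value are all hypothetical.

```python
def mmr_rerank(relevance, similarity, lam=0.5, top_n=None):
    """Greedy MMR-style re-ranking.

    relevance:  list of relevance scores, one per candidate.
    similarity: square matrix, similarity[i][j] between candidates i and j.
    lam:        1.0 = pure relevance, 0.0 = pure diversity.
    Returns candidate indices in their new order.
    """
    n = len(relevance)
    top_n = n if top_n is None else min(top_n, n)
    selected = []
    remaining = list(range(n))
    while remaining and len(selected) < top_n:
        def score(i):
            # Penalize closeness to the most similar already-selected result.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidates: 0 and 1 are near-duplicates, 2 is different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

print(mmr_rerank(relevance, similarity, lam=1.0))  # [0, 1, 2]
print(mmr_rerank(relevance, similarity, lam=0.5))  # [0, 2, 1]
```

With `lam=0.5` the less relevant but different result moves ahead of the near-duplicate, which is the "something for everyone" behavior described above.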

I also like to connect this idea (conceptually) to the idea of minimizing risk in finance: we know that stocks have high expected performance. However, they tend to go all up or all down together (high correlation). Therefore you want to mix these with some uncorrelated or anti-correlated investments (commodities, bonds), so that you can have slightly lower expected performance, but much lower risk of complete failure.
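
The arithmetic behind this can be sketched with the standard two-asset portfolio formulas: the expected return is the weighted mean, and the variance includes a correlation term. All of the numbers below (returns, volatilities, correlations) are made up purely for illustration.

```python
import math


def portfolio_stats(mu_a, mu_b, sigma_a, sigma_b, rho, w_a=0.5):
    """Expected return and standard deviation of a two-asset portfolio.

    mu_*:    expected returns, sigma_*: standard deviations,
    rho:     correlation between the two assets,
    w_a:     weight of asset A (asset B gets 1 - w_a).
    """
    w_b = 1.0 - w_a
    mean = w_a * mu_a + w_b * mu_b
    variance = (
        (w_a * sigma_a) ** 2
        + (w_b * sigma_b) ** 2
        + 2 * w_a * w_b * rho * sigma_a * sigma_b
    )
    return mean, math.sqrt(max(variance, 0.0))


# Hypothetical numbers: two highly correlated stocks versus a stock
# mixed with a weakly anti-correlated bond.
stocks_only = portfolio_stats(0.08, 0.08, 0.20, 0.20, rho=0.9)
stock_bond = portfolio_stats(0.08, 0.04, 0.20, 0.05, rho=-0.2)

print("two stocks:   mean=%.3f sd=%.3f" % stocks_only)
print("stock + bond: mean=%.3f sd=%.3f" % stock_bond)
```

Under these assumed numbers the mixed portfolio gives up some expected return in exchange for roughly half the standard deviation, the same trade a diverse result list makes against a homogeneous one.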

Perhaps we should start having risk-sensitive evaluations in IR, in the same way that people in finance measure the "value at risk" of investment portfolios?
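
As a loose analogy only (not an established IR metric), one could report a lower-tail quantile of per-query effectiveness next to the mean, so that a system that fails badly on a few queries is visible. The per-query scores, function names, and the simple quantile rule below are all assumptions made for this sketch.

```python
def mean_score(scores):
    """Average per-query effectiveness."""
    return sum(scores) / len(scores)


def lower_quantile(scores, alpha=0.1):
    """Score at roughly the alpha-th quantile of the per-query scores.

    A simple index-based rule: sort ascending and take the element at
    position int(alpha * number_of_queries), clamped to the last index.
    """
    ordered = sorted(scores)
    index = min(int(alpha * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical per-query scores (e.g., precision at 10) for two systems.
steady = [0.6, 0.6, 0.6, 0.6, 0.6]
risky = [0.9, 0.9, 0.9, 0.9, 0.0]

for name, scores in [("steady", steady), ("risky", risky)]:
    print(name, "mean=%.2f" % mean_score(scores),
          "tail=%.2f" % lower_quantile(scores, alpha=0.1))
```

Here the risky system wins on the mean but loses on the tail, which is the kind of difference a risk-sensitive evaluation would be meant to surface.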