Showing posts with label Library and Information Science. Show all posts

Monday, September 8, 2008

Quick Bites: Taxonomy Directed Folksonomies

Props to Gwen Harris at Taxonomy Watch for posting a paper by Sarah Hayman and Nick Lothian on Taxonomy Directed Folksonomies.

The paper asks whether folksonomies and formal taxonomy can be used together and answers in the affirmative. The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.

I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.

Wednesday, August 27, 2008

Transparency in Information Retrieval

It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.

Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.

The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.

Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems?). What frustrates users most is when a search engine not only fails to read their minds, but also gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.

What does this have to do with set retrieval vs. ranked retrieval? Plenty!

Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.

The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
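To make the "what you ask is what you get" transparency of set retrieval concrete, here is a minimal sketch of Boolean retrieval over an inverted index. The toy corpus and queries are invented for illustration:

```python
# Minimal Boolean set retrieval over a toy inverted index.
# The documents and queries are invented for illustration.
docs = {
    1: "information retrieval evaluation",
    2: "interactive information seeking",
    3: "boolean retrieval systems",
}

# Build the inverted index: term -> set of document ids.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def AND(a, b):
    """Documents containing both terms: set intersection."""
    return index.get(a, set()) & index.get(b, set())

def OR(a, b):
    """Documents containing either term: set union."""
    return index.get(a, set()) | index.get(b, set())

# "information AND retrieval" returns exactly the documents
# containing both terms -- no ranking, no mystery.
print(sorted(AND("information", "retrieval")))  # [1]
print(sorted(OR("boolean", "seeking")))         # [2, 3]
```

The transparency is structural: the result set is fully determined by the query's set operations, so a user can always account for why a document was or wasn't returned.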

In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.

But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.

Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.

If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.

But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?

To be continued...

Tuesday, July 8, 2008

Librarian 2.0

Many of the words that mark milestones in the history of technology, such as calculator and word processor, originally corresponded to people. Calculating had at least two lives as a technology breakthrough--first as a process, and then as an automatic means for executing that process. Thanks to inventions like calculators and computers, human beings have moved up the value chain to become scientists and engineers who take low-level details for granted.

Similarly, the advances in information science and retrieval have dramatically changed the role of a reference librarian.

Hopefully some of you are old enough to remember card catalogs. They were certainly functional if you knew the exact title or author you were looking for, assuming the title wasn't too generic or the author too prolific. Where card catalogs fell short was in supporting exploratory search. In many cases, your best bet was to quite literally explore the stacks and hope that locality within the Dewey Decimal system sufficed to support your information seeking needs. Alternatively, you could follow citation paths--the dead-tree precursor of surfing a hypertext collection.

For exploratory tasks, library patrons would turn to reference librarians, who would clarify the patrons' needs through a process called the reference interview. According to Wikipedia:
A reference interview is composed of two segments:

1. An initial segment in which the librarian encourages the user to fully discuss the request.
2. A final segment in which the librarian asks questions to relate the request to the materials available in the library

A reference interview is structured (ideally) according to the following series of steps. First the library user states a question or describes a problem. The librarian then clarifies the user's information need, sometimes leading him or her back from a request for a specific resource (which may not be the best one for the problem at hand) to the actual information need as it manifests in the library user's life. Following that, the librarian suggests information resources that address the user's information need, explaining the nature and scope of information they contain and soliciting feedback. The reference interview closes when the librarian has provided the appropriate information or a referral to an outside resource where it can be found, and the user confirms that he or she has received the information needed.
Fast forward to the present day. Thanks to modern search engines, title and author search are no longer tedious processes. Moreover, search engines are somewhat forgiving of users, offering spelling correction and inexact query matching. Libraries are still catching up with advances in technology, but the evolution is clearly under way.

However, search engines have not obviated the need for a reference interview. Excepting the simple cases of known item search, the typical information seeker needs help translating an information need into one or more search queries. And that information need may change as the seeker learns from the process.

But it should come as no surprise that information seeking support systems need to be more than search engines. The ideal information seeking support system emulates a reference librarian, stepping users through a structured process of clarification. Indeed, this is exactly what my colleagues and I at Endeca are trying to do in our work with libraries and more broadly in pursuing a vision of human computer information retrieval.

What then becomes of librarians? Much as calculators and computers did not obviate the need for mathematicians, I don't see technology obviating the need for information scientists. Library schools have already evolved into information schools, and I have no doubt that their graduates will help establish the next generation of information seeking technology that makes today's search engines seem as quaint as card catalogs.

Sunday, July 6, 2008

Resolving the Battle Royale between Information Retrieval and Information Science

The following is the position paper I submitted to the NSF Information Seeking Support Systems Workshop last month. The workshop report is still being assembled, but I wanted to share my own contribution to the discussion, since it is particularly appropriate to the themes of The Noisy Channel.


Resolving the Battle Royale between Information Retrieval and Information Science


Daniel Tunkelang

Endeca

ABSTRACT

We propose an approach to help resolve the “battle royale” between the information retrieval and information science communities. The information retrieval side favors the Cranfield paradigm of batch evaluation, criticized by the information science side for its neglect of the user. The information science side favors user studies, criticized by the information retrieval side for their scale and repeatability challenges. Our approach aims to satisfy the primary concerns of both sides.

Categories and Subject Descriptors

H.1.2 [Human Factors]: Human information processing.

H.3.3 [Information Systems]: Information Search and Retrieval - Information Filtering, Retrieval Models

H.5.2 [Information Systems]: Information Interfaces and Presentation - User Interfaces

General Terms

Design, Experimentation, Human Factors

Keywords

Information science, information retrieval, information seeking, evaluation, user studies

1. INTRODUCTION

Over the past few decades, a growing community of researchers has called for the information retrieval community to think outside the Cranfield box. Perhaps the most vocal advocate is Nick Belkin, whose "grand challenges" in his keynote at the 2008 European Conference on Information Retrieval [1] all pertained to the interactive nature of information seeking he claims the Cranfield approach neglects. Belkin cited similar calls to action going back as far as Karen Spärck Jones, in her 1988 acceptance speech for the Gerald Salton award [2], and again from Tefko Saracevic, when he received the same award in 1997 [3]. More recently, we have the Information Seeking and Retrieval research program proposed by Peter Ingwersen and Kalervo Järvelin in The Turn, published in 2005 [4].

2. IMPASSE BETWEEN IR AND IS

Given the advocacy of Belkin and others, why hasn't there been more progress? As Ellen Voorhees noted in defense of Cranfield at the 2006 Workshop on Adaptive Information Retrieval, "changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in cost of retrieval experiments" [5]. Despite user studies that have sought to challenge the Cranfield emphasis on batch information retrieval measures like mean average precision—such as those of Andrew Turpin and Bill Hersh [6]—the information retrieval community, on the whole, remains unconvinced by these experiments because they are smaller in scale and less repeatable than the TREC evaluations.

As Tefko Saracevic has said, there is a "battle royale" between the information retrieval community, which favors the Cranfield paradigm of batch evaluation despite its neglect of the user, and the information science community, which favors user studies despite their scale and repeatability challenges [7]. How do we move forward?

3. PRIMARY CONCERNS OF IR AND IS

Both sides have compelling arguments. If an evaluation procedure is not repeatable and cost-effective, it has little practical value. Nonetheless, it is essential that an evaluation procedure measure the interactive nature of information seeking.

If we are to find common ground to resolve this dispute, we need to satisfy the primary concerns of both sides:

· Real information seeking tasks are interactive, so the results of the evaluation procedure must be meaningful in an interactive context.

· The evaluation procedure must be repeatable and cost-effective.

In order to move beyond the battle royale and resolve the impasse between the IR and IS communities, we need to address both of these concerns.

4. PROPOSED APPROACH


A key point of contention in the battle royale is whether we should evaluate systems by studying individual users or measuring system performance against test collections.

The short answer is that we need to do both. In order to ground the results of evaluation in realistic contexts, we need to conduct user studies that relate proposed measures to success in interactive information seeking tasks. Otherwise, we optimize under the artificial constraint that a task involves only a single user query.

Such an approach presumes that we have a characterization of information seeking tasks. This characterization is an open problem that is beyond the scope of this position paper but has been addressed by other information seeking researchers, including Ingwersen and Järvelin [4]. We presume access to a set of tasks that, if not exhaustive, at least applies to a valuable subset of real information seeking problems.

Consider, as a concrete example, the task of a researcher who, given a comprehensive digital library of technical publications, wants to determine with confidence whether his or her idea is novel. In other words, the researcher wants to either discover prior art that anticipates the idea, or to state with confidence that there is no such art. Patent inventors and lawyers performing e-discovery perform analogous tasks. We can measure task performance objectively as a combination of accuracy and efficiency, and we can also consider subjective measures like user confidence and satisfaction. Let us assume that we are able to quantify a task success measure that incorporates these factors.

Given this task and success measure, we would like to know how well an information retrieval system supports the user performing it. As the information scientists correctly argue, user studies are indispensable. But, as we employ user studies to determine which systems are most helpful to users, we need to go a step further and correlate user success to one or more system measures. We can then evaluate these system measures in a repeatable, cost-effective process that does not require user involvement.

For example, let us hypothesize that mean average precision (MAP) on a given TREC collection is such a measure. We hypothesize that users pursuing the prior art search task are more successful using a system with higher MAP than those using a system with lower MAP. In order to test this hypothesis, we can present users with a family of systems that, insofar as possible, vary only in MAP, and see how well user success correlates to the system’s MAP. If the correlation is strong, then we validate the utility of MAP as a system measure and invest in evaluating systems using MAP against the specified collection in order to predict their utility for the prior art task.
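The validation step described above can be sketched in a few lines: compute MAP for each system in the family, then correlate MAP with the user success scores observed in the study. All systems, judgments, and scores below are invented purely to illustrate the computation:

```python
# Sketch of the proposed validation: compute MAP for several hypothetical
# systems, then correlate MAP with (invented) user success scores from a
# study. All data here are illustrative, not real experimental results.

def average_precision(ranking, relevant):
    """AP for one query: mean of precision@k over ranks of relevant hits."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, judgments):
    """MAP: average of per-query AP over a set of test queries."""
    return sum(average_precision(r, j)
               for r, j in zip(rankings, judgments)) / len(rankings)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Three hypothetical systems, two test queries each.
judgments = [{"d1", "d3"}, {"d2"}]          # relevant docs per query
systems = {
    "A": [["d1", "d3", "d2"], ["d2", "d1"]],
    "B": [["d2", "d1", "d3"], ["d1", "d2"]],
    "C": [["d3", "d1", "d2"], ["d2", "d3"]],
}
user_success = {"A": 0.9, "B": 0.6, "C": 0.8}  # invented study outcomes

maps = {name: mean_average_precision(runs, judgments)
        for name, runs in systems.items()}
names = sorted(systems)
r = pearson([maps[n] for n in names], [user_success[n] for n in names])
print({n: round(maps[n], 3) for n in names}, round(r, 3))
```

A strong positive correlation on real data would be the evidence needed to trust MAP as a repeatable proxy for user success on this task; a weak one would send us looking for a better system measure.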

The principle here is a general one: it can be used not only to compare different algorithms, but also to evaluate more sophisticated interfaces, such as document clustering [8] or faceted search [9]. The only requirement is that we hypothesize and validate system measures that correlate to user success.

5. WEAKNESSES OF APPROACH

Our proposed approach has two major weaknesses.

The first weakness is that, in a realistic interactive information retrieval context, distinct queries are not independent. Rather, a typical user executes a sequence of queries in pursuit of an information need, each query informed by the results of the previous ones.

In a batch test, we must decide the query sequence in advance, and cannot model how the user’s queries depend on system response. Hence, we are limited to computing measures that can be evaluated for each query independently. Nonetheless, we can choose measures which correlate to effectiveness in realistic settings. Hopefully these measures are still meaningful, even when we remove the test queries from their realistic context.

The second weakness is that we do not envision a way to compare different interfaces in a batch setting. It seems that testing the relative merits of different interfaces requires real--or at least simulated--users.

If, however, we hold the interface constant, then we can define performance measures that apply to those interfaces. For example, we can develop standardized versions of well-studied interfaces, such as faceted search and clustering. We can then compare the performance of different systems that use these interfaces, e.g., different clustering algorithms.

6. AN ALTERNATIVE APPROACH

An alternative way to tackle the evaluation problem leverages the “human computation” approach championed by Luis Von Ahn [10]. This approach uses “games with a purpose” to motivate people to perform information-related tasks, such as image tagging and optical character recognition (OCR).

A particularly interesting "game" in our present context is Phetch, in which one or more "Seekers" compete to find an image based on a text description provided by a "Describer" [11]. The Describer’s goal is to help the Seekers succeed, while the Seekers compete with one another to find the target image within a fixed time limit, using a search engine that has indexed the images based on tagging results from the ESP Game. In order to discourage a shotgun approach, the game penalizes Seekers for wrong guesses.

This game goes quite far in capturing the essence of interactive information retrieval. If we put aside the competition among the Seekers, then we see that an individual Seeker, aided by the human Describer and the algorithmic (but human-indexed) search engine, is pursuing an information retrieval task. Moreover, the Seeker is incented to be both effective and efficient.

How can we leverage this framework for information retrieval evaluation? Even though the game envisions both Describers and Seekers to be human beings, there is no reason we cannot allow computers to play too--in either or both roles. Granted, the game, as currently designed, focuses on image retrieval without giving the human players direct access to the image tags, but we could imagine a framework that is more amenable to machine participation, e.g., providing a machine player with a set of tags derived from those in the index when that player is presented with an image. Alternatively, there may be a domain more suited than image retrieval to incorporating computer players.

The main appeal of the game framework is that it allows all participants to be judged based on an objective criterion that reflects the effectiveness and efficiency of the interactive information retrieval process. A good Describer should, on average, outscore a bad Describer over the long term; likewise, a good Seeker should outscore a bad one. We can even vary the search engine available to Seekers, in order to compare competing search engine algorithms or interfaces.

7. CONCLUSION

Our goal is ambitious: we aspire towards an evaluation framework that satisfies information scientists as relevant to real-world information seeking, but nonetheless offers the practicality of the Cranfield paradigm that dominates information retrieval. The near absence of collaboration between the information science and information retrieval communities has been a great missed opportunity, not only for both research communities but also for the rest of the world, which could benefit from practical advances in our understanding of information seeking. We hope that the approach we propose takes at least a small step towards resolving this battle royale.

8. REFERENCES

[1] Belkin, N. J., 2008. Some(What) Grand Challenges for Information Retrieval. ACM SIGIR Forum 42, 1 (June 2008), 47-54.

[2] Spärck Jones, K. 1988. A look back and a look forward. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 13-29.

[3] Saracevic, T. 1997. Users lost: reflections of the past, future and limits of information science. ACM SIGIR Forum 31, 2 (July 1997), 16-27.

[4] Ingwersen, P. and Järvelin, K. 2005. The turn. Integration of information seeking and retrieval in context. Springer.

[5] Voorhees, E. 2006. Building Test Collections for Adaptive Information Retrieval: What to Abstract for What cost? In First International Workshop on Adaptive Information Retrieval (AIR).

[6] Turpin, A. and Scholer, F. 2006. User performance versus precision measures for simple search tasks. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 11-18.

[7] Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. Journal of the American Society for Information Science and Technology 58(3), 1915-1933.

[8] Cutting, D., Karger, D., Pedersen, J., and Tukey, J. 1992. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. In Proceedings of the 15th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval, 318-329.

[9] Workshop on Faceted Search. 2006. In Proceedings of the 29th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval.

[10] Von Ahn, L. 2006. Games with a Purpose. IEEE Computer 39, 6 (June 2006), 92-94.

[11] Von Ahn, L., Ginosar, S., Kedia, M., Liu, R., and Blum, M. 2006. Improving accessibility of the web with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 79-82.


Sunday, June 29, 2008

Back from ISSS Workshop

My apologies for the sparsity of posts lately; it's been a busy week!

I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.

One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
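As a strawman for what such a combined measure might look like, here is a minimal sketch for an existential, recall-oriented task. The functional form, weights, and normalization are entirely invented for illustration; the real work is in validating any such measure against actual seeker behavior:

```python
# Hypothetical combined success measure for an existential search task,
# folding together the three factors named above: correctness of the
# outcome, the seeker's confidence in it, and efficiency (time spent
# relative to a budget). The weights and form are invented, not validated.

def task_success(correct, confidence, seconds_taken, budget_seconds,
                 w_correct=0.6, w_confidence=0.2, w_efficiency=0.2):
    """Score in [0, 1]. `correct` is a bool; `confidence` is in [0, 1]."""
    # Linear time penalty, floored at zero once the budget is exhausted.
    efficiency = max(0.0, 1.0 - seconds_taken / budget_seconds)
    return (w_correct * (1.0 if correct else 0.0)
            + w_confidence * confidence
            + w_efficiency * efficiency)

# A correct, confident, fast outcome scores near 1.
print(task_success(correct=True, confidence=0.9,
                   seconds_taken=60, budget_seconds=300))
```

Even a crude formula like this makes the trade-offs explicit: a seeker who correctly concludes "no such object exists" but takes the full time budget, or remains unsure, scores measurably lower than one who reaches the same conclusion quickly and confidently.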
I'll let folks know as more information is released from the workshop.

Tuesday, June 24, 2008

What is (not) Exploratory Search?

One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.

In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).

But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.

Should we conclude then that exploratory search is, in fact, a fringe use case?

According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.

Let me offer the following characterization of non-exploratory search:
  • You know exactly what you want.
  • You know exactly how to ask for it.
  • You expect a search query to yield one of two responses:
    - Success: you are presented with the object of your search.
    - Failure: you learn that the object of your search is unavailable.
If any of these assumptions fails to hold, then the search problem is, to some extent, exploratory.

There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.

Tuesday, June 17, 2008

Information Retrieval Systems, 1896 - 1966

My colleague and Endeca co-founder Pete Bell just pointed me to a great post by Kevin Kelly about what may be the earliest implementation of a faceted navigation system. Like every good Endecan, I'm familiar with Ranganathan's struggle to sell the library world on colon classification. But it is still striking to see this struggle played out through technology artifacts from a pre-Internet world.

Tuesday, June 10, 2008

Seeking Opinions about Information Seeking

In a couple of weeks, I'll be participating in an invitational workshop sponsored by the National Science Foundation on Information Seeking Support Systems at the University of North Carolina - Chapel Hill. The participants are an impressive bunch--I feel like I'm the only person attending whom I've never heard of!

So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.

I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.

Thursday, June 5, 2008

HCIR '08

It's my pleasure to announce...

HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008

About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.

In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.

This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.

Keynote speaker: Susan Dumais, Microsoft Research

Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.

Possible topics include, but are not limited to:
  • Novel interaction techniques for information retrieval.
  • Modeling and evaluation of interactive information retrieval.
  • Exploratory search and information discovery.
  • Information visualization and visual analytics.
  • Applications of HCI techniques to information retrieval needs in specific domains.
  • Ethnography and user studies relevant to information retrieval and access.
  • Scale and efficiency considerations for interactive information retrieval systems.
  • Relevance feedback and active learning approaches for information retrieval.

Important Dates
  • Aug 22 - Papers/abstracts due
  • Sep 12 - Decisions to authors
  • Oct 3 - Final copy due for printing
  • Oct 23 - Workshop date
Contributions will be peer-reviewed by two members of the program committee. For information on paper submission, see http://research.microsoft.com/~ryenw/hcir2008/submit.html or contact cua-hcir2008@cua.edu.


Workshop Organization

Workshop chairs:
Program chair:
Program Committee:
Supporters

Monday, May 5, 2008

Saracevic on Relevance and Interaction

There is no Nobel Prize in computer science, despite computer science having done more than any other discipline in the past fifty years to change the world. Instead, there is the Turing Award, which serves as a Nobel Prize of computing.

But the Turing Award has never been given to anyone in information retrieval. Instead, there is the Gerald Salton Award, which serves as a Turing Award of information retrieval. Its recipients represent an A-list of information retrieval researchers.

Last week, I had the opportunity to talk with Salton Award recipient Tefko Saracevic. If you are not familiar with Saracevic, I suggest you take an hour to watch his 2007 lecture on "Relevance in information science".

I won't try to capture an hour of conversation in a blog post, but here are a few highlights:
  • We learn from philosophers, particularly Alfred Schütz, that we cannot reduce relevance to a single concept, but rather have to consider a system of interdependent relevancies, such as topical relevance, interpretational relevance, and motivational relevance.

  • When we talk about relevance measures, such as precision and recall, we evaluate results from the perspective of a user. But information retrieval approaches necessarily take a systems perspective, making assumptions about what people will want and encoding those assumptions in models and algorithms.

  • A major challenge in information retrieval is that users--particularly web search users--often formulate ineffective queries, typically because they are too short. Studies have shown that reference interviews can lead to improved retrieval effectiveness (typically through longer, more informative queries). He said that automated systems could help too, but he wasn't aware of any that had achieved traction.

  • A variety of factors affect interactive information retrieval, including task context, intent, and expertise. Moreover, people react to some relevance clues more than others, and the strength of those reactions varies across populations.
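The precision and recall measures mentioned above reduce to simple set arithmetic over the retrieved and relevant documents. A toy illustration (the document sets here are invented):

```python
# Toy precision/recall computation; the retrieved and relevant sets are invented.
retrieved = {1, 2, 3, 4}   # documents the system returned
relevant = {2, 4, 5}       # documents the user actually needed

hits = retrieved & relevant
precision = len(hits) / len(retrieved)  # fraction of returned docs that are relevant
recall = len(hits) / len(relevant)      # fraction of relevant docs that were returned
```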
As I expected, I walked away with more questions than answers. But I did walk away reassured that my colleagues and I at Endeca, along with others in the HCIR community, are attacking the right problem: helping users formulate better queries.

I'd like to close with an anecdote that Saracevic recounts in his 2007 lecture. Bruce Croft had just delivered an information retrieval talk, and Nick Belkin raised the objection that users need to be incorporated into the study. Croft's conversation-ending response: "Tell us what to do, and we will do it."

We're halfway there. We've built interactive information retrieval systems, and we see from deployment after deployment that they work. Not that there isn't plenty of room for improvement, but the unmet challenge, as Ellen Voorhees makes clear, is evaluation. We need to address Nick Belkin's grand challenge and establish a paradigm suitable for evaluation of interactive IR systems.

Thursday, April 17, 2008

Ellen Voorhees defends Cranfield

I was extremely flattered to receive an email from Ellen Voorhees responding to my post about Nick Belkin's keynote. Then I was a little bit scared, since she is a strong advocate of the Cranfield tradition, and I braced myself for her rebuttal.

She pointed me to a talk she gave at the First International Workshop on Adaptive Information Retrieval (AIR) in 2006. I'd paraphrase her argument as follows: Nick and others (including me) are right to push for a paradigm that supports AIR research, but are being naïve regarding what is necessary for such research to deliver effective--and cost-effective--results. It's a strong case, and I'd be the first to concede that the advocates for AIR research have not (at least to my knowledge) produced a plausible abstract task that is amenable to efficient evaluation.

To quote Nick again, it's a grand challenge. And Ellen makes it clear that what we've learned so far is not encouraging.

Sunday, April 6, 2008

Nick Belkin at ECIR '08

Last week, I had the pleasure to attend the 30th European Conference on Information Retrieval, chaired by Iadh Ounis at the University of Glasgow. The conference was outstanding in several respects, not least of which was a keynote address by Nick Belkin, one of the world's leading researchers on interactive information retrieval.

Nick's keynote, entitled "Some(what) Grand Challenges for Information Retrieval," was a full frontal attack on the Cranfield evaluation paradigm that has dominated IR research for the past half century. I am hoping to see his keynote published and posted online, but in the meantime here is a choice excerpt:
in accepting the [Gerald Salton] award at the 1997 SIGIR meeting, Tefko Saracevic stressed the significance of integrating research in information seeking behavior with research in IR system models and algorithms, saying: "if we consider that unlike art IR is not there for its own sake, that is, IR systems are researched and built to be used, then IR is far, far more than a branch of computer science, concerned primarily with issues of algorithms, computers, and computing."

...

Nevertheless, we can still see the dominance of the TREC (i.e. Cranfield) evaluation paradigm in most IR research, the inability of this paradigm to accommodate study of people in interaction with information systems (cf. the death of the TREC Interactive Track), and a dearth of research which integrates study of users’ goals, tasks and behaviors with research on models and methods which respond to results of such studies and supports those goals, tasks and behaviors.

This situation is especially striking for several reasons. First, it is clearly the case that IR as practiced is inherently interactive; secondly, it is clearly the case that the new models and associated representation and ranking techniques lead to only incremental (if that) improvement in performance over previous models and techniques, which is generally not statistically significant; and thirdly, that such improvement, as determined in TREC-style evaluation, rarely, if ever, leads to improved performance by human searchers in interactive IR systems.
Nick has long been critical of the IR community's neglect of users and interaction. But this keynote was significant for two reasons. First, the ECIR program committee's decision to invite a keynote speaker from the information science community acknowledges the need for collaboration between these two communities. Second, Nick reciprocated this overture by calling for interdisciplinary efforts to bridge the gap between the formal study of information retrieval and the practical understanding of information behavior. As an avid proponent of HCIR, I am heartily encouraged by steps like these.


Wednesday, August 27, 2008

Transparency in Information Retrieval

It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.

Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.

The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.

Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems)? But what frustrates users most is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.

What does this have to do with set retrieval vs. ranked retrieval? Plenty!

Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.

The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
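The transparency of set retrieval is easy to see in code. Here is a minimal sketch of Boolean retrieval over an inverted index (the documents and queries are hypothetical): each operator maps directly onto a set operation, so the result set is exactly what the query specifies.

```python
# Minimal sketch of Boolean set retrieval over an inverted index.
# Documents and queries are invented for illustration.

def build_index(docs):
    """Map each term to the set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

docs = {
    1: "faceted search interfaces",
    2: "boolean search engines",
    3: "ranked retrieval models",
}
index = build_index(docs)

# "search AND engines": intersect posting sets -- what you ask is what you get.
conjunction = index.get("search", set()) & index.get("engines", set())

# "search OR retrieval": union of posting sets.
disjunction = index.get("search", set()) | index.get("retrieval", set())
```

Nothing is hidden: a document is in the result set if and only if it satisfies the Boolean expression.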

In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.

But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.
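As a toy illustration of a published-but-opaque model, here is a bare-bones TF-IDF ranker (documents and query invented; real engines combine many more signals). The formula is public, but a typical user looking at the ranked list has no idea these term statistics are behind it:

```python
import math

# Toy TF-IDF ranking: published in full detail, yet opaque to a typical user.
# Documents and the query are hypothetical.

docs = {
    1: "news search engine",
    2: "search engine ranking search",
    3: "library card catalogs",
}

def tf_idf_ranking(query, docs):
    """Rank document ids by summed tf * idf over the query terms."""
    n = len(docs)
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = {}  # document frequency of each term
    for terms in tokenized.values():
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    scores = {}
    for d, terms in tokenized.items():
        score = 0.0
        for q in query.lower().split():
            tf = terms.count(q)
            if tf:
                score += tf * math.log(n / df[q])
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)

ranking = tf_idf_ranking("search engine", docs)
```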

Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.

If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.

But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?

To be continued...

Tuesday, July 8, 2008

Librarian 2.0

Many of the words that mark milestones in the history of technology, such as calculator and word processor, originally corresponded to people. Calculating had at least two lives as a technology breakthrough--first as a process, and then as an automatic means for executing that process. Thanks to inventions like calculators and computers, human beings have moved up the value chain to become scientists and engineers who take low-level details for granted.

Similarly, the advances in information science and retrieval have dramatically changed the role of a reference librarian.

Hopefully some of you are old enough to remember card catalogs. They were certainly functional if you knew the exact title or author you were looking for, assuming the title wasn't too generic or the author too prolific. Where card catalogs fell short was in supporting exploratory search. In many cases, your best bet was to quite literally explore the stacks and hope that locality within the Dewey Decimal system sufficed to support your information seeking needs. Alternatively, you could follow citation paths--the dead-tree precursor of surfing a hypertext collection.

For exploratory tasks, library patrons would turn to reference librarians, who would clarify the patrons' needs through a process called the reference interview. According to Wikipedia:
A reference interview is composed of two segments:

1. An initial segment in which the librarian encourages the user to fully discuss the request.
2. A final segment in which the librarian asks questions to relate the request to the materials available in the library

A reference interview is structured (ideally) according to the following series of steps. First the library user states a question or describes a problem. The librarian then clarifies the user's information need, sometimes leading him or her back from a request for a specific resource (which may not be the best one for the problem at hand) to the actual information need as it manifests in the library user's life. Following that, the librarian suggests information resources that address the user's information need, explaining the nature and scope of information they contain and soliciting feedback. The reference interview closes when the librarian has provided the appropriate information or a referral to an outside resource where it can be found, and the user confirms that he or she has received the information needed.
Fast forward to the present day. Thanks to modern search engines, title and author search are no longer tedious processes. Moreover, search engines are somewhat forgiving of users, offering spelling correction and inexact query matching. Libraries are still catching up with advances in technology, but the evolution is clearly under way.

However, search engines have not obviated the need for a reference interview. Excepting the simple case of known-item search, the typical information seeker needs help translating an information need into one or more search queries. And that information need may change as the seeker learns from the process.

But it should come as no surprise that information seeking support systems need to be more than search engines. The ideal information seeking support system emulates a reference librarian, stepping users through a structured process of clarification. Indeed, this is exactly what my colleagues and I at Endeca are trying to do in our work with libraries and more broadly in pursuing a vision of human computer information retrieval.

What then becomes of librarians? Much as calculators and computers did not obviate the need for mathematicians, I don't see technology obviating the need for information scientists. Library schools have already evolved into information schools, and I have no doubt that their graduates will help establish the next generation of information seeking technology that makes today's search engines seem as quaint as card catalogs.

Sunday, July 6, 2008

Resolving the Battle Royale between Information Retrieval and Information Science

The following is the position paper I submitted to the NSF Information Seeking Support Systems Workshop last month. The workshop report is still being assembled, but I wanted to share my own contribution to the discussion, since it is particularly appropriate to the themes of The Noisy Channel.


Resolving the Battle Royale between Information Retrieval and Information Science


Daniel Tunkelang

Endeca

ABSTRACT

We propose an approach to help resolve the “battle royale” between the information retrieval and information science communities. The information retrieval side favors the Cranfield paradigm of batch evaluation, criticized by the information science side for its neglect of the user. The information science side favors user studies, criticized by the information retrieval side for their scale and repeatability challenges. Our approach aims to satisfy the primary concerns of both sides.

Categories and Subject Descriptors

H.1.2 [Human Factors]: Human information processing.

H.3.3 [Information Systems]: Information Search and Retrieval - Information Filtering, Retrieval Models

H.5.2 [Information Systems]: Information Interfaces and Presentation - User Interfaces

General Terms

Design, Experimentation, Human Factors

Keywords

Information science, information retrieval, information seeking, evaluation, user studies

1. INTRODUCTION

Over the past few decades, a growing community of researchers has called for the information retrieval community to think outside the Cranfield box. Perhaps the most vocal advocate is Nick Belkin, whose "grand challenges" in his keynote at the 2008 European Conference on Information Retrieval [1] all pertained to the interactive nature of information seeking he claims the Cranfield approach neglects. Belkin cited similar calls to action going back as far as Karen Spärck Jones, in her 1988 acceptance speech for the Gerald Salton award [2], and again from Tefko Saracevic, when he received the same award in 1997 [3]. More recently, we have the Information Seeking and Retrieval research program proposed by Peter Ingwersen and Kalervo Järvelin in The Turn, published in 2005 [4].

2. IMPASSE BETWEEN IR AND IS

Given the advocacy of Belkin and others, why hasn't there been more progress? As Ellen Voorhees noted in defense of Cranfield at the 2006 Workshop on Adaptive Information Retrieval, "changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in cost of retrieval experiments" [5]. Despite user studies that have sought to challenge the Cranfield emphasis on batch information retrieval measures like mean average precision—such as those of Andrew Turpin and Bill Hersh [6]—the information retrieval community, on the whole, remains unconvinced by these experiments because they are smaller in scale and less repeatable than the TREC evaluations.

As Tefko Saracevic has said, there is a "battle royale" between the information retrieval community, which favors the Cranfield paradigm of batch evaluation despite its neglect of the user, and the information science community, which favors user studies despite their scale and repeatability challenges [7]. How do we move forward?

3. PRIMARY CONCERNS OF IR AND IS

Both sides have compelling arguments. If an evaluation procedure is not repeatable and cost-effective, it has little practical value. Nonetheless, it is essential that an evaluation procedure measure the interactive nature of information seeking.

If we are to find common ground to resolve this dispute, we need to satisfy the primary concerns of both sides:

· Real information seeking tasks are interactive, so the results of the evaluation procedure must be meaningful in an interactive context.

· The evaluation procedure must be repeatable and cost-effective.

In order to move beyond the battle royale and resolve the impasse between the IR and IS communities, we need to address both of these concerns.

4. PROPOSED APPROACH


A key point of contention in the battle royale is whether we should evaluate systems by studying individual users or measuring system performance against test collections.

The short answer is that we need to do both. In order to ground the results of evaluation in realistic contexts, we need to conduct user studies that relate proposed measures to success in interactive information seeking tasks. Otherwise, we optimize under the artificial constraint that a task involves only a single user query.

Such an approach presumes that we have a characterization of information seeking tasks. This characterization is an open problem that is beyond the scope of this position paper but has been addressed by other information seeking researchers, including Ingwersen and Järvelin [4]. We presume access to a set of tasks that, if not exhaustive, at least applies to a valuable subset of real information seeking problems.

Consider, as a concrete example, the task of a researcher who, given a comprehensive digital library of technical publications, wants to determine with confidence whether his or her idea is novel. In other words, the researcher wants either to discover prior art that anticipates the idea, or to state with confidence that there is no such art. Patent inventors and lawyers performing e-discovery perform analogous tasks. We can measure task performance objectively as a combination of accuracy and efficiency, and we can also consider subjective measures like user confidence and satisfaction. Let us assume that we are able to quantify a task success measure that incorporates these factors.

Given this task and success measure, we would like to know how well an information retrieval system supports the user performing it. As the information scientists correctly argue, user studies are indispensable. But, as we employ user studies to determine which systems are most helpful to users, we need to go a step further and correlate user success to one or more system measures. We can then evaluate these system measures in a repeatable, cost-effective process that does not require user involvement.

For example, let us hypothesize that mean average precision (MAP) on a given TREC collection is such a measure. We hypothesize that users pursuing the prior art search task are more successful using a system with higher MAP than those using a system with lower MAP. In order to test this hypothesis, we can present users with a family of systems that, insofar as possible, vary only in MAP, and see how well user success correlates to the system’s MAP. If the correlation is strong, then we validate the utility of MAP as a system measure and invest in evaluating systems using MAP against the specified collection in order to predict their utility for the prior art task.
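The validation step amounts to a correlation test. A minimal sketch with invented numbers--four hypothetical systems, their MAP scores, and the mean user success observed with each--where a strong positive correlation would support investing in batch MAP evaluation for this task:

```python
import math

# Hypothetical data: MAP for four systems on a test collection, and mean
# user success on the prior-art task with each system. All numbers invented.
map_scores = [0.18, 0.25, 0.31, 0.42]
user_success = [0.40, 0.52, 0.55, 0.70]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(map_scores, user_success)
# Strong positive r: MAP is a plausible proxy for user success on this task.
```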

The principle here is a general one: it can be used not only to compare different algorithms, but also to evaluate more sophisticated interfaces, such as document clustering [8] or faceted search [9]. The only requirement is that we hypothesize and validate system measures that correlate to user success.

5. WEAKNESSES OF APPROACH

Our proposed approach has two major weaknesses.

The first weakness is that, in a realistic interactive information retrieval context, distinct queries are not independent. Rather, a typical user executes a sequence of queries in pursuit of an information need, each query informed by the results of the previous ones.

In a batch test, we must decide the query sequence in advance, and cannot model how the user’s queries depend on system response. Hence, we are limited to computing measures that can be evaluated for each query independently. Nonetheless, we can choose measures which correlate to effectiveness in realistic settings. Hopefully these measures are still meaningful, even when we remove the test queries from their realistic context.

The second weakness is that we do not envision a way to compare different interfaces in a batch setting. It seems that testing the relative merits of different interfaces requires real—or at least simulated—users.

If, however, we hold the interface constant, then we can define performance measures that apply to those interfaces. For example, we can develop standardized versions of well-studied interfaces, such as faceted search and clustering. We can then compare the performance of different systems that use these interfaces, e.g., different clustering algorithms.

6. AN ALTERNATIVE APPROACH

An alternative way to tackle the evaluation problem leverages the “human computation” approach championed by Luis Von Ahn [10]. This approach uses “games with a purpose” to motivate people to perform information-related tasks, such as image tagging and optical character recognition (OCR).

A particularly interesting "game" in our present context is Phetch, in which one or more "Seekers" compete to find an image based on a text description provided by a "Describer" [11]. The Describer’s goal is to help the Seekers succeed, while the Seekers compete with one another to find the target image within a fixed time limit, using a search engine that has indexed the images based on tagging results from the ESP Game. In order to discourage a shotgun approach, the game penalizes Seekers for wrong guesses.

This game goes quite far in capturing the essence of interactive information retrieval. If we put aside the competition among the Seekers, then we see that an individual Seeker, aided by the human Describer and the algorithmic--but human-indexed--search engine, is pursuing an information retrieval task. Moreover, the Seeker is incented to be both effective and efficient.

How can we leverage this framework for information retrieval evaluation? Even though the game envisions both Describers and Seekers to be human beings, there is no reason we cannot allow computers to play too--in either or both roles. Granted, the game, as currently designed, focuses on image retrieval without giving the human players direct access to the image tags, but we could imagine a framework that is more amenable to machine participation, e.g., providing a machine player with a set of tags derived from those in the index when that player is presented with an image. Alternatively, there may be a domain more suited than image retrieval to incorporating computer players.

The main appeal of the game framework is that it allows all participants to be judged based on an objective criterion that reflects the effectiveness and efficiency of the interactive information retrieval process. A good Describer should, on average, outscore a bad Describer over the long term; likewise, a good Seeker should outscore a bad one. We can even vary the search engine available to Seekers, in order to compare competing search engine algorithms or interfaces.

7. CONCLUSION

Our goal is ambitious: we aspire towards an evaluation framework that satisfies information scientists as relevant to real-world information seeking, but nonetheless offers the practicality of the Cranfield paradigm that dominates information retrieval. The near absence of collaboration between the information science and information retrieval communities has been a greatly missed opportunity, not only for both research communities but also for the rest of the world, which could benefit from practical advances in our understanding of information seeking. We hope that the approach we propose takes at least a small step towards resolving this battle royale.

8. REFERENCES

[1] Belkin, N. J., 2008. Some(What) Grand Challenges for Information Retrieval. ACM SIGIR Forum 42, 1 (June 2008), 47-54.

[2] Spärck Jones, K. 1988. A look back and a look forward. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 13-29.

[3] Saracevic, T. 1997. Users lost: reflections of the past, future and limits of information science. ACM SIGIR Forum 31, 2 (July 1997), 16-27.

[4] Ingwersen, P. and Järvelin, K. 2005. The turn. Integration of information seeking and retrieval in context. Springer.

[5] Voorhees, E. 2006. Building Test Collections for Adaptive Information Retrieval: What to Abstract for What cost? In First International Workshop on Adaptive Information Retrieval (AIR).

[6] Turpin, A. and Scholer, F. 2006. User performance versus precision measures for simple search tasks. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 11-18.

[7] Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. Journal of the American Society for Information Science and Technology 58(3), 1915-1933.

[8] Cutting, D., Karger, D., Pedersen, J., and Tukey, J. 1992. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. In Proceedings of the 15th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval, 318-329.

[9] Workshop on Faceted Search. 2006. In Proceedings of the 29th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval.

[10] Von Ahn, L. 2006. Games with a Purpose. IEEE Computer 39, 6 (June 2006), 92-94.

[11] Von Ahn, L., Ginosar, S., Kedia, M., Liu, R., and Blum, M. 2006. Improving accessibility of the web with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 79-82.


Sunday, June 29, 2008

Back from ISSS Workshop

My apologies for the sparsity of posts lately; it's been a busy week!

I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.

One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
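One way to make such a combined measure concrete is a weighted composite over the three factors. In this sketch the weights, time budget, and session data are all invented for illustration:

```python
# Hypothetical composite success measure for an existential search task.
# Weights, the time budget, and the sample sessions are invented.

def session_score(correct, confidence, seconds,
                  budget=300.0, weights=(0.6, 0.2, 0.2)):
    """Combine outcome correctness, the user's confidence in the outcome,
    and efficiency (time spent relative to a budget) into one score."""
    w_correct, w_conf, w_eff = weights
    efficiency = max(0.0, 1.0 - seconds / budget)
    return (w_correct * (1.0 if correct else 0.0)
            + w_conf * confidence
            + w_eff * efficiency)

# A correct, confident, fast session outscores a correct but slow, unsure one.
fast = session_score(correct=True, confidence=0.9, seconds=60)
slow = session_score(correct=True, confidence=0.4, seconds=280)
```

The weighting is of course the contentious part; the point is only that correctness, confidence, and efficiency can be folded into a single comparable number.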
I'll let folks know as more information is released from the workshop.

Tuesday, June 24, 2008

What is (not) Exploratory Search?

One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.

In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).

But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.

Should we conclude then that exploratory search is, in fact, a fringe use case?

According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.

Let me offer the following characterization of non-exploratory search:
  • You know exactly what you want.
  • You know exactly how to ask for it.
  • You expect a search query to yield one of two responses:
    - Success: you are presented with the object of your search.
    - Failure: you learn that the object of your search is unavailable.
If any of these assumptions fails to hold, then the search problem is, to some extent, exploratory.

There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.

Tuesday, June 17, 2008

Information Retrieval Systems, 1896 - 1966

My colleague and Endeca co-founder Pete Bell just pointed me to a great post by Kevin Kelly about what may be the earliest implementation of a faceted navigation system. Like every good Endecan, I'm familiar with Ranganathan's struggle to sell the library world on colon classification. But it is still striking to see this struggle played out through technology artifacts from a pre-Internet world.

Tuesday, June 10, 2008

Seeking Opinions about Information Seeking

In a couple of weeks, I'll be participating in an invitational workshop sponsored by the National Science Foundation on Information Seeking Support Systems at the University of North Carolina - Chapel Hill. The participants are an impressive bunch--I feel like I'm the only person attending whom I've never heard of!

So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.

I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.

Thursday, June 5, 2008

HCIR '08

It's my pleasure to announce...

HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008

About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.

In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.

This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.

Keynote speaker: Susan Dumais, Microsoft Research

Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.

Possible topics include, but are not limited to:
  • Novel interaction techniques for information retrieval.
  • Modeling and evaluation of interactive information retrieval.
  • Exploratory search and information discovery.
  • Information visualization and visual analytics.
  • Applications of HCI techniques to information retrieval needs in specific domains.
  • Ethnography and user studies relevant to information retrieval and access.
  • Scale and efficiency considerations for interactive information retrieval systems.
  • Relevance feedback and active learning approaches for information retrieval.

Important Dates
  • Aug 22 - Papers/abstracts due
  • Sep 12 - Decisions to authors
  • Oct 3 - Final copy due for printing
  • Oct 23 - Workshop date
Contributions will be peer-reviewed by two members of the program committee. For information on paper submission, see http://research.microsoft.com/~ryenw/hcir2008/submit.html or contact cua-hcir2008@cua.edu.


Workshop Organization

Workshop chairs:
Program chair:
Program Committee:
Supporters

Monday, May 5, 2008

Saracevic on Relevance and Interaction

There is no Nobel Prize in computer science, despite computer science having done more than any other discipline in the past fifty years to change the world. Instead, there is the Turing Award, which serves as a Nobel Prize of computing.

But the Turing Award has never been given to anyone in information retrieval. Instead, there is the Gerald Salton Award, which serves as a Turing Award of information retrieval. Its recipients represent an A-list of information retrieval researchers.

Last week, I had the opportunity to talk with Salton Award recipient Tefko Saracevic. If you are not familiar with Saracevic, I suggest you take an hour to watch his 2007 lecture on "Relevance in information science".

I won't try to capture an hour of conversation in a blog post, but here are a few highlights:
  • We learn from philosophers, particularly Alfred Schütz, that we cannot reduce relevance to a single concept, but rather have to consider a system of interdependent relevancies, such as topical relevance, interpretational relevance, and motivational relevance.

  • When we talk about relevance measures, such as precision and recall, we evaluate results from the perspective of a user. But information retrieval approaches necessarily take a systems perspective, making assumptions about what people will want and encoding those assumptions in models and algorithms.

  • A major challenge in information retrieval is that users--particularly web search users--often formulate ineffective queries, in large part because those queries are too short. Studies have shown that reference interviews can lead to improved retrieval effectiveness (typically through longer, more informative queries). He said that automated systems could help too, but he wasn't aware of any that had achieved traction.

  • A variety of factors affect interactive information retrieval, including task context, intent, and expertise. Moreover, people react to certain relevance clues more than others, and more within some populations than others.
As I expected, I walked away with more questions than answers. But I did walk away reassured that my colleagues and I at Endeca, along with others in the HCIR community, are attacking the right problem: helping users formulate better queries.

I'd like to close with an anecdote that Saracevic recounts in his 2007 lecture. Bruce Croft had just delivered an information retrieval talk, and Nick Belkin raised the objection that users need to be incorporated into the study. Croft's conversation-ending response: "Tell us what to do, and we will do it."

We're halfway there. We've built interactive information retrieval systems, and we see from deployment after deployment that they work. Not that there isn't plenty of room for improvement, but the unmet challenge, as Ellen Voorhees makes clear, is evaluation. We need to address Nick Belkin's grand challenge and establish a paradigm suitable for evaluation of interactive IR systems.

Thursday, April 17, 2008

Ellen Voorhees defends Cranfield

I was extremely flattered to receive an email from Ellen Voorhees responding to my post about Nick Belkin's keynote. Then I was a little bit scared, since she is a strong advocate of the Cranfield tradition, and I braced myself for her rebuttal.

She pointed me to a talk she gave at the First International Workshop on Adaptive Information Retrieval (AIR) in 2006. I'd paraphrase her argument as follows: Nick and others (including me) are right to push for a paradigm that supports AIR research, but are being naïve regarding what is necessary for such research to deliver effective--and cost-effective--results. It's a strong case, and I'd be the first to concede that the advocates for AIR research have not (at least to my knowledge) produced a plausible abstract task that is amenable to efficient evaluation.

To quote Nick again, it's a grand challenge. And Ellen makes it clear that what we've learned so far is not encouraging.

Sunday, April 6, 2008

Nick Belkin at ECIR '08

Last week, I had the pleasure to attend the 30th European Conference on Information Retrieval, chaired by Iadh Ounis at the University of Glasgow. The conference was outstanding in several respects, not least of which was a keynote address by Nick Belkin, one the world's leading researchers on interactive information retrieval.

Nick's keynote, entitled "Some(what) Grand Challenges for Information Retrieval," was a full frontal attack on the Cranfield evaluation paradigm that has dominated IR research for the past half century. I am hoping to see his keynote published and posted online, but in the meantime here is a choice excerpt:
in accepting the [Gerald Salton] award at the 1997 SIGIR meeting, Tefko Saracevic stressed the significance of integrating research in information seeking behavior with research in IR system models and algorithms, saying: "if we consider that unlike art IR is not there for its own sake, that is, IR systems are researched and built to be used, then IR is far, far more than a branch of computer science, concerned primarily with issues of algorithms, computers, and computing."

...

Nevertheless, we can still see the dominance of the TREC (i.e. Cranfield) evaluation paradigm in most IR research, the inability of this paradigm to accommodate study of people in interaction with information systems (cf. the death of the TREC Interactive Track), and a dearth of research which integrates study of users’ goals, tasks and behaviors with research on models and methods which respond to results of such studies and supports those goals, tasks and behaviors.

This situation is especially striking for several reasons. First, it is clearly the case that IR as practiced is inherently interactive; secondly, it is clearly the case that the new models and associated representation and ranking techniques lead to only incremental (if that) improvement in performance over previous models and techniques, which is generally not statistically significant; and thirdly, that such improvement, as determined in TREC-style evaluation, rarely, if ever, leads to improved performance by human searchers in interactive IR systems.
Nick has long been critical of the IR community's neglect of users and interaction. But this keynote was significant for two reasons. First, the ECIR program committee's decision to invite a keynote speaker from the information science community acknowledges the need for collaboration between these two communities. Second, Nick reciprocated this overture by calling for interdisciplinary efforts to bridge the gap between the formal study of information retrieval and the practical understanding of information behavior. As an avid proponent of HCIR, I am heartily encouraged by steps like these.

Monday, September 8, 2008

Quick Bites: Taxonomy Directed Folksonomies

Props to Gwen Harris at Taxonomy Watch for posting a paper by Sarah Hayman and Nick Lothian on Taxonomy Directed Folksonomies.

The paper asks whether folksonomies and formal taxonomy can be used together and answers in the affirmative. The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.

I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.

Wednesday, August 27, 2008

Transparency in Information Retrieval

It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.

Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.

The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.

Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems)? But what frustrates users most is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.

What does this have to do with set retrieval vs. ranked retrieval? Plenty!

Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases rather than search engines.

The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
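To see what that transparency means in practice, here is a toy sketch of set retrieval over an inverted index. The index contents are invented for illustration:

```python
# Toy inverted index: term -> set of document IDs (data invented for illustration).
index = {
    "jaguar": {1, 2, 5},
    "car":    {2, 3},
    "animal": {1, 4, 5},
}
all_docs = {1, 2, 3, 4, 5}

# Boolean operators are just set operations, so the response is fully
# determined by the query: what you ask is what you get.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return all_docs - a

# "jaguar AND animal"
result = AND(index["jaguar"], index["animal"])

# "animal AND NOT car"
result2 = AND(index["animal"], NOT(index["car"]))
```

The result set can then be sorted by whatever field the user prefers; no opaque ranking model intervenes between the query and the response.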

In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.

But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.

Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.
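A bare-bones ranked retrieval sketch makes the opacity concrete: even a simple TF-IDF scorer (one textbook variant, not any particular engine's model; the collection is invented) produces an ordering whose underlying weights the user never sees:

```python
import math
from collections import Counter

# Toy collection (invented for illustration).
docs = {
    1: "jaguar is a big cat",
    2: "the jaguar car brand",
    3: "big cat conservation news",
}

def tfidf_score(query: str, doc_id: int) -> float:
    """Sum of tf * idf weights over the query terms (one simple variant)."""
    terms = docs[doc_id].split()
    tf = Counter(terms)
    n = len(docs)
    score = 0.0
    for q in query.split():
        df = sum(1 for text in docs.values() if q in text.split())
        if df:
            score += tf[q] * math.log(n / df)
    return score

# The user sees only this ordering (plus snippets), not the per-term
# weights that produced it.
ranked = sorted(docs, key=lambda d: tfidf_score("jaguar cat", d), reverse=True)
```

Two documents may sit adjacent in the ranking for entirely different reasons, and nothing in the interface explains which query terms, weights, or document statistics were responsible.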

If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.

But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?

To be continued...

Tuesday, July 8, 2008

Librarian 2.0

Many of the words that mark milestones in the history of technology, such as calculator and word processor, originally corresponded to people. Calculating had at least two lives as a technology breakthrough--first as a process, and then as an automatic means for executing that process. Thanks to inventions like calculators and computers, human beings have moved up the value chain to become scientists and engineers who take low-level details for granted.

Similarly, the advances in information science and retrieval have dramatically changed the role of a reference librarian.

Hopefully some of you are old enough to remember card catalogs. They were certainly functional if you knew the exact title or author you were looking for, assuming the title wasn't too generic or the author too prolific. Where card catalogs fell short was in supporting exploratory search. In many cases, your best bet was quite literally to explore the stacks and hope that locality within the Dewey Decimal system sufficed to support your information seeking needs. Alternatively, you could follow citation paths--the dead-tree precursor of surfing a hypertext collection.

For exploratory tasks, library patrons would turn to reference librarians, who would clarify the patrons' needs through a process called the reference interview. According to Wikipedia:
A reference interview is composed of two segments:

1. An initial segment in which the librarian encourages the user to fully discuss the request.
2. A final segment in which the librarian asks questions to relate the request to the materials available in the library

A reference interview is structured (ideally) according to the following series of steps. First the library user states a question or describes a problem. The librarian then clarifies the user's information need, sometimes leading him or her back from a request for a specific resource (which may not be the best one for the problem at hand) to the actual information need as it manifests in the library user's life. Following that, the librarian suggests information resources that address the user's information need, explaining the nature and scope of information they contain and soliciting feedback. The reference interview closes when the librarian has provided the appropriate information or a referral to an outside resource where it can be found, and the user confirms that he or she has received the information needed.
Fast forward to the present day. Thanks to modern search engines, title and author search are no longer tedious processes. Moreover, search engines are somewhat forgiving of users, offering spelling correction and inexact query matching. Libraries are still catching up with advances in technology, but the evolution is clearly under way.

However, search engines have not obviated the need for a reference interview. Excepting the simple cases of known item search, the typical information seeker needs help translating an information need into one or more search queries. And that information need may change as the seeker learns from the process.

But it should come as no surprise that information seeking support systems need to be more than search engines. The ideal information seeking support system emulates a reference librarian, stepping users through a structured process of clarification. Indeed, this is exactly what my colleagues and I at Endeca are trying to do in our work with libraries and more broadly in pursuing a vision of human computer information retrieval.

What then becomes of librarians? Much as calculators and computers did not obviate the need for mathematicians, I don't see technology obviating the need for information scientists. Library schools have already evolved into information schools, and I have no doubt that their graduates will help establish the next generation of information seeking technology that makes today's search engines seem as quaint as card catalogs.

Sunday, July 6, 2008

Resolving the Battle Royale between Information Retrieval and Information Science

The following is the position paper I submitted to the NSF Information Seeking Support Systems Workshop last month. The workshop report is still being assembled, but I wanted to share my own contribution to the discussion, since it is particularly appropriate to the themes of The Noisy Channel.


Resolving the Battle Royale between Information Retrieval and Information Science


Daniel Tunkelang

Endeca

ABSTRACT

We propose an approach to help resolve the “battle royale” between the information retrieval and information science communities. The information retrieval side favors the Cranfield paradigm of batch evaluation, criticized by the information science side for its neglect of the user. The information science side favors user studies, criticized by the information retrieval side for their scale and repeatability challenges. Our approach aims to satisfy the primary concerns of both sides.

Categories and Subject Descriptors

H.1.2 [Human Factors]: Human information processing.

H.3.3 [Information Systems]: Information Search and Retrieval - Information Filtering, Retrieval Models

H.5.2 [Information Systems]: Information Interfaces and Presentation - User Interfaces

General Terms

Design, Experimentation, Human Factors

Keywords

Information science, information retrieval, information seeking, evaluation, user studies

1. INTRODUCTION

Over the past few decades, a growing community of researchers has called for the information retrieval community to think outside the Cranfield box. Perhaps the most vocal advocate is Nick Belkin, whose "grand challenges" in his keynote at the 2008 European Conference on Information Retrieval [1] all pertained to the interactive nature of information seeking he claims the Cranfield approach neglects. Belkin cited similar calls to action going back as far as Karen Spärck Jones, in her 1988 acceptance speech for the Gerald Salton award [2], and again from Tefko Saracevic, when he received the same award in 1997 [3]. More recently, we have the Information Seeking and Retrieval research program proposed by Peter Ingwersen and Kalervo Järvelin in The Turn, published in 2005 [4].

2. IMPASSE BETWEEN IR AND IS

Given the advocacy of Belkin and others, why hasn't there been more progress? As Ellen Voorhees noted in defense of Cranfield at the 2006 Workshop on Adaptive Information Retrieval, "changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in cost of retrieval experiments" [5]. Despite user studies that have sought to challenge the Cranfield emphasis on batch information retrieval measures like mean average precision—such as those of Andrew Turpin and Bill Hersh [6]—the information retrieval community, on the whole, remains unconvinced by these experiments because they are smaller in scale and less repeatable than the TREC evaluations.

As Tefko Saracevic has said, there is a "battle royale" between the information retrieval community, which favors the Cranfield paradigm of batch evaluation despite its neglect of the user, and the information science community, which favors user studies despite their scale and repeatability challenges [7]. How do we move forward?

3. PRIMARY CONCERNS OF IR AND IS

Both sides have compelling arguments. If an evaluation procedure is not repeatable and cost-effective, it has little practical value. Nonetheless, it is essential that an evaluation procedure measure the interactive nature of information seeking.

If we are to find common ground to resolve this dispute, we need to satisfy the primary concerns of both sides:

· Real information seeking tasks are interactive, so the results of the evaluation procedure must be meaningful in an interactive context.

· The evaluation procedure must be repeatable and cost-effective.

In order to move beyond the battle royale and resolve the impasse between the IR and IS communities, we need to address both of these concerns.

4. PROPOSED APPROACH


A key point of contention in the battle royale is whether we should evaluate systems by studying individual users or measuring system performance against test collections.

The short answer is that we need to do both. In order to ground the results of evaluation in realistic contexts, we need to conduct user studies that relate proposed measures to success in interactive information seeking tasks. Otherwise, we optimize under the artificial constraint that a task involves only a single user query.

Such an approach presumes that we have a characterization of information seeking tasks. This characterization is an open problem that is beyond the scope of this position paper but has been addressed by other information seeking researchers, including Ingwersen and Järvelin [4]. We presume access to a set of tasks that, if not exhaustive, at least applies to a valuable subset of real information seeking problems.

Consider, as a concrete example, the task of a researcher who, given a comprehensive digital library of technical publications, wants to determine with confidence whether his or her idea is novel. In other words, the researcher wants either to discover prior art that anticipates the idea, or to state with confidence that there is no such art. Patent inventors and lawyers performing e-discovery perform analogous tasks. We can measure task performance objectively as a combination of accuracy and efficiency, and we can also consider subjective measures like user confidence and satisfaction. Let us assume that we are able to quantify a task success measure that incorporates these factors.
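One way such a combined success measure might be quantified is sketched below. The weights, the 30-minute reference duration, and the functional form are purely illustrative assumptions, not a claim about how the measure should actually be defined:

```python
def task_success(accuracy: float, minutes: float, confidence: float,
                 w_acc: float = 0.6, w_eff: float = 0.2, w_conf: float = 0.2) -> float:
    """Combine objective accuracy and efficiency with subjective confidence.

    accuracy and confidence are normalized to [0, 1]; efficiency decays
    with time spent (30 minutes as an arbitrary reference duration).
    """
    efficiency = 1.0 / (1.0 + minutes / 30.0)
    return w_acc * accuracy + w_eff * efficiency + w_conf * confidence
```

The point is only that the objective and subjective factors can be collapsed into a single number that a study can then correlate against system measures.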

Given this task and success measure, we would like to know how well an information retrieval system supports the user performing it. As the information scientists correctly argue, user studies are indispensable. But, as we employ user studies to determine which systems are most helpful to users, we need to go a step further and correlate user success to one or more system measures. We can then evaluate these system measures in a repeatable, cost-effective process that does not require user involvement.

For example, let us hypothesize that mean average precision (MAP) on a given TREC collection is such a measure. We hypothesize that users pursuing the prior art search task are more successful using a system with higher MAP than those using a system with lower MAP. In order to test this hypothesis, we can present users with a family of systems that, insofar as possible, vary only in MAP, and see how well user success correlates to the system’s MAP. If the correlation is strong, then we validate the utility of MAP as a system measure and invest in evaluating systems using MAP against the specified collection in order to predict their utility for the prior art task.
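The validation step can be sketched as a simple correlation test. The numbers below are invented for illustration; in practice the MAP values would come from TREC-style batch runs and the success scores from the user study:

```python
import math
import statistics

# Hypothetical measurements for four systems (numbers invented for illustration):
system_map   = [0.18, 0.25, 0.31, 0.42]  # batch MAP on the test collection
user_success = [0.40, 0.52, 0.55, 0.71]  # task success from the user study

def pearson(xs, ys):
    """Pearson correlation between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A correlation near 1 would support investing in MAP-based batch
# evaluation as a proxy for user success on this task.
r = pearson(system_map, user_success)
```

Only if the correlation holds up do we earn the right to substitute the cheap, repeatable batch measure for the expensive user study.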

The principle here is a general one: it can be used not only to compare different algorithms, but also to evaluate more sophisticated interfaces, such as document clustering [8] or faceted search [9]. The only requirement is that we hypothesize and validate system measures that correlate to user success.

5. WEAKNESSES OF APPROACH

Our proposed approach has two major weaknesses.

The first weakness is that, in a realistic interactive information retrieval context, distinct queries are not independent. Rather, a typical user executes a sequence of queries in pursuit of an information need, each query informed by the results of the previous ones.

In a batch test, we must decide the query sequence in advance, and cannot model how the user’s queries depend on system response. Hence, we are limited to computing measures that can be evaluated for each query independently. Nonetheless, we can choose measures which correlate to effectiveness in realistic settings. Hopefully these measures are still meaningful, even when we remove the test queries from their realistic context.

The second challenge is that we do not envision a way to compare different interfaces in a batch setting. It seems that testing the relative merits of different interfaces requires real—or at least simulated—users.

If, however, we hold the interface constant, then we can define performance measures that apply to those interfaces. For example, we can develop standardized versions of well-studied interfaces, such as faceted search and clustering. We can then compare the performance of different systems that use these interfaces, e.g., different clustering algorithms.

6. AN ALTERNATIVE APPROACH

An alternative way to tackle the evaluation problem leverages the “human computation” approach championed by Luis Von Ahn [10]. This approach uses “games with a purpose” to motivate people to perform information-related tasks, such as image tagging and optical character recognition (OCR).

A particularly interesting "game" in our present context is Phetch, in which one or more "Seekers" compete to find an image based on a text description provided by a "Describer" [11]. The Describer’s goal is to help the Seekers succeed, while the Seekers compete with one another to find the target image within a fixed time limit, using a search engine that has indexed the images based on tagging results from the ESP Game. In order to discourage a shotgun approach, the game penalizes Seekers for wrong guesses.

This game goes quite far in capturing the essence of interactive information retrieval. If we put aside the competition among the Seekers, then we see that an individual Seeker, aided by the human Describer and the algorithmic (but human-indexed) search engine, is pursuing an information retrieval task. Moreover, the Seeker is incented to be both effective and efficient.

How can we leverage this framework for information retrieval evaluation? Even though the game envisions both Describers and Seekers to be human beings, there is no reason we cannot allow computers to play too--in either or both roles. Granted, the game, as currently designed, focuses on image retrieval without giving the human players direct access to the image tags, but we could imagine a framework that is more amenable to machine participation, e.g., providing a machine player with a set of tags derived from those in the index when that player is presented with an image. Alternatively, there may be a domain more suited than image retrieval to incorporating computer players.

The main appeal of the game framework is that it allows all participants to be judged based on an objective criterion that reflects the effectiveness and efficiency of the interactive information retrieval process. A good Describer should, on average, outscore a bad Describer over the long term; likewise, a good Seeker should outscore a bad one. We can even vary the search engine available to Seekers, in order to compare competing search engine algorithms or interfaces.
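A scoring rule in this spirit might look like the following sketch. The specific point values and time limit are my own invention; the actual Phetch scoring differs:

```python
def seeker_score(found: bool, wrong_guesses: int, seconds_used: float,
                 time_limit: float = 180.0) -> float:
    """Reward finding the target quickly; penalize shotgun guessing."""
    if not found:
        return 0.0
    base = 100.0
    # Effectiveness: the target was found. Efficiency: faster is better.
    speed_bonus = 50.0 * max(0.0, 1.0 - seconds_used / time_limit)
    # Wrong-guess penalty discourages the shotgun approach.
    penalty = 10.0 * wrong_guesses
    return max(0.0, base + speed_bonus - penalty)
```

Averaged over many rounds, such a score gives exactly the kind of objective criterion described above: it rewards both finding the target and finding it efficiently, and it can be compared across Describers, Seekers, or search engine variants.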

7. CONCLUSION

Our goal is ambitious: we aspire towards an evaluation framework that satisfies information scientists as relevant to real-world information seeking, but nonetheless offers the practicality of the Cranfield paradigm that dominates information retrieval. The near absence of collaboration between the information science and information retrieval communities has been a greatly missed opportunity not only for both researcher communities but also for the rest of the world who could benefit from practical advances in our understanding of information seeking. We hope that the approach we propose takes at least a small step towards resolving this battle royale.

8. REFERENCES

[1] Belkin, N. J., 2008. Some(What) Grand Challenges for Information Retrieval. ACM SIGIR Forum 42, 1 (June 2008), 47-54.

[2] Spärck Jones, K. 1988. A look back and a look forward. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 13-29.

[3] Saracevic, T. 1997. Users lost: reflections of the past, future and limits of information science. ACM SIGIR Forum 31, 2 (July 1997), 16-27.

[4] Ingwersen, P. and Järvelin, K. 2005. The turn. Integration of information seeking and retrieval in context. Springer.

[5] Voorhees, E. 2006. Building Test Collections for Adaptive Information Retrieval: What to Abstract for What cost? In First International Workshop on Adaptive Information Retrieval (AIR).

[6] Turpin, A. and Scholer, F. 2006. User performance versus precision measures for simple search tasks. In Proceedings
of the 29th Annual international ACM SIGIR Conference on Research and Development in information Retrieval
, 11-18.

[7] Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. Journal of the American Society for Information Science and Technology 58(3), 1915-1933.

[8] Cutting, D., Karger, D., Pedersen, J., and Tukey, J. 1992. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. In Proceedings of the 15th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval, 318-329.

[9] Workshop on Faceted Search. 2006. In Proceedings of the 29th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval.

[10] Von Ahn, L. 2006. Games with a Purpose. IEEE Computer 39, 6 (June 2006), 92-94.

[11] Von Ahn, L., Ginosar, S., Kedia, M., Liu, R., and Blum, M. 2006. Improving accessibility of the web with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 79-82.


Sunday, June 29, 2008

Back from ISSS Workshop

My apologies for the sparsity of posts lately; it's been a busy week!

I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.

One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
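One hypothetical way to combine the three proposed factors into a single score is sketched below. The multiplicative form, the time budget, and the linear efficiency decay are my assumptions for illustration, not part of the workshop proposal:

```python
# Illustrative metric for an existential search episode, combining
# correctness of the outcome, the seeker's confidence, and efficiency.
def existential_search_score(correct, confidence, seconds, budget=300.0):
    """Score one existential search episode in [0, 1].

    correct: whether the seeker's yes/no answer was right
    confidence: seeker's self-reported confidence in [0, 1]
    seconds: time spent; efficiency decays linearly to 0 at `budget`
    """
    efficiency = max(0.0, 1.0 - seconds / budget)
    return (1.0 if correct else 0.0) * confidence * efficiency

# A correct, confident answer found quickly scores high; a wrong
# answer scores 0 regardless of confidence or speed.
print(existential_search_score(True, 0.9, 60))   # ~0.72
print(existential_search_score(False, 0.9, 60))  # 0.0
```

A multiplicative combination encodes the view that a confident wrong answer is worthless; a summative weighting would instead give partial credit, which may or may not be desirable.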
I'll let folks know as more information is released from the workshop.

Tuesday, June 24, 2008

What is (not) Exploratory Search?

One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.

In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).

But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.

Should we conclude then that exploratory search is, in fact, a fringe use case?

According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.

Let me offer the following characterization of non-exploratory search:
  • You know exactly what you want.
  • You know exactly how to ask for it.
  • You expect a search query to yield one of two responses:
    - Success: you are presented with the object of your search.
    - Failure: you learn that the object of your search is unavailable.
If any of these assumptions fails to hold, then the search problem is, to some extent, exploratory.
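The characterization above amounts to a simple predicate (the parameter names are mine, purely for illustration):

```python
# A search task is non-exploratory only if all three assumptions hold;
# if any one fails, the task is, to some extent, exploratory.
def is_exploratory(knows_what, knows_how_to_ask, expects_binary_outcome):
    """Return True if the search task is exploratory under the
    three-assumption characterization of non-exploratory search."""
    return not (knows_what and knows_how_to_ask and expects_binary_outcome)

# A navigational query ("take me to example.com") is non-exploratory:
print(is_exploratory(True, True, True))    # False
# Researching an open-ended topic fails the first assumption:
print(is_exploratory(False, True, True))   # True
```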

There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.

Tuesday, June 17, 2008

Information Retrieval Systems, 1896 - 1966

My colleague and Endeca co-founder Pete Bell just pointed me to a great post by Kevin Kelly about what may be the earliest implementation of a faceted navigation system. Like every good Endecan, I'm familiar with Ranganathan's struggle to sell the library world on colon classification. But it is still striking to see this struggle played out through technology artifacts from a pre-Internet world.

Tuesday, June 10, 2008

Seeking Opinions about Information Seeking

In a couple of weeks, I'll be participating in an invitational workshop sponsored by the National Science Foundation on Information Seeking Support Systems at the University of North Carolina - Chapel Hill. The participants are an impressive bunch--I feel like I'm the only person attending whom I've never heard of!

So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.

I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.

Thursday, June 5, 2008

HCIR '08

It's my pleasure to announce...

HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008

About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.

In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.

This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.

Keynote speaker: Susan Dumais, Microsoft Research

Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.

Possible topics include, but are not limited to:
  • Novel interaction techniques for information retrieval.
  • Modeling and evaluation of interactive information retrieval.
  • Exploratory search and information discovery.
  • Information visualization and visual analytics.
  • Applications of HCI techniques to information retrieval needs in specific domains.
  • Ethnography and user studies relevant to information retrieval and access.
  • Scale and efficiency considerations for interactive information retrieval systems.
  • Relevance feedback and active learning approaches for information retrieval.

Important Dates
  • Aug 22 - Papers/abstracts due
  • Sep 12 - Decisions to authors
  • Oct 3 - Final copy due for printing
  • Oct 23 - Workshop date
Contributions will be peer-reviewed by two members of the program committee. For information on paper submission, see http://research.microsoft.com/~ryenw/hcir2008/submit.html or contact cua-hcir2008@cua.edu.



Monday, May 5, 2008

Saracevic on Relevance and Interaction

There is no Nobel Prize in computer science, despite computer science having done more than any other discipline in the past fifty years to change the world. Instead, there is the Turing Award, which serves as a Nobel Prize of computing.

But the Turing Award has never been given to anyone in information retrieval. Instead, there is the Gerald Salton Award, which serves as a Turing Award of information retrieval. Its recipients represent an A-list of information retrieval researchers.

Last week, I had the opportunity to talk with Salton Award recipient Tefko Saracevic. If you are not familiar with Saracevic, I suggest you take an hour to watch his 2007 lecture on "Relevance in information science".

I won't try to capture an hour of conversation in a blog post, but here are a few highlights:
  • We learn from philosophers, particularly Alfred Schütz, that we cannot reduce relevance to a single concept, but rather have to consider a system of interdependent relevancies, such as topical relevance, interpretational relevance, and motivational relevance.

  • When we talk about relevance measures, such as precision and recall, we evaluate results from the perspective of a user. But information retrieval approaches necessarily take a systems perspective, making assumptions about what people will want and encoding those assumptions in models and algorithms.

  • A major challenge in information retrieval is that users--particularly web search users--often formulate ineffective queries, typically because the queries are too short. Studies have shown that reference interviews can lead to improved retrieval effectiveness (usually through longer, more informative queries). He said that automated systems could help too, but he wasn't aware of any that had achieved traction.

  • A variety of factors affect interactive information retrieval, including task context, intent, and expertise. Moreover, people react to certain relevance clues more than others, and more within some populations than others.
As I expected, I walked away with more questions than answers. But I did walk away reassured that my colleagues and I at Endeca, along with others in the HCIR community, are attacking the right problem: helping users formulate better queries.

I'd like to close with an anecdote that Saracevic recounts in his 2007 lecture. Bruce Croft had just delivered an information retrieval talk, and Nick Belkin raised the objection that users need to be incorporated into the study. Croft's conversation-ending response: "Tell us what to do, and we will do it."

We're halfway there. We've built interactive information retrieval systems, and we see from deployment after deployment that they work. Not that there isn't plenty of room for improvement, but the unmet challenge, as Ellen Voorhees makes clear, is evaluation. We need to address Nick Belkin's grand challenge and establish a paradigm suitable for evaluation of interactive IR systems.

Thursday, April 17, 2008

Ellen Voorhees defends Cranfield

I was extremely flattered to receive an email from Ellen Voorhees responding to my post about Nick Belkin's keynote. Then I was a little bit scared, since she is a strong advocate of the Cranfield tradition, and I braced myself for her rebuttal.

She pointed me to a talk she gave at the First International Workshop on Adaptive Information Retrieval (AIR) in 2006. I'd paraphrase her argument as follows: Nick and others (including me) are right to push for a paradigm that supports AIR research, but are being naïve regarding what is necessary for such research to deliver effective--and cost-effective--results. It's a strong case, and I'd be the first to concede that the advocates for AIR research have not (at least to my knowledge) produced a plausible abstract task that is amenable to efficient evaluation.

To quote Nick again, it's a grand challenge. And Ellen makes it clear that what we've learned so far is not encouraging.

Sunday, April 6, 2008

Nick Belkin at ECIR '08

Last week, I had the pleasure of attending the 30th European Conference on Information Retrieval, chaired by Iadh Ounis at the University of Glasgow. The conference was outstanding in several respects, not least of which was a keynote address by Nick Belkin, one of the world's leading researchers on interactive information retrieval.

Nick's keynote, entitled "Some(what) Grand Challenges for Information Retrieval," was a full frontal attack on the Cranfield evaluation paradigm that has dominated IR research for the past half century. I am hoping to see his keynote published and posted online, but in the meantime here is a choice excerpt:
in accepting the [Gerald Salton] award at the 1997 SIGIR meeting, Tefko Saracevic stressed the significance of integrating research in information seeking behavior with research in IR system models and algorithms, saying: "if we consider that unlike art IR is not there for its own sake, that is, IR systems are researched and built to be used, then IR is far, far more than a branch of computer science, concerned primarily with issues of algorithms, computers, and computing."

...

Nevertheless, we can still see the dominance of the TREC (i.e. Cranfield) evaluation paradigm in most IR research, the inability of this paradigm to accommodate study of people in interaction with information systems (cf. the death of the TREC Interactive Track), and a dearth of research which integrates study of users’ goals, tasks and behaviors with research on models and methods which respond to results of such studies and supports those goals, tasks and behaviors.

This situation is especially striking for several reasons. First, it is clearly the case that IR as practiced is inherently interactive; secondly, it is clearly the case that the new models and associated representation and ranking techniques lead to only incremental (if that) improvement in performance over previous models and techniques, which is generally not statistically significant; and thirdly, that such improvement, as determined in TREC-style evaluation, rarely, if ever, leads to improved performance by human searchers in interactive IR systems.
Nick has long been critical of the IR community's neglect of users and interaction. But this keynote was significant for two reasons. First, the ECIR program committee's decision to invite a keynote speaker from the information science community acknowledges the need for collaboration between these two communities. Second, Nick reciprocated this overture by calling for interdisciplinary efforts to bridge the gap between the formal study of information retrieval and the practical understanding of information behavior. As an avid proponent of HCIR, I am heartily encouraged by steps like these.