Wednesday, July 2, 2008
A Call to Action
Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connecting with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent at these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
The rigor and independence of these conferences and workshops make them ideal vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
Tuesday, July 1, 2008
Clarification before Refinement on Amazon
While I find this interface less than ideal (e.g., even if all of your search results are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users clarify before they refine. The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.
Sunday, June 29, 2008
Back from ISSS Workshop
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
I'll let folks know as more information is released from the workshop.
Tuesday, June 24, 2008
What is (not) Exploratory Search?
In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
  - Success: you are presented with the object of your search.
  - Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report places the blame on clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.
As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set the expectation that enterprise workers must help shape the solution by supplying their proprietary knowledge and information needs, and to make that process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Tuesday, June 17, 2008
Information Retrieval Systems, 1896 - 1966
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
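To make the interaction protocol concrete, here is a minimal sketch, in Python, of one round-based simulation of the game. Everything in it is my own invention for illustration--the class names, the naive pagination-only Assistant, the page size, and the time limit are assumptions, not a spec for an actual implementation.

```python
import random
import time

PAGE_SIZE = 10            # assumed size of the fixed page the Assistant may show
TIME_LIMIT_SECONDS = 120  # assumed time limit for one game


class ShoppingAssistant:
    """Knows the full inventory, but not the Shopper's list (naive: only paginates)."""

    def __init__(self, inventory):
        self.inventory = list(inventory)
        self.offset = 0

    def render_page(self):
        # Present a fixed-size page of selectable items plus a "show more" link.
        items = self.inventory[self.offset:self.offset + PAGE_SIZE]
        return {"items": items, "links": ["show_more"]}

    def handle(self, link):
        if link == "show_more":
            self.offset = (self.offset + PAGE_SIZE) % max(len(self.inventory), 1)


class Shopper:
    """Knows the shopping list, but can only click, never type."""

    def __init__(self, shopping_list):
        self.wanted = set(shopping_list)
        self.found = set()

    def act(self, page):
        remaining = (self.wanted - self.found) & set(page["items"])
        if remaining:
            item = remaining.pop()
            self.found.add(item)
            return ("select", item)
        return ("link", "show_more")


def play(inventory, shopping_list):
    assistant, shopper = ShoppingAssistant(inventory), Shopper(shopping_list)
    findable = shopper.wanted & set(inventory)
    deadline, rounds = time.time() + TIME_LIMIT_SECONDS, 0
    while time.time() < deadline and shopper.found != findable and rounds < 10_000:
        kind, value = shopper.act(assistant.render_page())
        if kind == "link":
            assistant.handle(value)
        rounds += 1
    return len(shopper.found), rounds


if __name__ == "__main__":
    inventory = [f"item_{i}" for i in range(200)]
    shopping_list = random.sample(inventory, 5) + ["item_not_in_stock"]  # one unfindable item
    print("found %d items in %d rounds" % play(inventory, shopping_list))
```

The score here is just items found and rounds used; a real evaluation would presumably compare Assistants by how few rounds (or clicks) Shoppers need.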
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I think the most interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Thursday, June 12, 2008
Max Wilson's Blog
His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?
These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.
Wednesday, June 11, 2008
How Google Measures Search Quality
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
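As an aside, for readers who haven't run into Cranfield-style measures, here is a small self-contained sketch of mean average precision computed from manual relevance judgments. The data is made up for illustration, and I'm not suggesting this is what Google computes.

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one query: mean of precision@k over ranks of relevant hits."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(judged_queries):
    """judged_queries: list of (ranked_docs, relevant_set) pairs, one per rated query."""
    return sum(average_precision(r, rel) for r, rel in judged_queries) / len(judged_queries)


if __name__ == "__main__":
    judged = [
        (["d3", "d1", "d7", "d2"], {"d1", "d2"}),  # AP = (1/2 + 2/4) / 2 = 0.5
        (["d5", "d9", "d4"], {"d5"}),              # AP = 1.0
    ]
    print(mean_average_precision(judged))          # 0.75
```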
More questions for Amit. :)
Tuesday, June 10, 2008
Seeking Opinions about Information Seeking
So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.
I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.
Sunday, June 8, 2008
Exploratory search is relevant too!
Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.
Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.
Thursday, June 5, 2008
HCIR '08
HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008
About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.
In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.
This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.
Keynote speaker: Susan Dumais, Microsoft Research
Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.
Possible topics include, but are not limited to:
- Novel interaction techniques for information retrieval.
- Modeling and evaluation of interactive information retrieval.
- Exploratory search and information discovery.
- Information visualization and visual analytics.
- Applications of HCI techniques to information retrieval needs in specific domains.
- Ethnography and user studies relevant to information retrieval and access.
- Scale and efficiency considerations for interactive information retrieval systems.
- Relevance feedback and active learning approaches for information retrieval.
Important Dates
- Aug 22 - Papers/abstracts due
- Sep 12 - Decisions to authors
- Oct 3 - Final copy due for printing
- Oct 23 - Workshop date
Workshop Organization
Workshop chairs:
- Daniel Tunkelang, Endeca
- Ryen White, Microsoft Research
- Bill Kules, Catholic University of America
Program committee:
- James Allan, University of Massachusetts, USA
- Peter Anick, Yahoo!, USA
- Peter Bailey, Live Search, USA
- Peter Brusilovsky, University of Pittsburgh, USA
- Pia Borlund, Royal School of Library and Information Science, Denmark
- Robert Capra, University of North Carolina at Chapel Hill, USA
- Ed Chi, Palo Alto Research Center (PARC), USA
- Ed Cutrell, Microsoft Research, USA
- Ed Fox, Virginia Tech, USA
- Gene Golovchinsky, FX Palo Alto Laboratory, USA
- Marti Hearst, University of California at Berkeley, USA
- Jim Jansen, Pennsylvania State University, USA
- Diane Kelly, University of North Carolina at Chapel Hill, USA
- Gary Marchionini, University of North Carolina at Chapel Hill, USA
- Merrie Morris, Microsoft Research, USA
- Jeremy Pickens, FX Palo Alto Laboratory, USA
- Yan Qu, University of Maryland at College Park, USA
- Amanda Spink, Queensland University of Technology, Australia
- Elaine Toms, Dalhousie University, Canada
- Martin Wattenberg, IBM Research, USA
- Ross Wilkinson, CSIRO, Australia
Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
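For anyone curious what leveraging verb hypernyms can look like in practice, here is a minimal sketch using NLTK's WordNet interface. This is just an illustration of the idea; I haven't seen the prototype's actual code, and the helper function here is my own.

```python
# Requires: pip install nltk, then a one-time nltk.download('wordnet').
from nltk.corpus import wordnet as wn


def verb_hypernyms(verb, depth=2):
    """Collect hypernyms of a verb's first WordNet sense, up to the given depth."""
    senses = wn.synsets(verb, pos=wn.VERB)
    if not senses:
        return []
    result, frontier = [], [senses[0]]
    for _ in range(depth):
        frontier = [h for s in frontier for h in s.hypernyms()]
        result.extend(frontier)
    return [s.name() for s in result]


if __name__ == "__main__":
    # "acquire" generalizes to "get"; grouping extracted subject-verb-object facts
    # under such broader verbs is the kind of navigation the prototype enables.
    print(verb_hypernyms("acquire"))
```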
Click on the frame below to see the presentation they delivered at CHI '08.

Idea Navigation: Structured Browsing for Unstructured Text
Monday, June 2, 2008
Clarification vs. Refinement
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
Sunday, June 1, 2008
Your Input Really is Relevant!
Check out:
- The entry before I edited it.
- The entry after I edited it.
- The current entry, revised by Jon, Bob, and Fernando.
Friday, May 30, 2008
Is Search Broken?
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
Wednesday, May 28, 2008
Another HCIR Game
I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. Details are available at the iCLEF 2008 site or in this paper.
I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of Phetch.
Tuesday, May 27, 2008
The Magic Shelf
Now back to our commercial-free programming...
Monday, May 26, 2008
Your Input is Relevant!
As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!
And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!
As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and particularly in Phetch, a game which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web. One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, keeping our assumption that there is only one Seeker, and adding the assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Read the description provided by the Describer and use it to compose a search.
- Scan the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
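Here is a minimal sketch of that single-Seeker loop with a static description, a made-up black-box search function, and a crude "reformulate after scanning k results" rule. None of this is Phetch's actual implementation; the stopping rule and query-broadening strategy are assumptions purely for illustration.

```python
def play_round(description, search, is_target, patience=5, max_queries=10):
    """Simulated Seeker: query, scan, and reformulate when scanning stops paying off.

    description: static text supplied by the Describer
    search(query) -> ranked list of candidate image ids (black box)
    is_target(image_id) -> True iff a guess would be correct (guesses cost points)
    """
    terms = description.lower().split()
    for attempt in range(max_queries):
        # Naive reformulation: start with the full description, then drop
        # trailing terms to broaden the query on each successive attempt.
        query = " ".join(terms[:len(terms) - attempt] or terms[:1])
        for rank, image_id in enumerate(search(query), start=1):
            if rank > patience:
                break  # better to reformulate than to keep scanning
            if is_target(image_id):
                return {"found": image_id, "queries": attempt + 1, "rank": rank}
    return {"found": None, "queries": max_queries}


if __name__ == "__main__":
    index = {"img1": "michael jackson sailor hat", "img2": "sailor boat"}

    def toy_search(query):
        return [i for i, text in index.items() if any(t in text for t in query.split())]

    print(play_round("Michael Jackson wearing a sailor hat",
                     toy_search, lambda i: i == "img1"))
```

An evaluation built on this setup might compare systems or interfaces by how many queries and how much scanning Seekers need to find the target.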
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.
Wednesday, July 2, 2008
A Call to Action
Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connection with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent at these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:The rigor and independence of the conferences and workshops makes them ideal as vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
Tuesday, July 1, 2008
Clarification before Refinement on Amazon
While I find this interface less than ideal (e.g. even if all of your search are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users clarify before they refine. The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.
Sunday, June 29, 2008
Back from ISSS Workshop
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.I'll let folks know as more information is released from the workshop.
Tuesday, June 24, 2008
What is (not) Exploratory Search?
In the information retrieval literature. exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set expectations around how it is incumbent on enterprise workers to help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Tuesday, June 17, 2008
Information Retrieval Systems, 1896 - 1966
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with an shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I'd think the interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Thursday, June 12, 2008
Max Wilson's Blog
His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?
These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.
Wednesday, June 11, 2008
How Google Measures Search Quality
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google is seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
More questions for Amit. :)
Tuesday, June 10, 2008
Seeking Opinions about Information Seeking
So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.
I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.
Sunday, June 8, 2008
Exploratory search is relevant too!
Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.
Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.
Thursday, June 5, 2008
HCIR '08
HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008
About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.
In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.
This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.
Keynote speaker: Susan Dumais, Microsoft Research
Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.
Possible topics include, but are not limited to:
- Novel interaction techniques for information retrieval.
- Modeling and evaluation of interactive information retrieval.
- Exploratory search and information discovery.
- Information visualization and visual analytics.
- Applications of HCI techniques to information retrieval needs in specific domains.
- Ethnography and user studies relevant to information retrieval and access.
- Scale and efficiency considerations for interactive information retrieval systems.
- Relevance feedback and active learning approaches for information retrieval.
Important Dates
- Aug 22 - Papers/abstracts due
- Sep 12 - Decisions to authors
- Oct 3 - Final copy due for printing
- Oct 23 - Workshop date
Workshop Organization
Workshop chairs:
- Daniel Tunkelang, Endeca
- Ryen White, Microsoft Research
- Bill Kules, Catholic University of America
- James Allan, University of Massachusetts, USA
- Peter Anick, Yahoo!, USA
- Peter Bailey, Live Search, USA
- Peter Brusilovsky, University of Pittsburgh, USA
- Pia Borlund, Royal School of Library and Information Science, Denmark
- Robert Capra, University of North Carolina at Chapel Hill, USA
- Ed Chi, Palo Alto Research Center (PARC), USA
- Ed Cutrell, Microsoft Research, USA
- Ed Fox, Virginia Tech, USA
- Gene Golovchinsky, FX Palo Alto Laboratory, USA
- Marti Hearst, University of California at Berkeley, USA
- Jim Jansen, Pennsylvania State University, USA
- Diane Kelly, University of North Carolina at Chapel Hill, USA
- Gary Marchionini, University of North Carolina at Chapel Hill, USA
- Merrie Morris, Microsoft Research, USA
- Jeremy Pickens, FX Palo Alto Laboratory, USA
- Yan Qu, University of Maryland at College Park, USA
- Amanda Spink, Queensland University of Technology, Australia
- Elaine Toms, Dalhousie University, Canada
- Martin Wattenberg, IBM Research, USA
- Ross Wilkinson, CSIRO, Australia
Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
Click on the frame below to see the presentation they delivered at CHI '08.

Idea Navigation: Structured Browsing for Unstructured Text
Monday, June 2, 2008
Clarification vs. Refinement
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
Sunday, June 1, 2008
Your Input Really is Relevant!
Check out:
- The entry before I edited it.
- The entry after I edited it.
- The current entry, revised by Jon and Bob, and Fernando.
Friday, May 30, 2008
Is Search Broken?
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr, organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
Wednesday, May 28, 2008
Another HCIR Game
I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. Details available at the iCLEF 2008 site or in this paper.
I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of Phetch.
Tuesday, May 27, 2008
The Magic Shelf
Now back to our commercial-free programming...
Monday, May 26, 2008
Your Input is Relevant!
As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!
And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!
As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose,
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in a game called Phetch, a game which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web.One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Read the description provided by the Describer and uses it to compose a search.
- Scan the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.
Wednesday, July 2, 2008
A Call to Action
Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connection with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent at these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:The rigor and independence of the conferences and workshops makes them ideal as vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
Tuesday, July 1, 2008
Clarification before Refinement on Amazon
While I find this interface less than ideal (e.g. even if all of your search are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users clarify before they refine. The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.
Sunday, June 29, 2008
Back from ISSS Workshop
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.I'll let folks know as more information is released from the workshop.
Tuesday, June 24, 2008
What is (not) Exploratory Search?
In the information retrieval literature. exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set expectations around how it is incumbent on enterprise workers to help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Tuesday, June 17, 2008
Information Retrieval Systems, 1896 - 1966
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links, but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I think the most interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
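To make the mechanics concrete, here is a minimal sketch of the game loop in Python. Everything in it (the class names, the paging strategy, the round-based "time limit") is an illustrative assumption rather than a worked-out design; in particular, this assistant just pages through its inventory, whereas an algorithmic Shopping Assistant worth studying would adapt its pages to the Shopper's clicks.

```python
import random

# Sketch of the Shopper / Shopping Assistant game described above.
# Names and parameters here are hypothetical, not a specification.

class ShoppingAssistant:
    """Knows the full inventory; presents a fixed-size page of links each round."""
    def __init__(self, inventory, page_size=5):
        self.inventory = sorted(inventory)
        self.page_size = page_size
        self.offset = 0

    def present_page(self):
        page = self.inventory[self.offset:self.offset + self.page_size]
        self.offset += self.page_size
        return page  # a smarter assistant would reorder based on the Shopper's clicks


class Shopper:
    """Knows the shopping list; can only click on what the assistant shows."""
    def __init__(self, shopping_list):
        self.wanted = set(shopping_list)
        self.found = set()

    def click(self, page):
        hits = self.wanted & set(page)
        self.found |= hits
        return hits


def play(inventory, shopping_list, rounds=10):
    assistant = ShoppingAssistant(inventory)
    shopper = Shopper(shopping_list)
    for _ in range(rounds):          # the fixed "time limit"
        page = assistant.present_page()
        if not page:
            break
        shopper.click(page)
    return len(shopper.found)        # score: items found within the limit


if __name__ == "__main__":
    inventory = [f"item-{i}" for i in range(100)]
    shopping_list = random.sample(inventory, 10)
    print("Found", play(inventory, shopping_list), "of", len(shopping_list))
```

Scoring on items found per round would be one natural way to compare different assistant strategies against the same pool of Shoppers.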
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Thursday, June 12, 2008
Max Wilson's Blog
His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?
These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.
Wednesday, June 11, 2008
How Google Measures Search Quality
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
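For readers unfamiliar with the Cranfield-style measure in question, here is a minimal sketch of mean average precision in Python. The judged queries and relevance sets are made up for illustration; Google's actual rating scales and metrics are, as noted, not public.

```python
# Mean average precision (MAP) over a batch of judged queries.
# The example judgments below are invented purely for illustration.

def average_precision(ranked_results, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant result."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_results, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),   # AP = (1/1 + 2/3) / 2
        (["d5", "d6", "d7"],       {"d7"}),         # AP = 1/3
    ]
    print(round(mean_average_precision(runs), 3))   # 0.583
```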
More questions for Amit. :)
Tuesday, June 10, 2008
Seeking Opinions about Information Seeking
So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.
I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.
Sunday, June 8, 2008
Exploratory search is relevant too!
Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.
Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.
Thursday, June 5, 2008
HCIR '08
HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008
About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.
In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.
This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.
Keynote speaker: Susan Dumais, Microsoft Research
Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.
Possible topics include, but are not limited to:
- Novel interaction techniques for information retrieval.
- Modeling and evaluation of interactive information retrieval.
- Exploratory search and information discovery.
- Information visualization and visual analytics.
- Applications of HCI techniques to information retrieval needs in specific domains.
- Ethnography and user studies relevant to information retrieval and access.
- Scale and efficiency considerations for interactive information retrieval systems.
- Relevance feedback and active learning approaches for information retrieval.
Important Dates
- Aug 22 - Papers/abstracts due
- Sep 12 - Decisions to authors
- Oct 3 - Final copy due for printing
- Oct 23 - Workshop date
Workshop Organization
Workshop chairs:
- Daniel Tunkelang, Endeca
- Ryen White, Microsoft Research
- Bill Kules, Catholic University of America
Program committee:
- James Allan, University of Massachusetts, USA
- Peter Anick, Yahoo!, USA
- Peter Bailey, Live Search, USA
- Peter Brusilovsky, University of Pittsburgh, USA
- Pia Borlund, Royal School of Library and Information Science, Denmark
- Robert Capra, University of North Carolina at Chapel Hill, USA
- Ed Chi, Palo Alto Research Center (PARC), USA
- Ed Cutrell, Microsoft Research, USA
- Ed Fox, Virginia Tech, USA
- Gene Golovchinsky, FX Palo Alto Laboratory, USA
- Marti Hearst, University of California at Berkeley, USA
- Jim Jansen, Pennsylvania State University, USA
- Diane Kelly, University of North Carolina at Chapel Hill, USA
- Gary Marchionini, University of North Carolina at Chapel Hill, USA
- Merrie Morris, Microsoft Research, USA
- Jeremy Pickens, FX Palo Alto Laboratory, USA
- Yan Qu, University of Maryland at College Park, USA
- Amanda Spink, Queensland University of Technology, Australia
- Elaine Toms, Dalhousie University, Canada
- Martin Wattenberg, IBM Research, USA
- Ross Wilkinson, CSIRO, Australia
Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
Click on the frame below to see the presentation they delivered at CHI '08: Idea Navigation: Structured Browsing for Unstructured Text.
Monday, June 2, 2008
Clarification vs. Refinement
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
Sunday, June 1, 2008
Your Input Really is Relevant!
Check out:
- The entry before I edited it.
- The entry after I edited it.
- The current entry, revised by Jon, Bob, and Fernando.
Friday, May 30, 2008
Is Search Broken?
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
Wednesday, May 28, 2008
Another HCIR Game
I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. Details available at the iCLEF 2008 site or in this paper.
I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of Phetch.
Tuesday, May 27, 2008
The Magic Shelf
Now back to our commercial-free programming...
Monday, May 26, 2008
Your Input is Relevant!
As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!
And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!
As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in a game called Phetch, which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web. One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Reads the description provided by the Describer and uses it to compose a search.
- Scans the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
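Here is a minimal sketch of that simplified loop in Python. The toy index, the word-overlap "search engine", the patience threshold, and the crude reformulation rule are all illustrative assumptions; they stand in for the real Phetch setup only to show where the scan-versus-reformulate decision lives.

```python
# Sketch of the single-Seeker, static-description simplification described above.
# The index, ranking function, and stopping rules are hypothetical.

def search(query, index):
    """Toy engine: rank images by word overlap with the query."""
    terms = set(query.split())
    scored = ((len(terms & set(tags)), img) for img, tags in index.items())
    return [img for score, img in sorted(scored, reverse=True) if score > 0]

def seek(description, index, target, patience=3, max_reformulations=3):
    """Scan up to `patience` results per query; otherwise reformulate and try again."""
    words = description.split()
    for attempt in range(max_reformulations):
        query = " ".join(words[attempt:attempt + 4])   # crude reformulation rule
        for rank, img in enumerate(search(query, index)[:patience], start=1):
            if img == target:                          # "guess" by clicking the image
                return attempt, rank
        # Scanning stalled: better to reformulate than to keep reading results.
    return None

index = {
    "img1": ["michael", "jackson", "sailor", "hat"],
    "img2": ["sailor", "boat"],
    "img3": ["hat", "store"],
}
print(seek("michael jackson wearing a sailor hat", index, target="img1"))
```

An evaluation harness along these lines could swap in different Seeker policies, or human Seekers, and compare how quickly each reaches the target.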
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.