Sunday, June 29, 2008

Back from ISSS Workshop

My apologies for the sparsity of posts lately; it's been a busy week!

I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.

One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
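To make that concrete, here's a minimal Python sketch of one way correctness, user confidence, and efficiency might be folded into a single score for an existential search task. The weights and the time normalization are purely illustrative assumptions on my part, not anything settled at the workshop.

# Hypothetical scoring sketch for existential search tasks.
# The weights and the efficiency normalization are illustrative assumptions,
# not part of any agreed-upon evaluation procedure.

def existential_task_score(correct, confidence, elapsed_seconds,
                           time_budget=300.0,
                           w_correct=0.5, w_confidence=0.3, w_efficiency=0.2):
    """Combine outcome correctness, user confidence, and efficiency.

    correct         -- True if the user's yes/no answer ("does it exist?") was right
    confidence      -- the user's self-reported confidence in [0, 1]
    elapsed_seconds -- time the user spent on the task
    time_budget     -- nominal time limit used to normalize efficiency
    """
    efficiency = max(0.0, 1.0 - elapsed_seconds / time_budget)
    return (w_correct * (1.0 if correct else 0.0)
            + w_confidence * confidence
            + w_efficiency * efficiency)

# Example: a correct answer, given with 80% confidence, after two minutes.
print(existential_task_score(correct=True, confidence=0.8, elapsed_seconds=120))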
I'll let folks know as more information is released from the workshop.

Tuesday, June 24, 2008

What is (not) Exploratory Search?

One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.

In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).

But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.

Should we conclude then that exploratory search is, in fact, a fringe use case?

According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.

Let me offer the following characterization of non-exploratory search:
  • You know exactly what you want.
  • You know exactly how to ask for it.
  • You expect a search query to yield one of two responses:
    - Success: you are presented with the object of your search.
    - Failure: you learn that the object of your search is unavailable.
If any of these assumptions fails to hold, then the search problem is, to some extent, exploratory.

There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.

Friday, June 20, 2008

Enterprise Search Done Right

A recent study from AIIM (the Association for Information and Image Management, also known as the Enterprise Content Management Association) reports that enterprise search frustrates and disappoints users. Specifically, 49% of survey respondents “agreed” or “strongly agreed” that it is a difficult and time-consuming process to find the information they need to do their job.

Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even though the report places the blame on clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.
As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set the expectation that enterprise workers will have to help shape the solution by supplying their proprietary knowledge and information needs, and to make that process as painless as possible.

Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.

Tuesday, June 17, 2008

Information Retrieval Systems, 1896 - 1966

My colleague and Endeca co-founder Pete Bell just pointed me to a great post by Kevin Kelly about what may be the earliest implementation of a faceted navigation system. Like every good Endecan, I'm familiar with Ranganathan's struggle to sell the library world on colon classification. But it is still striking to see this struggle played out through technology artifacts from a pre-Internet world.

Monday, June 16, 2008

A Game to Evaluate Browsing Interfaces?

I've mused a fair amount about how to apply the concept of the Phetch human computation game to the evaluation of browsing-based information retrieval interfaces. I'd love to be able to better evaluate faceted navigation and clustering approaches, relative to conventional search as well as relative to one another.

Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.

As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.

As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.

Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I'd think the interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
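To make the mechanics concrete, here's a toy sketch of the game loop in Python. The page size, the deliberately naive Assistant strategy, and the use of a round limit instead of a wall-clock time limit are all simplifying assumptions of mine, not part of the proposal above.

# Toy sketch of the Shopper / Shopping Assistant game described above.
# The inventory, the page size, and the scoring are illustrative assumptions.

import random

PAGE_SIZE = 8   # assumed fixed page size per round

class Assistant:
    """Knows the full inventory, but not the Shopper's list."""
    def __init__(self, inventory):
        self.inventory = list(inventory)

    def present(self, clicked_so_far):
        # Naive strategy: page through items the Shopper has not yet clicked.
        unseen = [item for item in self.inventory if item not in clicked_so_far]
        return unseen[:PAGE_SIZE]

class Shopper:
    """Knows the shopping list, but sees only the links on the current page."""
    def __init__(self, shopping_list):
        self.wanted = set(shopping_list)
        self.found = set()

    def choose(self, page):
        hits = [item for item in page if item in self.wanted]
        if hits:
            self.found.add(hits[0])
            return hits[0]
        return random.choice(page)

def play(inventory, shopping_list, max_rounds=50):
    assistant, shopper, clicked = Assistant(inventory), Shopper(shopping_list), []
    for _ in range(max_rounds):
        page = assistant.present(clicked)
        if not page:
            break
        clicked.append(shopper.choose(page))
    return len(shopper.found) / len(shopping_list)   # score: fraction of the list found

inventory = [f"item-{n}" for n in range(200)]
print(play(inventory, random.sample(inventory, 10)))

A smarter Assistant--say, one that reorganizes the remaining inventory based on what the Shopper has clicked so far--should earn a higher score than this naive baseline, and comparing such strategies is exactly the kind of evaluation the game is meant to support.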

Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.

Thursday, June 12, 2008

Max Wilson's Blog

Max Wilson, a colleague of mine at the University of Southampton who has contributed frequently to the conversation here at the Noisy Channel, just started a blog of his own. Check out Max's blog here.

His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?

These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.

Wednesday, June 11, 2008

How Google Measures Search Quality

Thanks to Jon Elsas for calling my attention to a great post at Datawocky today on how Google measures search quality, written by Anand Rajaraman based on his conversation with Google Director of Research Peter Norvig.

The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.

I'm intrigued that Google seems to embrace the Cranfield paradigm so wholeheartedly. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
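For readers who haven't run into the measure, here's a small Python sketch of mean average precision computed from binary rater judgments. The judgments are invented, and nothing here is meant to describe what Google actually optimizes.

# Mean average precision (MAP) over binary relevance judgments.
# The judgments below are made up; this just illustrates the Cranfield-style
# computation, not anything Google is known to use.

def average_precision(ranked_judgments):
    """ranked_judgments: 0/1 rater judgments in the order the engine ranked the results.

    For simplicity this divides by the number of relevant results retrieved,
    rather than by the total number of relevant documents in the collection.
    """
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_judgments, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(judged_queries):
    return sum(average_precision(q) for q in judged_queries) / len(judged_queries)

# Two made-up queries, each judged by raters on a ranker's top five results.
print(mean_average_precision([[1, 0, 1, 0, 0],
                              [0, 1, 1, 1, 0]]))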

More questions for Amit. :)

Tuesday, June 10, 2008

Seeking Opinions about Information Seeking

In a couple of weeks, I'll be participating in an invitational workshop sponsored by the National Science Foundation on Information Seeking Support Systems at the University of North Carolina - Chapel Hill. The participants are an impressive bunch--I feel like I'm the only person attending whom I've never heard of!

So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.

I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.

Sunday, June 8, 2008

Exploratory search is relevant too!

After seeing what The Noisy Channel readership has done to improve the HCIR and Relevance Wikipedia entries, I was thinking we might take on one or two more. Specifically, the Exploratory Search and Exploratory Search Systems entries are, quite frankly, in sad shape.

Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.

Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.

Thursday, June 5, 2008

HCIR '08

It's my pleasure to announce...

HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008

About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.

In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.

This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.

Keynote speaker: Susan Dumais, Microsoft Research

Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.

Possible topics include, but are not limited to:
  • Novel interaction techniques for information retrieval.
  • Modeling and evaluation of interactive information retrieval.
  • Exploratory search and information discovery.
  • Information visualization and visual analytics.
  • Applications of HCI techniques to information retrieval needs in specific domains.
  • Ethnography and user studies relevant to information retrieval and access.
  • Scale and efficiency considerations for interactive information retrieval systems.
  • Relevance feedback and active learning approaches for information retrieval.

Important Dates
  • Aug 22 - Papers/abstracts due
  • Sep 12 - Decisions to authors
  • Oct 3 - Final copy due for printing
  • Oct 23 - Workshop date
Contributions will be peer-reviewed by two members of the program committee. For information on paper submission, see http://research.microsoft.com/~ryenw/hcir2008/submit.html or contact cua-hcir2008@cua.edu.


Workshop Organization

Workshop chairs:
Program chair:
Program Committee:
Supporters

Wednesday, June 4, 2008

Idea Navigation

Last summer, my colleague Vladimir Zelevinsky worked with two interns, Robin Stewart (MIT) and Greg Scott (Tufts), on a novel approach to information exploration. They call it "idea navigation": the basic idea is to extract subject-verb-object triples from unstructured text, group them into hierarchies, and then expose them in a faceted search and browsing interface. I like to think of it as an exploratory search take on question answering.

We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
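For the curious, here's a rough sketch of the kind of pipeline such a system might use: a dependency parse (spaCy, in this sketch) to pull out subject-verb-object triples, and WordNet via NLTK for the verb hypernyms used to group them. This is my own reconstruction of the general idea, not the actual prototype, and it assumes the spaCy English model and the NLTK WordNet data are installed.

# Rough sketch of subject-verb-object extraction plus verb hypernym lookup.
# Reconstruction of the general approach only; not the Endeca prototype.
# Assumes: pip-installed spacy with "en_core_web_sm", and nltk with the
# WordNet corpus downloaded (nltk.download("wordnet")).

import spacy
from nltk.corpus import wordnet as wn

nlp = spacy.load("en_core_web_sm")

def extract_triples(text):
    """Yield (subject, verb lemma, object) triples from the dependency parse."""
    for tok in nlp(text):
        if tok.dep_ == "nsubj" and tok.head.pos_ == "VERB":
            verb = tok.head
            for obj in (c for c in verb.children if c.dep_ in ("dobj", "attr")):
                yield (tok.text, verb.lemma_, obj.text)

def verb_hypernym(verb_lemma):
    """First WordNet hypernym of the verb, usable for grouping triples into hierarchies."""
    synsets = wn.synsets(verb_lemma, pos=wn.VERB)
    if synsets and synsets[0].hypernyms():
        return synsets[0].hypernyms()[0].lemma_names()[0]
    return verb_lemma

for s, v, o in extract_triples("The senator proposed a new bill on energy policy."):
    print(s, v, o, "->", verb_hypernym(v))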

Click on the frame below to see "Idea Navigation: Structured Browsing for Unstructured Text," the presentation they delivered at CHI '08.

Monday, June 2, 2008

Clarification vs. Refinement

The other day, in between braving the Hulk and Spiderman rides at Endeca Discover '08, I was chatting with Peter Morville about one of my favorite pet peeves in faceted search implementations: the confounding of clarification and refinement. To my delight, he posted about it at findability.org today.

What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.

How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
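As a toy illustration, here's a Python sketch in which a query that fails to match the content triggers clarification first, while a query that lands on the road goes straight to faceted refinement. The catalog, the spelling-suggestion step, and the facet fields are all invented for the example.

# Toy sketch of "clarification before refinement": a query that matches nothing
# gets a "Did you mean..." suggestion; a query that matches gets facet values.
# The catalog, the crude substring matching, and the facet fields are invented.

import difflib
from collections import Counter

CATALOG = [
    {"title": "merlot red wine", "region": "california", "price": "under-20"},
    {"title": "pinot noir wine", "region": "oregon",     "price": "20-40"},
    {"title": "cabernet wine",   "region": "california", "price": "20-40"},
]
VOCABULARY = sorted({w for doc in CATALOG for w in doc["title"].split()})

def search(query):
    matches = [d for d in CATALOG if all(w in d["title"] for w in query.split())]
    if not matches:
        # Clarification step: get the user and the system on the same page.
        suggestions = [difflib.get_close_matches(w, VOCABULARY, n=1) for w in query.split()]
        return {"did_you_mean": " ".join(s[0] for s in suggestions if s)}
    # Refinement step: expose the relationships (facets) present in the matches.
    facets = {field: Counter(d[field] for d in matches) for field in ("region", "price")}
    return {"results": matches, "refine_by": facets}

print(search("merlto"))   # clarification: {"did_you_mean": "merlot"}
print(search("wine"))     # results plus region/price facet counts for refinement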

"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.

Sunday, June 1, 2008

Your Input Really is Relevant!

For those who haven't been following the progress on the Wikipedia entry for "Relevance (Information Retrieval)", I'd like to thank Jon Elsas, Bob Carpenter, and Fernando Diaz for helping turn lead into gold.

I'm proud of The Noisy Channel community for fixing one of the top two hits on Google for "relevance".
