Wednesday, July 2, 2008
A Call to Action
Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connecting with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent at these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
The rigor and independence of these conferences and workshops make them ideal vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
Tuesday, July 1, 2008
Clarification before Refinement on Amazon
While I find this interface less than ideal (e.g., even if all of your search results are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users clarify before they refine. The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.
Sunday, June 29, 2008
Back from ISSS Workshop
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
I'll let folks know as more information is released from the workshop.
Tuesday, June 24, 2008
What is (not) Exploratory Search?
In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
  - Success: you are presented with the object of your search.
  - Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report places the blame on clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.
As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set the expectation that enterprise workers must help shape the solution by supplying their proprietary knowledge and information needs, and to make that process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Tuesday, June 17, 2008
Information Retrieval Systems, 1896 - 1966
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
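To make the interaction protocol concrete, here is a minimal sketch, in Python, of one round-based simulation of the game. Everything in it is my own invention for illustration--the class names, the naive pagination-only Assistant, the page size, and the time limit are assumptions, not a spec for an actual implementation.

```python
import random
import time

PAGE_SIZE = 10            # assumed size of the fixed page the Assistant may show
TIME_LIMIT_SECONDS = 120  # assumed time limit for one game


class ShoppingAssistant:
    """Knows the full inventory, but not the Shopper's list (naive: only paginates)."""

    def __init__(self, inventory):
        self.inventory = list(inventory)
        self.offset = 0

    def render_page(self):
        # Present a fixed-size page of selectable items plus a "show more" link.
        items = self.inventory[self.offset:self.offset + PAGE_SIZE]
        return {"items": items, "links": ["show_more"]}

    def handle(self, link):
        if link == "show_more":
            self.offset = (self.offset + PAGE_SIZE) % max(len(self.inventory), 1)


class Shopper:
    """Knows the shopping list, but can only click, never type."""

    def __init__(self, shopping_list):
        self.wanted = set(shopping_list)
        self.found = set()

    def act(self, page):
        remaining = (self.wanted - self.found) & set(page["items"])
        if remaining:
            item = remaining.pop()
            self.found.add(item)
            return ("select", item)
        return ("link", "show_more")


def play(inventory, shopping_list):
    assistant, shopper = ShoppingAssistant(inventory), Shopper(shopping_list)
    findable = shopper.wanted & set(inventory)
    deadline, rounds = time.time() + TIME_LIMIT_SECONDS, 0
    while time.time() < deadline and shopper.found != findable and rounds < 10_000:
        kind, value = shopper.act(assistant.render_page())
        if kind == "link":
            assistant.handle(value)
        rounds += 1
    return len(shopper.found), rounds


if __name__ == "__main__":
    inventory = [f"item_{i}" for i in range(200)]
    shopping_list = random.sample(inventory, 5) + ["item_not_in_stock"]  # one unfindable item
    print("found %d items in %d rounds" % play(inventory, shopping_list))
```

The score here is just items found and rounds used; a real evaluation would presumably compare Assistants by how few rounds (or clicks) Shoppers need.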
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I think the most interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Thursday, June 12, 2008
Max Wilson's Blog
His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?
These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.
Wednesday, June 11, 2008
How Google Measures Search Quality
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
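As an aside, for readers who haven't run into Cranfield-style measures, here is a small self-contained sketch of mean average precision computed from manual relevance judgments. The data is made up for illustration, and I'm not suggesting this is what Google computes.

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one query: mean of precision@k over ranks of relevant hits."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(judged_queries):
    """judged_queries: list of (ranked_docs, relevant_set) pairs, one per rated query."""
    return sum(average_precision(r, rel) for r, rel in judged_queries) / len(judged_queries)


if __name__ == "__main__":
    judged = [
        (["d3", "d1", "d7", "d2"], {"d1", "d2"}),  # AP = (1/2 + 2/4) / 2 = 0.5
        (["d5", "d9", "d4"], {"d5"}),              # AP = 1.0
    ]
    print(mean_average_precision(judged))          # 0.75
```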
More questions for Amit. :)
Tuesday, June 10, 2008
Seeking Opinions about Information Seeking
So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.
I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.
Sunday, June 8, 2008
Exploratory search is relevant too!
Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.
Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.
Thursday, June 5, 2008
HCIR '08
HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008
About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.
In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.
This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.
Keynote speaker: Susan Dumais, Microsoft Research
Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.
Possible topics include, but are not limited to:
- Novel interaction techniques for information retrieval.
- Modeling and evaluation of interactive information retrieval.
- Exploratory search and information discovery.
- Information visualization and visual analytics.
- Applications of HCI techniques to information retrieval needs in specific domains.
- Ethnography and user studies relevant to information retrieval and access.
- Scale and efficiency considerations for interactive information retrieval systems.
- Relevance feedback and active learning approaches for information retrieval.
Important Dates
- Aug 22 - Papers/abstracts due
- Sep 12 - Decisions to authors
- Oct 3 - Final copy due for printing
- Oct 23 - Workshop date
Workshop Organization
Workshop chairs:
- Daniel Tunkelang, Endeca
- Ryen White, Microsoft Research
- Bill Kules, Catholic University of America
Program committee:
- James Allan, University of Massachusetts, USA
- Peter Anick, Yahoo!, USA
- Peter Bailey, Live Search, USA
- Peter Brusilovsky, University of Pittsburgh, USA
- Pia Borlund, Royal School of Library and Information Science, Denmark
- Robert Capra, University of North Carolina at Chapel Hill, USA
- Ed Chi, Palo Alto Research Center (PARC), USA
- Ed Cutrell, Microsoft Research, USA
- Ed Fox, Virginia Tech, USA
- Gene Golovchinsky, FX Palo Alto Laboratory, USA
- Marti Hearst, University of California at Berkeley, USA
- Jim Jansen, Pennsylvania State University, USA
- Diane Kelly, University of North Carolina at Chapel Hill, USA
- Gary Marchionini, University of North Carolina at Chapel Hill, USA
- Merrie Morris, Microsoft Research, USA
- Jeremy Pickens, FX Palo Alto Laboratory, USA
- Yan Qu, University of Maryland at College Park, USA
- Amanda Spink, Queensland University of Technology, Australia
- Elaine Toms, Dalhousie University, Canada
- Martin Wattenberg, IBM Research, USA
- Ross Wilkinson, CSIRO, Australia
Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
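For anyone curious what leveraging verb hypernyms can look like in practice, here is a minimal sketch using NLTK's WordNet interface. This is just an illustration of the idea; I haven't seen the prototype's actual code, and the helper function here is my own.

```python
# Requires: pip install nltk, then a one-time nltk.download('wordnet').
from nltk.corpus import wordnet as wn


def verb_hypernyms(verb, depth=2):
    """Collect hypernyms of a verb's first WordNet sense, up to the given depth."""
    senses = wn.synsets(verb, pos=wn.VERB)
    if not senses:
        return []
    result, frontier = [], [senses[0]]
    for _ in range(depth):
        frontier = [h for s in frontier for h in s.hypernyms()]
        result.extend(frontier)
    return [s.name() for s in result]


if __name__ == "__main__":
    # "acquire" generalizes to "get"; grouping extracted subject-verb-object facts
    # under such broader verbs is the kind of navigation the prototype enables.
    print(verb_hypernyms("acquire"))
```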
Click on the frame below to see the presentation they delivered at CHI '08.

Idea Navigation: Structured Browsing for Unstructured Text
Monday, June 2, 2008
Clarification vs. Refinement
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
Sunday, June 1, 2008
Your Input Really is Relevant!
Check out:
- The entry before I edited it.
- The entry after I edited it.
- The current entry, revised by Jon, Bob, and Fernando.
Friday, May 30, 2008
Is Search Broken?
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
Wednesday, May 28, 2008
Another HCIR Game
I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. Details are available at the iCLEF 2008 site or in this paper.
I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of Phetch.
Tuesday, May 27, 2008
The Magic Shelf
Now back to our commercial-free programming...
Monday, May 26, 2008
Your Input is Relevant!
As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!
And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!
As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and particularly in Phetch, a game which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web. One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, keeping our assumption that there is only one Seeker, and adding the assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Read the description provided by the Describer and use it to compose a search.
- Scan the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
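Here is a minimal sketch of that single-Seeker loop with a static description, a made-up black-box search function, and a crude "reformulate after scanning k results" rule. None of this is Phetch's actual implementation; the stopping rule and query-broadening strategy are assumptions purely for illustration.

```python
def play_round(description, search, is_target, patience=5, max_queries=10):
    """Simulated Seeker: query, scan, and reformulate when scanning stops paying off.

    description: static text supplied by the Describer
    search(query) -> ranked list of candidate image ids (black box)
    is_target(image_id) -> True iff a guess would be correct (guesses cost points)
    """
    terms = description.lower().split()
    for attempt in range(max_queries):
        # Naive reformulation: start with the full description, then drop
        # trailing terms to broaden the query on each successive attempt.
        query = " ".join(terms[:len(terms) - attempt] or terms[:1])
        for rank, image_id in enumerate(search(query), start=1):
            if rank > patience:
                break  # better to reformulate than to keep scanning
            if is_target(image_id):
                return {"found": image_id, "queries": attempt + 1, "rank": rank}
    return {"found": None, "queries": max_queries}


if __name__ == "__main__":
    index = {"img1": "michael jackson sailor hat", "img2": "sailor boat"}

    def toy_search(query):
        return [i for i, text in index.items() if any(t in text for t in query.split())]

    print(play_round("Michael Jackson wearing a sailor hat",
                     toy_search, lambda i: i == "img1"))
```

An evaluation built on this setup might compare systems or interfaces by how many queries and how much scanning Seekers need to find the target.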
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.
Wednesday, July 2, 2008
A Call to Action
Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connection with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent at these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:The rigor and independence of the conferences and workshops makes them ideal as vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
Tuesday, July 1, 2008
Clarification before Refinement on Amazon
While I find this interface less than ideal (e.g. even if all of your search are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users clarify before they refine. The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.
Sunday, June 29, 2008
Back from ISSS Workshop
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.I'll let folks know as more information is released from the workshop.
Tuesday, June 24, 2008
What is (not) Exploratory Search?
In the information retrieval literature. exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set expectations around how it is incumbent on enterprise workers to help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Tuesday, June 17, 2008
Information Retrieval Systems, 1896 - 1966
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with an shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I'd think the interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Thursday, June 12, 2008
Max Wilson's Blog
His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?
These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.
Wednesday, June 11, 2008
How Google Measures Search Quality
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google is seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
More questions for Amit. :)
Tuesday, June 10, 2008
Seeking Opinions about Information Seeking
So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.
I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.
Sunday, June 8, 2008
Exploratory search is relevant too!
Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.
Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.
Thursday, June 5, 2008
HCIR '08
HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008
About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.
In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.
This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.
Keynote speaker: Susan Dumais, Microsoft Research
Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.
Possible topics include, but are not limited to:
- Novel interaction techniques for information retrieval.
- Modeling and evaluation of interactive information retrieval.
- Exploratory search and information discovery.
- Information visualization and visual analytics.
- Applications of HCI techniques to information retrieval needs in specific domains.
- Ethnography and user studies relevant to information retrieval and access.
- Scale and efficiency considerations for interactive information retrieval systems.
- Relevance feedback and active learning approaches for information retrieval.
Important Dates
- Aug 22 - Papers/abstracts due
- Sep 12 - Decisions to authors
- Oct 3 - Final copy due for printing
- Oct 23 - Workshop date
Workshop Organization
Workshop chairs:
- Daniel Tunkelang, Endeca
- Ryen White, Microsoft Research
- Bill Kules, Catholic University of America
- James Allan, University of Massachusetts, USA
- Peter Anick, Yahoo!, USA
- Peter Bailey, Live Search, USA
- Peter Brusilovsky, University of Pittsburgh, USA
- Pia Borlund, Royal School of Library and Information Science, Denmark
- Robert Capra, University of North Carolina at Chapel Hill, USA
- Ed Chi, Palo Alto Research Center (PARC), USA
- Ed Cutrell, Microsoft Research, USA
- Ed Fox, Virginia Tech, USA
- Gene Golovchinsky, FX Palo Alto Laboratory, USA
- Marti Hearst, University of California at Berkeley, USA
- Jim Jansen, Pennsylvania State University, USA
- Diane Kelly, University of North Carolina at Chapel Hill, USA
- Gary Marchionini, University of North Carolina at Chapel Hill, USA
- Merrie Morris, Microsoft Research, USA
- Jeremy Pickens, FX Palo Alto Laboratory, USA
- Yan Qu, University of Maryland at College Park, USA
- Amanda Spink, Queensland University of Technology, Australia
- Elaine Toms, Dalhousie University, Canada
- Martin Wattenberg, IBM Research, USA
- Ross Wilkinson, CSIRO, Australia
Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
Click on the frame below to see the presentation they delivered at CHI '08.

Idea Navigation: Structured Browsing for Unstructured Text
Monday, June 2, 2008
Clarification vs. Refinement
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
Sunday, June 1, 2008
Your Input Really is Relevant!
Check out:
- The entry before I edited it.
- The entry after I edited it.
- The current entry, revised by Jon and Bob, and Fernando.
Friday, May 30, 2008
Is Search Broken?
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr, organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
Wednesday, May 28, 2008
Another HCIR Game
I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. Details available at the iCLEF 2008 site or in this paper.
I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of Phetch.
Tuesday, May 27, 2008
The Magic Shelf
Now back to our commercial-free programming...
Monday, May 26, 2008
Your Input is Relevant!
As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!
And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!
As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose,
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in a game called Phetch, a game which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web.One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Read the description provided by the Describer and uses it to compose a search.
- Scan the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.
Wednesday, July 2, 2008
A Call to Action
Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connection with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent at these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:The rigor and independence of the conferences and workshops makes them ideal as vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
Tuesday, July 1, 2008
Clarification before Refinement on Amazon
While I find this interface less than ideal (e.g. even if all of your search are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users clarify before they refine. The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.
Sunday, June 29, 2008
Back from ISSS Workshop
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.I'll let folks know as more information is released from the workshop.
Tuesday, June 24, 2008
What is (not) Exploratory Search?
In the information retrieval literature. exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set expectations around how it is incumbent on enterprise workers to help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Tuesday, June 17, 2008
Information Retrieval Systems, 1896 - 1966
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links, but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I think the most interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
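To make the mechanics concrete, here is a minimal sketch of the game loop in Python. Everything in it (the class names, the paging strategy, the round-based "time limit") is an illustrative assumption rather than a worked-out design; in particular, this assistant just pages through its inventory, whereas an algorithmic Shopping Assistant worth studying would adapt its pages to the Shopper's clicks.

```python
import random

# Sketch of the Shopper / Shopping Assistant game described above.
# Names and parameters here are hypothetical, not a specification.

class ShoppingAssistant:
    """Knows the full inventory; presents a fixed-size page of links each round."""
    def __init__(self, inventory, page_size=5):
        self.inventory = sorted(inventory)
        self.page_size = page_size
        self.offset = 0

    def present_page(self):
        page = self.inventory[self.offset:self.offset + self.page_size]
        self.offset += self.page_size
        return page  # a smarter assistant would reorder based on the Shopper's clicks


class Shopper:
    """Knows the shopping list; can only click on what the assistant shows."""
    def __init__(self, shopping_list):
        self.wanted = set(shopping_list)
        self.found = set()

    def click(self, page):
        hits = self.wanted & set(page)
        self.found |= hits
        return hits


def play(inventory, shopping_list, rounds=10):
    assistant = ShoppingAssistant(inventory)
    shopper = Shopper(shopping_list)
    for _ in range(rounds):          # the fixed "time limit"
        page = assistant.present_page()
        if not page:
            break
        shopper.click(page)
    return len(shopper.found)        # score: items found within the limit


if __name__ == "__main__":
    inventory = [f"item-{i}" for i in range(100)]
    shopping_list = random.sample(inventory, 10)
    print("Found", play(inventory, shopping_list), "of", len(shopping_list))
```

Scoring on items found per round would be one natural way to compare different assistant strategies against the same pool of Shoppers.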
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Thursday, June 12, 2008
Max Wilson's Blog
His post on exhibiting exploratory behaviour (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it clarification or refinement? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?
These are burning questions, and I look forward to learning more about how Max, m.c. schraefel, and others are addressing them.
Wednesday, June 11, 2008
How Google Measures Search Quality
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
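For readers unfamiliar with the Cranfield-style measure in question, here is a minimal sketch of mean average precision in Python. The judged queries and relevance sets are made up for illustration; Google's actual rating scales and metrics are, as noted, not public.

```python
# Mean average precision (MAP) over a batch of judged queries.
# The example judgments below are invented purely for illustration.

def average_precision(ranked_results, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant result."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_results, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),   # AP = (1/1 + 2/3) / 2
        (["d5", "d6", "d7"],       {"d7"}),         # AP = 1/3
    ]
    print(round(mean_average_precision(runs), 3))   # 0.583
```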
More questions for Amit. :)
Tuesday, June 10, 2008
Seeking Opinions about Information Seeking
So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.
I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.
Sunday, June 8, 2008
Exploratory search is relevant too!
Between the readership here, the folks involved in HCIR '08, and the participants in the IS3 workshop, I would think we have more than enough expertise in exploratory search to fix these up.
Any volunteers? For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people search for exploratory search on Google.
Thursday, June 5, 2008
HCIR '08
HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval
October 23, 2008
Redmond, Washington, USA
http://research.microsoft.com/~ryenw/hcir2008
About this Workshop
As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.
In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the HCIR 2007 workshop, co-hosted by MIT and Endeca, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.
This year the workshop is focused on the design, implementation, and evaluation of search interfaces. We are particularly interested in interfaces that support complex and exploratory search tasks.
Keynote speaker: Susan Dumais, Microsoft Research
Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.
Possible topics include, but are not limited to:
- Novel interaction techniques for information retrieval.
- Modeling and evaluation of interactive information retrieval.
- Exploratory search and information discovery.
- Information visualization and visual analytics.
- Applications of HCI techniques to information retrieval needs in specific domains.
- Ethnography and user studies relevant to information retrieval and access.
- Scale and efficiency considerations for interactive information retrieval systems.
- Relevance feedback and active learning approaches for information retrieval.
Important Dates
- Aug 22 - Papers/abstracts due
- Sep 12 - Decisions to authors
- Oct 3 - Final copy due for printing
- Oct 23 - Workshop date
Workshop Organization
Workshop chairs:
- Daniel Tunkelang, Endeca
- Ryen White, Microsoft Research
- Bill Kules, Catholic University of America
Program committee:
- James Allan, University of Massachusetts, USA
- Peter Anick, Yahoo!, USA
- Peter Bailey, Live Search, USA
- Peter Brusilovsky, University of Pittsburgh, USA
- Pia Borlund, Royal School of Library and Information Science, Denmark
- Robert Capra, University of North Carolina at Chapel Hill, USA
- Ed Chi, Palo Alto Research Center (PARC), USA
- Ed Cutrell, Microsoft Research, USA
- Ed Fox, Virginia Tech, USA
- Gene Golovchinsky, FX Palo Alto Laboratory, USA
- Marti Hearst, University of California at Berkeley, USA
- Jim Jansen, Pennsylvania State University, USA
- Diane Kelly, University of North Carolina at Chapel Hill, USA
- Gary Marchionini, University of North Carolina at Chapel Hill, USA
- Merrie Morris, Microsoft Research, USA
- Jeremy Pickens, FX Palo Alto Laboratory, USA
- Yan Qu, University of Maryland at College Park, USA
- Amanda Spink, Queensland University of Technology, Australia
- Elaine Toms, Dalhousie University, Canada
- Martin Wattenberg, IBM Research, USA
- Ross Wilkinson, CSIRO, Australia
Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
Click on the frame below to see the presentation they delivered at CHI '08: Idea Navigation: Structured Browsing for Unstructured Text.
Monday, June 2, 2008
Clarification vs. Refinement
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
Sunday, June 1, 2008
Your Input Really is Relevant!
Check out:
- The entry before I edited it.
- The entry after I edited it.
- The current entry, revised by Jon, Bob, and Fernando.
Friday, May 30, 2008
Is Search Broken?
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
Wednesday, May 28, 2008
Another HCIR Game
I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. Details available at the iCLEF 2008 site or in this paper.
I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of Phetch.
Tuesday, May 27, 2008
The Magic Shelf
Now back to our commercial-free programming...
Monday, May 26, 2008
Your Input is Relevant!
As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!
And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!
As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in a game called Phetch, which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web. One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Reads the description provided by the Describer and uses it to compose a search.
- Scans the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
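Here is a minimal sketch of that simplified loop in Python. The toy index, the word-overlap "search engine", the patience threshold, and the crude reformulation rule are all illustrative assumptions; they stand in for the real Phetch setup only to show where the scan-versus-reformulate decision lives.

```python
# Sketch of the single-Seeker, static-description simplification described above.
# The index, ranking function, and stopping rules are hypothetical.

def search(query, index):
    """Toy engine: rank images by word overlap with the query."""
    terms = set(query.split())
    scored = ((len(terms & set(tags)), img) for img, tags in index.items())
    return [img for score, img in sorted(scored, reverse=True) if score > 0]

def seek(description, index, target, patience=3, max_reformulations=3):
    """Scan up to `patience` results per query; otherwise reformulate and try again."""
    words = description.split()
    for attempt in range(max_reformulations):
        query = " ".join(words[attempt:attempt + 4])   # crude reformulation rule
        for rank, img in enumerate(search(query, index)[:patience], start=1):
            if img == target:                          # "guess" by clicking the image
                return attempt, rank
        # Scanning stalled: better to reformulate than to keep reading results.
    return None

index = {
    "img1": ["michael", "jackson", "sailor", "hat"],
    "img2": ["sailor", "boat"],
    "img3": ["hat", "store"],
}
print(seek("michael jackson wearing a sailor hat", index, target="img1"))
```

An evaluation harness along these lines could swap in different Seeker policies, or human Seekers, and compare how quickly each reaches the target.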
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.