Thursday, September 4, 2008

Query Elaboration as a Dialogue

I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?

The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.

Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.

That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones. 
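To make the idea concrete, here is a minimal sketch of what one step of such a dialogue could look like under a set retrieval model. Everything in it is an assumption made for illustration: the toy collection `DOCS`, the function names `retrieve` and `refinements`, and the choice to represent a query as a set of terms. It is not a description of how Endeca or any other system works.

```python
from collections import Counter

# Hypothetical toy collection: document id -> set of descriptive terms.
DOCS = {
    1: {"laptop", "apple", "13-inch"},
    2: {"laptop", "dell", "15-inch"},
    3: {"laptop", "apple", "15-inch"},
    4: {"tablet", "apple"},
}


def retrieve(query):
    """Set retrieval: ids of the documents that contain every query term."""
    return {doc_id for doc_id, terms in DOCS.items() if query <= terms}


def refinements(query):
    """Map each term not yet in the query to the number of current
    results that would remain if that term were added to the query."""
    counts = Counter()
    for doc_id in retrieve(query):
        counts.update(DOCS[doc_id] - query)
    return dict(counts)


query = {"laptop"}
print(sorted(retrieve(query)))          # current result set
print(refinements(query))               # what each refinement would leave
print(sorted(retrieve(query | {"apple"})))  # the user picks one and continues
```

The point of the sketch is that the user can predict the effect of each choice: adding a term leaves exactly the documents counted for it, which is much harder to promise when suggestions sit on top of an opaque ranking function.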

But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...

6 comments:

jeremy said...

I completely agree with you about both things: (1) the need for query as a dialogue, and (2) the need for transparency in the algorithm itself, so as to make the dialogue possible.

And this makes a lot of sense when searching enterprise data.

But do you think we'll ever see something like this take off for the web as a whole? Maybe not from the big players, because they have too much money to make by keeping things closed. But there are efforts for open search for the web as a whole. Do you think those will ever take off?

Or will "open" not succeed in a web environment, because of gaming/manipulation/spam issues?

That's my question really: What happens when "open" meets "spam"?

Personally, I've always thought that a truly open, dialogue system will be able to allow the user to easily filter for spam. It might take a little bit of extra work... a 3% "effort tax" if you will. But what is your take on this issue?

Daniel Tunkelang said...

As I argued with Amit Singhal a few months ago, I do think relevance ultimately needs to be in the hands of users, not search engines acting as paternalistic gatekeepers. But I do concede it's a challenge to give users this power in a form that does not require unreasonable effort or expertise.

jeremy said...

Well, if they can't give that power to every user, there should at least be a switch, cookie, command-line option, or something that "turns on" the more powerful relevance iteration/feedback/HCI interface. It's one thing to solve it for all the grandmothers... a hard problem. It's another thing to conclude that just because grandmothers don't get it, you won't make *anything* available to *any* user.

Yet they've still decided to take the "all or nothing" approach.. which is frankly surprising considering they have an attitude of "get it out there and then iterate". If they'd just get something out there, and working for 5% of the users, then they could iterate until they figured out how to make it work for the grandmothers, too.

Daniel Tunkelang said...

You're preaching to the converted. But I think it's not so much an unwillingness to experiment as a mindset: Google--and most search engine companies--see their primary directive as determining relevance for you. Relative to this mindset, allowing users to negotiate relevance isn't so much an enhancement as an abdication of responsibility. I violently disagree with this mindset, but I can appreciate it on its own terms.

jeremy said...

I still think that there is a difference between determining relevance for you, and allowing you enough input options so that you can tell them what you want, so that they can still determine relevance for you.

I have no problem with them applying as much algorithmic intelligence as possible to find patterns (and relevance) in the data that you likely would not have found on your own. What I object to is the paucity of options for specifying my own input to that algorithmic intelligence.

But maybe I'm saying the same thing you are, in a different way?

Daniel Tunkelang said...

I see the problem not so much as the paucity of input options as the lack of transparency in how that input is processed. Perhaps we are saying the same thing. I think that transparency is a prerequisite for meaningful query expressiveness.
