Showing posts with label transparency. Show all posts

Monday, September 15, 2008

Information Accountability

The recent United Airlines stock fiasco triggered an expected wave of finger pointing. For those who didn't follow the event, here is the executive summary:

    In the wee hours of Sunday, September 7th, The South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legit, but the linked article, which dated from 2002, didn't carry its publication date. Then Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results for a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.



    For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.

I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
  • The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.

  • The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
I see a common thread here that I'd like to call "information accountability." I don't mean this term in the sense of a recent CACM article about information privacy and sensitivity, but rather in a sense of information provenance and responsibility.

Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil", our choice of what we believe to be information sources is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.

But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn, the FBI may have been correctly doing its job, given the information they had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.

It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.

The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without such a skeptical eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?

There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.

Thursday, September 4, 2008

Query Elaboration as a Dialogue

I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?

The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.

Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.

That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones. 
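To illustrate what navigating query space over a transparent model might look like, here is a minimal sketch in Python. The QuerySession class, its method names, and the toy documents are all hypothetical, invented for this example; it assumes the simplest possible set retrieval model (a conjunction of required terms) and deliberately says nothing about which refinements the system should offer. The point is only that after every step the user sees exactly what query is in force and how many documents it matches.

```python
# Toy corpus; the documents and ids are invented for illustration.
DOCS = {
    1: "united airlines files for bankruptcy",
    2: "united airlines stock falls on old news",
    3: "google news alerts and search transparency",
}


class QuerySession:
    """Hypothetical dialogue state: the list of terms every match must contain."""

    def __init__(self, docs):
        self.docs = docs
        self.terms = []

    def matches(self):
        # Set retrieval: a document matches only if it contains every term.
        return sorted(
            doc_id
            for doc_id, text in self.docs.items()
            if all(term in text.split() for term in self.terms)
        )

    def describe(self):
        # Echo the exact query in force, so the user knows what was asked.
        query = " AND ".join(self.terms) or "(no terms)"
        return f"{query}: {len(self.matches())} matching"

    def refine(self, term):
        # Narrow the query by one term and report the new state.
        self.terms.append(term)
        return self.describe()

    def back(self):
        # Undo the most recent refinement and report the new state.
        if self.terms:
            self.terms.pop()
        return self.describe()


session = QuerySession(DOCS)
print(session.refine("united"))  # united: 2 matching
print(session.refine("stock"))   # united AND stock: 1 matching
print(session.back())            # united: 2 matching
```

Because each response restates the query and its effect, a refinement that goes wrong is immediately visible and can be undone.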

But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...

Monday, September 1, 2008

Quick Bites: E-Discovery and Transparency

One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.

I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.

I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.

Wednesday, August 27, 2008

Transparency in Information Retrieval

It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.

Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.

The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.

Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems?). But what frustrates users most is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.

What does this have to do with set retrieval vs. ranked retrieval? Plenty!

Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.

The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
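For readers who have never seen Boolean retrieval up close, here is a minimal sketch in Python of "what you ask is what you get." The toy documents, the helper names (build_index, term, AND, OR, NOT), and the in-memory inverted index are assumptions made for illustration, not a description of any particular engine.

```python
# Toy corpus; the documents and ids are invented for illustration.
DOCS = {
    1: "united airlines files for bankruptcy",
    2: "united airlines stock falls on old news",
    3: "google news alerts and search transparency",
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return INDEX.get(word, set())


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return set(DOCS) - a


# united AND NOT bankruptcy
result = AND(term("united"), NOT(term("bankruptcy")))

# The sort order is whatever the user specifies; here, ascending id.
print(sorted(result))  # [2]
```

Every document in the result satisfies the query and every document outside it does not, so the user can always explain a hit or a miss. The hard part, as noted above, is writing the expression in the first place.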

In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.

But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.

Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.
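A small sketch in Python may make the contrast concrete. It assumes a deliberately simplified tf-idf weighting chosen only for illustration (real engines use more elaborate and often unpublished models), and the function names and toy documents are hypothetical. Notice that the user-facing function returns only an ordering; the scores that produced it stay inside the engine.

```python
import math
from collections import Counter

# Toy corpus; the documents and ids are invented for illustration.
DOCS = {
    1: "united airlines files for bankruptcy",
    2: "united airlines stock falls on old news",
    3: "google news alerts and search transparency",
}


def tf_idf_scores(query, docs):
    """Score each document by a simplified tf-idf sum over the query terms."""
    n = len(docs)
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = sum(
            tf[t] * math.log(1 + n / df[t])
            for t in query.split()
            if df[t]  # skip query terms that appear nowhere in the corpus
        )
        if score > 0:
            scores[doc_id] = score
    return scores


def ranked(query, docs):
    """What the user sees: document ids, best first, with no explanation."""
    scores = tf_idf_scores(query, docs)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(ranked("united bankruptcy", DOCS))  # [1, 2]
```

Document 2 is returned even though it never mentions bankruptcy, and nothing in the output tells the user that, or why document 1 outranks it.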

If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.

But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?

To be continued...