Tuesday, September 16, 2008
Quick Bites: Search Evaluation at Google
Original post is here; Jeff's commentary is here. Not surprisingly, my reaction is that Google should consider a richer notion of "results" than an ordering of matching pages, perhaps a faceted approach that reflects the "several dimensions to 'good' results."
Sunday, September 14, 2008
Is Blog Search Different?
Alerted by Jeff and Iadh, I recently read What Should Blog Search Look Like?, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.
The position paper suggests focusing on three kinds of search tasks:
- Find out what are people thinking or feeling about X over time.
- Find good blogs/authors to read.
- Find useful information that was published in blogs sometime in the past.
But I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and that their discussion, based on work by Mishne and de Rijke, that blog search queries differ substantially from web search queries. I don't doubt the data they've collected, but I'm curious if their results account for the rapid proliferation and mainstreaming of blogs. The lines between blogs, news articles, and informational web pages seem increasingly blurred.
So I'd like to turn the question around: what should blog search look like that is not applicable to search in general?
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Quick Bites: Applying Turing's Ideas to Search
A colleague of mine at Endeca recently pointed me to a post by John Ferrara at Boxes and Arrows entitled Applying Turing's Ideas to Search.
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.

While I'm not convinced that search engine designers should be aspiring to pass the Turing test, I agree wholeheartedly with the vision John puts forward:
It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.

It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. Or, as we like to call it around here, HCIR.
Thursday, September 4, 2008
Query Elaboration as a Dialogue
I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?
The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.
Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.
That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones.
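One way to picture such a dialogue is a toy sketch (with entirely invented data, and not a proposal for how refinements should be chosen) in which a set retrieval system surfaces the refinements available from the current result set, each annotated with the size of the set it would produce, so the user can see exactly what each choice will do:

```python
# Toy sketch of dialogue-style query elaboration over a set retrieval
# model: each candidate refinement is shown with the count of current
# matches it would keep, so refinement choices are informed ones.
# The documents and facet names here are hypothetical.

from collections import Counter

docs = [
    {"id": 1, "topic": "search", "year": 2007},
    {"id": 2, "topic": "search", "year": 2008},
    {"id": 3, "topic": "databases", "year": 2008},
]

def matching(query):
    # Set retrieval: a document either satisfies every facet constraint
    # in the query or it does not.
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

def suggest_refinements(query):
    # Count, for each facet value not yet constrained, how many of the
    # current matches each refinement would retain.
    counts = Counter()
    for d in matching(query):
        for k, v in d.items():
            if k != "id" and k not in query:
                counts[(k, v)] += 1
    return counts

query = {}                    # the dialogue starts from the whole corpus
print(suggest_refinements(query))
query["topic"] = "search"     # the user narrows; the counts update
print(suggest_refinements(query))
```

Because each suggested refinement carries a meaningful count, the user never takes a step whose consequences are opaque--which is the transparency property the dialogue depends on.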
But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...
Monday, September 1, 2008
Quick Bites: E-Discovery and Transparency
One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.
I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.
I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.
Labels:
e-Discovery,
Relevance,
Search,
transparency
Wednesday, August 27, 2008
Transparency in Information Retrieval
It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.
Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.
The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.
Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems?). But what frustrates users most is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.
What does this have to do with set retrieval vs. ranked retrieval? Plenty!
Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases rather than for querying search engines.
The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
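To make that transparency concrete, here is a minimal sketch of Boolean set retrieval over an inverted index (the documents and field names are hypothetical): the match set is fully determined by the query, the count is meaningful, and the user can impose any sort order.

```python
# Minimal sketch of Boolean set retrieval: an inverted index maps each
# term to the set of documents containing it, and AND/OR are just set
# operations. Documents and field names are invented for illustration.

docs = {
    1: {"text": "boolean retrieval systems", "date": 1998},
    2: {"text": "ranked retrieval models", "date": 2005},
    3: {"text": "boolean and ranked retrieval", "date": 2008},
}

# Build the inverted index: term -> set of doc ids.
index = {}
for doc_id, doc in docs.items():
    for term in doc["text"].split():
        index.setdefault(term, set()).add(doc_id)

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

# "boolean AND retrieval": what you ask is what you get.
matches = AND(index["boolean"], index["retrieval"])
either = OR(index["ranked"], index["boolean"])

# Transparency: the count is meaningful, and the user, not the engine,
# chooses the sort order (here, newest first).
print(len(matches))
for doc_id in sorted(matches, key=lambda d: docs[d]["date"], reverse=True):
    print(doc_id, docs[doc_id]["text"])
```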
In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.
But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.
Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.
If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.
But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?
To be continued...
Sunday, August 24, 2008
Set Retrieval vs. Ranked Retrieval
After last week's post about a racially targeted web search engine, you'd think I'd avoid controversy for a while. To the contrary, I now feel bold enough to bring up what I have found to be my most controversial position within the information retrieval community: my preference for set retrieval over ranked retrieval.
This will be the first of several posts along this theme, so I'll start by introducing the terms.
- In a ranked retrieval approach, the system responds to a search query by ranking all documents in the corpus based on its estimate of their relevance to the query.
- In a set retrieval approach, the system partitions the corpus into two subsets of documents: those it considers relevant to the search query, and those it does not.
What is set retrieval in practice? In my view, a set retrieval approach satisfies two expectations:
- The number of documents reported to match my search should be meaningful--or at least should be a meaningful estimate. More generally, any summary information reported about this set should be useful.
- Displaying a random subset of the set of matching documents to the user should be a plausible behavior, even if it is not as good as displaying the top-ranked matches. In other words, relevance ranking should help distinguish more relevant results from less relevant results, rather than distinguishing relevant results from irrelevant results.
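The distinction between the two approaches can be sketched in a few lines of code (the corpus and the scoring function are toy inventions for illustration): a ranked engine imposes an ordering on the entire corpus, while a set engine partitions it, and ranking merely reorders within the matching set.

```python
# Hypothetical sketch contrasting ranked and set retrieval.
# score() stands in for any relevance estimator; the corpus is made up.

corpus = {
    "d1": "set retrieval partitions the corpus",
    "d2": "ranked retrieval orders the corpus",
    "d3": "cooking with cast iron pans",
}

def score(query, text):
    # Toy relevance estimate: fraction of query terms present in the text.
    terms = query.split()
    return sum(t in text.split() for t in terms) / len(terms)

def ranked_retrieval(query):
    # Every document in the corpus gets a rank, relevant or not.
    return sorted(corpus, key=lambda d: score(query, corpus[d]), reverse=True)

def set_retrieval(query):
    # Partition: a document either matches the query or it does not.
    matching = {d for d in corpus if score(query, corpus[d]) > 0}
    return matching, set(corpus) - matching

matching, rest = set_retrieval("retrieval corpus")
print(len(matching))                          # a meaningful count
print(ranked_retrieval("retrieval corpus"))   # all documents, in rank order
```

Under the two expectations above, the count `len(matching)` has to mean something, and showing a random sample of `matching` has to be a plausible (if suboptimal) behavior--neither of which holds for the output of `ranked_retrieval`.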
Labels:
Information Retrieval,
Relevance,
Search
Saturday, August 16, 2008
Thinking Outside the Black Box
I was reading Techmeme today, and I noticed an LA Times article about RushmoreDrive, described on its About Us page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was dumb and racist. In fact, it took some work to find positive commentary about RushmoreDrive.
But I've learned from the way the blogosphere handled the Cuil launch not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at Amy Ruth's, and the service was as gracious as the chicken and waffles were delicious. So I decided I'd try my luck on a search engine not targeted at my racial profile.
The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of Ask.com, a corporate sibling of IAC-owned RushmoreDrive. Like Ask.com, RushmoreDrive emphasizes search refinement through narrowing and broadening refinements.
What I find ironic is that the whole controversy about racial bias in relevance ranking reveals the much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I criticized Amit Singhal for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.
I don't know how much information race provides as a prior to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.
Labels:
exploratory search,
Google,
Relevance,
Search
Wednesday, August 13, 2008
David Huynh's Freebase Parallax
One of the perks of working in HCIR is that you get to meet some of the coolest people in academic and industrial research. I met David Huynh a few years ago, while he was a graduate student at MIT, working in the Haystack group and on the Simile project. You've probably seen some of his work: his Timeline project has been deployed all over the web.
Despite efforts by me and others to persuade David to stay in the Northeast, he went out west a few months ago to join Metaweb, a company with ambitions "to build a better infrastructure for the Web." While I (and others) am not persuaded by Freebase, Metaweb's "open database of the world’s information," I am happy to see that David is still doing great work.
I encourage you to check out David's latest project: Freebase Parallax. In it, he does something I've never seen outside Endeca (excepting David's earlier work on a Nested Faceted Browser): he allows you to navigate using the facets of multiple entity types, joining between sets of entities through their relationships. At Endeca, we call this "record relationship navigation"--we presented it at HCIR '07, showing how it can enable social navigation.
David includes a video where he eloquently demonstrates how Parallax works, and the interface is quite compelling. I'm not sure how well it scales with large data sets, but David's focus has been on interfaces rather than systems. My biggest complaint--which isn't David's fault--is that the Freebase content is a bit sparse. But his interface strikes me as a great fit for exploratory search.
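The underlying idea can be sketched with a few lines of code (the data is invented, and this is in no way how Parallax is actually implemented): start from a faceted selection of one entity type, follow a relationship to land on a related set of entities, and facet on that new set in turn.

```python
# Hypothetical sketch of record relationship navigation: facet on one
# entity type, then join through a relationship to a related entity set.
# All data is invented; this is not Parallax's implementation.

people = [
    {"name": "Ada", "field": "computing"},
    {"name": "Marie", "field": "chemistry"},
]
works = [
    {"title": "Notes on the Analytical Engine", "author": "Ada", "year": 1843},
    {"title": "Radioactive Substances", "author": "Marie", "year": 1903},
]

def select(records, **facets):
    # Faceted selection: keep records matching every facet constraint.
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]

def join_to_works(selected_people):
    # Follow the author relationship from a set of people to their works.
    names = {p["name"] for p in selected_people}
    return [w for w in works if w["author"] in names]

# Facet on people, navigate to the related set of works, which could
# itself be faceted further (e.g. by year).
computing_people = select(people, field="computing")
their_works = join_to_works(computing_people)
print([w["title"] for w in their_works])
```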
Conversation with Seth Grimes
I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there's an upside to writing critical commentary on Google's aspirations in the enterprise!
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
Sunday, August 10, 2008
Why Enterprise Search Will Never Be Google-y
As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.
The first is Google's announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.
The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.
First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA--the claim that it works "out of the box"--is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.
Second, the Chris Sherman piece. Here is an excerpt:
Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon....Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information....Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.

I highly recommend you read the whole article (it's only 2 pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.
The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.
But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.
If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.
Labels:
Enterprise Search,
exploratory search,
Google,
Search
Thursday, August 7, 2008
Where Google Isn't Good Enough
My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.
- Shopping. Google Product Search (fka Froogle) is not one of Google's crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison with any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.
- Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
- Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
- Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.
Tuesday, August 5, 2008
Is Google Good Enough?
As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information-seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human-computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out my slide show, Is Search Broken?.
But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good-faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, enterprise search is different.
1) Google does well enough on result quality, enough of the time.
While Google doesn't publish statistics about user satisfaction, it is widely accepted that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.
2) Google doesn't support exploratory search. But it often leads you to a tool that does.
The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.
3) Google is a benign monopoly that mitigates choice overload.
Many people, myself included, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it's all "free"--at least insofar as ad-supported services can be said to be free.
In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.
Monday, July 28, 2008
Not as Cuil as I Expected
Today's big tech news is the launch of Cuil, the latest challenger to Google's hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.
My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I'm not taking it personally--after all, their own site doesn't show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strike me as revolutionary.
Perhaps I'm expecting too much on day 1. But they're not just trying to beat Gigablast; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when it's clear that at least some of the problem is that the most relevant documents simply aren't in their index. I'm also surprised not to see Wikipedia documents showing up much for my searches--particularly for searches where I'm quite sure the most relevant document is in Wikipedia. Again, it's hard to tell if this is an indexing or results quality issue.
I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.
Sunday, July 13, 2008
Small is Beautiful
Today's New York Times has an article by John Markoff called On a Small Screen, Just the Salient Stuff. It argues that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant.
Of course, on a blog entitled The Noisy Channel, I can't help praising approaches that strive to improve the signal-to-noise ratio in information seeking applications. And I'm glad to see them quoting Ben Shneiderman, a colleague of mine at the University of Maryland who has spent much of his career focusing on HCIR issues.
Still, I think they could have taken the idea much further. Their discussion of more efficient or ergonomic use of real estate boils down to stripping extraneous content (a good idea, but hardly novel), and making sites vertically oriented (i.e., no horizontal scrolling). They don't consider the question of what information is best to present in the limited space--which, in my mind, is the most important question to consider as we optimize interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice.
Perhaps I am asking too much to expect them to call out the extreme inefficiency of ranked lists, compared to summarization-oriented approaches. Certainly the mobile space opens great opportunities for someone to get this right on the web.
Labels:
HCIR,
Information technology,
Search,
Usability
Sunday, June 29, 2008
Back from ISSS Workshop
My apologies for the sparsity of posts lately; it's been a busy week!
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.

We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.

I'll let folks know as more information is released from the workshop.
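To make that proposal concrete, here is one possible way to combine the three factors into a single score. This is purely my own sketch: the linear form, the weights, and the time budget are assumptions, not anything the workshop settled on.

```python
def existential_search_score(correct, confidence, seconds_taken,
                             time_budget=300.0,
                             w_correct=0.5, w_confidence=0.3, w_speed=0.2):
    """Score an existential search task in [0, 1] by combining
    correctness of the outcome, the seeker's confidence in it,
    and how quickly the seeker committed to an answer.

    correct       -- True if the yes/no verdict about the object's
                     existence was right
    confidence    -- self-reported confidence in [0, 1]
    seconds_taken -- time spent before committing to an answer
    """
    correctness = 1.0 if correct else 0.0
    # Confidence should only help when the verdict is right; a
    # confident wrong answer is worse than an unsure one.
    calibrated = confidence if correct else (1.0 - confidence)
    speed = max(0.0, 1.0 - seconds_taken / time_budget)
    return (w_correct * correctness
            + w_confidence * calibrated
            + w_speed * speed)

# A correct, confident "that document does not exist" verdict in one minute:
score = existential_search_score(correct=True, confidence=0.9, seconds_taken=60.0)
```

The interesting design choice is the calibration term: it rewards seekers for knowing when they are right, which is exactly what a ranked list with no sense of completeness makes difficult.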
Tuesday, June 24, 2008
What is (not) Exploratory Search?
One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.
In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).

If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
A recent study from AIIM (the Association for Information and Image Management, also known as the Enterprise Content Management Association) reports that enterprise search frustrates and disappoints users. Specifically, 49% of survey respondents “agreed” or “strongly agreed” that it is a difficult and time-consuming process to find the information they need to do their job.
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.

As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set the expectation that enterprise workers will have to help shape the solution by supplying their proprietary knowledge and information needs, and to make that process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Labels:
Endeca,
Enterprise Search,
Information technology,
Search
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
I've mused a fair amount about how to apply the concept of the Phetch human computation game to evaluating browsing-based information retrieval interfaces. I'd love to be able to better evaluate faceted navigation and clustering approaches, relative to conventional search as well as relative to one another.
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I'd think the interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
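A minimal harness for simulating the game might look like the following. Everything here is my own assumption made to pin down the rules: the round limit stands in for the time limit, the page size for the fixed-size page, and the Assistant is just a pluggable function that sees the Shopper's clicks but never the shopping list.

```python
import random

def play_shopping_game(inventory, shopping_list, assistant,
                       max_rounds=20, page_size=8):
    """Run one Shopper/Assistant session and return how many
    shopping-list items the Shopper found within the round limit.

    assistant(inventory, clicks) -> a page (list) of items; it sees
    the Shopper's click history but never the shopping list itself.
    """
    found, clicks = set(), []
    for _ in range(max_rounds):
        page = assistant(inventory, clicks)[:page_size]
        # The Shopper can only click; no free-text queries allowed.
        wanted = [item for item in page if item in shopping_list]
        found.update(wanted)
        # Click the wanted items, or explore one link if nothing matched.
        clicks.extend(wanted if wanted else page[:1])
        if found == set(shopping_list):
            break
    return len(found)

def random_assistant(inventory, clicks, page_size=8):
    """Baseline Assistant: shows a random page, ignoring the clicks."""
    return random.sample(inventory, min(page_size, len(inventory)))

inventory = [f"item-{i}" for i in range(100)]
score = play_shopping_game(inventory, ["item-3", "item-42"], random_assistant)
```

The baseline above sets a floor; an algorithmic Shopping Assistant worth studying would use the click history to narrow what it shows, and two interfaces could then be compared by their scores over many simulated sessions.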
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Labels:
Evaluation,
faceted navigation,
HCIR,
Search
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet from an interview with Marissa Mayer, Google Vice President of Search Product and User Experience, on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Quick Bites: Applying Turing's Ideas to Search
A colleague of mine at Endeca recently pointed me to a post by John Ferrara at Boxes and Arrows entitled Applying Turing's Ideas to Search.
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.

While I'm not convinced that search engine designers should be aspiring to pass the Turing test, I agree wholeheartedly with the vision John puts forward:
It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.

It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. Or, as we like to call it around here, HCIR.
Thursday, September 4, 2008
Query Elaboration as a Dialogue
I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?
The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.
Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.
That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones.
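To make "transparent" concrete, here is a toy set-retrieval sketch in which every suggested refinement is annotated with the exact number of results it would leave, so the user always knows what a click will do before making it. The data model (documents as tag sets) is invented purely for illustration.

```python
def matching_set(docs, query_tags):
    """Set retrieval: return exactly the docs carrying every query tag."""
    return {doc for doc, tags in docs.items() if query_tags <= tags}

def refinements(docs, query_tags):
    """Offer each candidate refinement tag together with the exact
    count of results that adding it would retain -- the basis for the
    user's informed choice at each step of the dialogue."""
    current = matching_set(docs, query_tags)
    counts = {}
    for doc in current:
        for tag in docs[doc] - query_tags:
            counts[tag] = counts.get(tag, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

docs = {
    "d1": {"search", "enterprise", "faceted"},
    "d2": {"search", "web"},
    "d3": {"search", "enterprise"},
}
options = refinements(docs, {"search"})
```

Because membership is a plain subset test, the system's response to any refined query is fully predictable from the counts shown, which is the property that lets a user navigate query space deliberately rather than by trial and error.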
But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...
Monday, September 1, 2008
Quick Bites: E-Discovery and Transparency
One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.
I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.
I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.
Labels:
e-Discovery,
Relevance,
Search,
transparency
Wednesday, August 27, 2008
Transparency in Information Retrieval
It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.
Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.
The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.
Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems?). But what frustrates users most is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.
What does this have to do with set retrieval vs. ranked retrieval? Plenty!
Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.
The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
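As a toy illustration of that transparency, here is a minimal Boolean set retrieval sketch over a made-up three-document corpus; the point is that the result set follows directly and predictably from the query.

```python
# Minimal sketch of Boolean set retrieval over a toy inverted index.
# The corpus is hypothetical; a real engine would use postings lists,
# but Python sets make "what you ask is what you get" easy to see.

corpus = {
    1: "boolean retrieval over document sets",
    2: "ranked retrieval with relevance scores",
    3: "boolean queries with relevance feedback",
}

# Build the inverted index: term -> set of document ids.
index = {}
for doc_id, text in corpus.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def AND(*terms):
    """Documents containing every term."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def OR(*terms):
    """Documents containing any term."""
    return set().union(*(index.get(t, set()) for t in terms))

# "boolean AND retrieval" matches exactly the documents with both terms.
print(sorted(AND("boolean", "retrieval")))  # -> [1]
print(sorted(OR("boolean", "ranked")))      # -> [1, 2, 3]
```

Because the engine returns a set rather than a ranking, the caller is free to impose any sort order on the results, which is the "if you prefer a particular sort order, you can specify it" property in miniature.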
In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.
But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.
Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.
If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.
But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?
To be continued...
Sunday, August 24, 2008
Set Retrieval vs. Ranked Retrieval
After last week's post about a racially targeted web search engine, you'd think I'd avoid controversy for a while. To the contrary, I now feel bold enough to bring up what I have found to be my most controversial position within the information retrieval community: my preference for set retrieval over ranked retrieval.
This will be the first of several posts along this theme, so I'll start by introducing the terms.
- In a ranked retrieval approach, the system responds to a search query by ranking all documents in the corpus based on its estimate of their relevance to the query.
- In a set retrieval approach, the system partitions the corpus into two subsets of documents: those it considers relevant to the search query, and those it does not.
What is set retrieval in practice? In my view, a set retrieval approach satisfies two expectations:
- The number of documents reported to match my search should be meaningful--or at least should be a meaningful estimate. More generally, any summary information reported about this set should be useful.
- Displaying a random subset of the set of matching documents to the user should be a plausible behavior, even if it is not as good as displaying the top-ranked matches. In other words, relevance ranking should help distinguish more relevant results from less relevant results, rather than distinguishing relevant results from irrelevant results.
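The two expectations above can be sketched in a few lines of Python (the corpus, match flags, and scores are all made up): membership determines the set and its count, while ranking only orders documents within the set.

```python
# Sketch of the two set retrieval expectations, with hypothetical data.
# The engine first decides membership (relevant or not), then
# optionally ranks *within* the matched set.
import random

docs = {  # doc id -> (matches_query, relevance_score)
    "a": (True, 0.9), "b": (True, 0.6), "c": (True, 0.4),
    "d": (False, 0.2), "e": (False, 0.1),
}

matched = {d for d, (m, _) in docs.items() if m}

# Expectation 1: the match count is meaningful.
print(len(matched))  # -> 3

# Expectation 2: showing a random subset of matches is plausible,
# because every member of the set is considered relevant.
sample = random.sample(sorted(matched), 2)
assert set(sample) <= matched

# Ranking then orders matches by estimated relevance, without being
# responsible for separating relevant from irrelevant.
ranked = sorted(matched, key=lambda d: docs[d][1], reverse=True)
print(ranked)  # -> ['a', 'b', 'c']
```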
Labels:
Information Retrieval,
Relevance,
Search
Saturday, August 16, 2008
Thinking Outside the Black Box
I was reading Techmeme today, and I noticed an LA Times article about RushmoreDrive, described on its About Us page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was dumb and racist. In fact, it took some work to find positive commentary about RushmoreDrive.
But I've learned from the way the blogosphere handled the Cuil launch not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at Amy Ruth's and the service was as gracious as the chicken and waffles were delicious; I decided I'd try my luck on a search engine not targeted at my racial profile.
The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of Ask.com, a corporate sibling of IAC-owned RushmoreDrive. Like Ask.com, RushmoreDrive emphasizes search refinement through narrowing and broadening options.
What I find ironic is that the whole controversy about racial bias in relevance ranking reveals the much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I criticized Amit Singhal for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.
I don't know how much information race provides as a prior to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.
Labels:
exploratory search,
Google,
Relevance,
Search
Wednesday, August 13, 2008
David Huynh's Freebase Parallax
One of the perks of working in HCIR is that you get to meet some of the coolest people in academic and industrial research. I met David Huynh a few years ago, while he was a graduate student at MIT, working in the Haystack group and on the Simile project. You've probably seen some of his work: his Timeline project has been deployed all over the web.
Despite efforts by me and others to persuade David to stay in the Northeast, he went out west a few months ago to join Metaweb, a company with ambitions "to build a better infrastructure for the Web." While I (and others) am not persuaded by Freebase, Metaweb's "open database of the world’s information," I am happy to see that David is still doing great work.
I encourage you to check out David's latest project: Freebase Parallax. In it, he does something I've never seen outside Endeca (excepting David's earlier work on a Nested Faceted Browser): he allows you to navigate using the facets of multiple entity types, joining between sets of entities through their relationships. At Endeca, we call this "record relationship navigation"--we presented it at HCIR '07, showing how it can enable social navigation.
David includes a video where he eloquently demonstrates how Parallax works, and the interface is quite compelling. I'm not sure how well it scales with large data sets, but David's focus has been on interfaces rather than systems. My biggest complaint--which isn't David's fault--is that the Freebase content is a bit sparse. But his interface strikes me as a great fit for exploratory search.
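A rough sketch of the idea, with entirely hypothetical records: select a set of entities of one type by a facet, join through a relationship to a set of a second type, and then use the joined set's facets for the next navigation step.

```python
# Hypothetical sketch of "record relationship navigation": navigate
# from one entity type to another through their relationships. All
# records below are made up for illustration.

authors = [
    {"name": "Alice", "affiliation": "MIT"},
    {"name": "Bob", "affiliation": "Berkeley"},
]
papers = [
    {"title": "Faceted Search", "author": "Alice", "year": 2007},
    {"title": "Timelines", "author": "Alice", "year": 2008},
    {"title": "Set Retrieval", "author": "Bob", "year": 2008},
]

# Step 1: select a set of authors using an author facet.
mit_authors = {a["name"] for a in authors if a["affiliation"] == "MIT"}

# Step 2: join through the authorship relationship to a set of papers.
mit_papers = [p for p in papers if p["author"] in mit_authors]

# Step 3: the joined set's facets drive the next navigation step.
years = sorted({p["year"] for p in mit_papers})
print([p["title"] for p in mit_papers])  # -> ['Faceted Search', 'Timelines']
print(years)  # -> [2007, 2008]
```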
Conversation with Seth Grimes
I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there's an upside to writing critical commentary on Google's aspirations in the enterprise!
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
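For reference, here is a sketch of the naive merging that I argue we should move beyond: round-robin interleaving of ranked lists from hypothetical sources, with no shared relevance model and no user control over the result.

```python
# Sketch of naive "federated" merging of ranked lists: round-robin
# interleaving with duplicate removal. The sources and results are
# made up; the point is how little this offers the user.
from itertools import zip_longest

def interleave(*ranked_lists):
    """Round-robin merge, dropping duplicates while preserving order."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

web = ["a", "b", "c"]
news = ["b", "d"]
print(interleave(web, news))  # -> ['a', 'b', 'd', 'c']
```

Note that the merged order is an artifact of interleaving, not of any cross-source notion of relevance, and the user has no way to re-sort or refine it--exactly the inadequacy described above.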
Sunday, August 10, 2008
Why Enterprise Search Will Never Be Google-y
As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.
The first is Google's announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.
The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.
First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA, the claim that it works "out of the box", is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.
Second, the Chris Sherman piece. Here is an excerpt:
Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon....Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information....Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.
I highly recommend you read the whole article (it's only 2 pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.
The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.
But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.
If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.
Labels:
Enterprise Search,
exploratory search,
Google,
Search
Thursday, August 7, 2008
Where Google Isn't Good Enough
My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.
- Shopping. Google Product Search (fka Froogle) is not one of Google's crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison to any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.
- Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
- Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
- Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.
Tuesday, August 5, 2008
Is Google Good Enough?
As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out this slide show on Is Search Broken?
But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, enterprise search is different.
1) Google does well enough on result quality, enough of the time.
While Google doesn't publish statistics about user satisfaction, it's commonplace that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.
2) Google doesn't support exploratory search. But it often leads you to a tool that does.
The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.
3) Google is a benign monopoly that mitigates choice overload.
Many people, myself included, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it's all "free"--at least insofar as ad-supported services can be said to be free.
In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.
Monday, July 28, 2008
Not as Cuil as I Expected
Today's big tech news is the launch of Cuil, the latest challenger to Google's hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.
My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I'm not taking it personally--after all, their own site doesn't show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strikes me as revolutionary.
Perhaps I'm expecting too much on day 1. But they're not just trying to beat Gigablast; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when at least part of the problem is that the most relevant documents simply aren't in their index. I'm also surprised not to see Wikipedia documents showing up much in my searches--particularly searches for which I'm quite sure the most relevant document is in Wikipedia. Again, it's hard to tell whether this is an indexing or results quality issue.
I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.
Sunday, July 13, 2008
Small is Beautiful
Today's New York Times has an article by John Markoff called On a Small Screen, Just the Salient Stuff. It argues that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant.
Of course, on a blog entitled The Noisy Channel, I can't help praising approaches that strive to improve the signal-to-noise ratio in information seeking applications. And I'm glad to see them quoting Ben Shneiderman, a colleague of mine at the University of Maryland who has spent much of his career focusing on HCIR issues.
Still, I think they could have taken the idea much further. Their discussion of more efficient or ergonomic use of real estate boils down to stripping extraneous content (a good idea, but hardly novel), and making sites vertically oriented (i.e., no horizontal scrolling). They don't consider the question of what information is best to present in the limited space--which, in my mind, is the most important question to consider as we optimize interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice.
Perhaps I am asking too much to expect them to call out the extreme inefficiency of ranked lists, compared to summarization-oriented approaches. Certainly the mobile space opens great opportunities for someone to get this right on the web.
Labels:
HCIR,
Information technology,
Search,
Usability
Sunday, June 29, 2008
Back from ISSS Workshop
My apologies for the sparsity of posts lately; it's been a busy week!
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.

We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.

I'll let folks know as more information is released from the workshop.
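To make the proposal concrete, here is a minimal sketch of what such a combined measure might look like. The multiplicative form, the time budget, and all the numbers are my own invention, purely to illustrate the idea of blending correctness, confidence, and efficiency into one score:

```python
from dataclasses import dataclass

@dataclass
class Session:
    correct: bool      # did the seeker reach the right yes/no answer?
    confidence: float  # self-reported confidence in that answer, 0..1
    seconds: float     # time spent on the task

def existential_score(s: Session, budget: float = 300.0) -> float:
    """Toy combined measure: confidence is credited when the outcome
    is correct and penalized when it is wrong, with a discount for
    time spent. The functional form is illustrative only."""
    efficiency = max(0.0, 1.0 - s.seconds / budget)
    signed = s.confidence if s.correct else -s.confidence
    return signed * (0.5 + 0.5 * efficiency)
```

Note that a wrong answer held with high confidence scores worst of all, which matches the intuition that confidently concluding "no such object exists" when one does is the most damaging failure.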
Tuesday, June 24, 2008
What is (not) Exploratory Search?
One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.
In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).

If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
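The success/failure dichotomy above is essentially a key lookup. A two-line sketch (the catalog entries are invented) makes the degenerate, non-exploratory case plain:

```python
# Known-item lookup: the query is an exact identifier, and the
# response is binary -- the object itself, or word that it isn't there.
CATALOG = {"moby-dick": "Melville, 1851", "ulysses": "Joyce, 1922"}

def title_search(title):
    return CATALOG.get(title)  # success: the object; failure: None
```

Everything interesting about exploratory search lives in the gap between this lookup and what users actually do.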
Friday, June 20, 2008
Enterprise Search Done Right
A recent study from AIIM (the Association for Information and Image Management, also known as the Enterprise Content Management Association) reports that enterprise search frustrates and disappoints users. Specifically, 49% of survey respondents “agreed” or “strongly agreed” that it is a difficult and time-consuming process to find the information they need to do their job.
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.

As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set expectations around how it is incumbent on enterprise workers to help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Labels:
Endeca,
Enterprise Search,
Information technology,
Search
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
I've mused a fair amount about how to apply the concept of the Phetch human computation game to the evaluation of browsing-based information retrieval interfaces. I'd love to be able to better evaluate faceted navigation and clustering approaches, relative to conventional search as well as relative to one another.
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I'd think the interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
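As a rough simulation harness, the game loop might look like the following sketch, with a trivially sequential Shopping Assistant and a greedy Shopper standing in for real players. The inventory, item names, and both baseline strategies are my own inventions for illustration:

```python
# Toy inventory of 500 items (names invented).
INVENTORY = {f"item-{i:03d}" for i in range(500)}

def sequential_assistant(inventory, clicks, page_size):
    """Baseline Assistant: page through the inventory in sorted order,
    ignoring what the Shopper has clicked so far."""
    items = sorted(inventory)
    start = len(clicks) * page_size
    return items[start:start + page_size]

def greedy_shopper(page, still_wanted):
    """Baseline Shopper: click a wanted item if one is visible,
    otherwise click the first link on the page."""
    for link in page:
        if link in still_wanted:
            return link
    return page[0] if page else None

def play(assistant, shopper, shopping_list, max_rounds=20, page_size=10):
    """Run one co-operative game; the score is items found in time."""
    found, clicks = set(), []
    for _ in range(max_rounds):
        page = assistant(INVENTORY, clicks, page_size)  # fixed-size page
        click = shopper(page, shopping_list - found)    # browsing only
        if click is None:
            break
        clicks.append(click)
        found |= {click} & shopping_list
    return len(found)
```

A smarter algorithmic Assistant would use the click history to infer what the Shopper wants--which is exactly the capability the game is meant to measure.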
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Labels:
Evaluation,
faceted navigation,
HCIR,
Search
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Quick Bites: Applying Turing's Ideas to Search
A colleague of mine at Endeca recently pointed me to a post by John Ferrara at Boxes and Arrows entitled Applying Turing's Ideas to Search.
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.

While I'm not convinced that search engine designers should be aspiring to pass the Turing test, I agree wholeheartedly with the vision John puts forward:
It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.

It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. Or, as we like to call it around here, HCIR.
Thursday, September 4, 2008
Query Elaboration as a Dialogue
I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?
The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.
Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.
That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones.
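To make the set-retrieval dialogue concrete, here is a toy sketch: the system answers each query with an exact match set (transparent by construction) and proposes facet refinements that would genuinely subdivide it. The corpus, the tags, and the deliberately naive suggestion heuristic are all my own inventions:

```python
from collections import Counter

# Toy faceted corpus; every document name and tag is invented.
DOCS = {
    "d1": {"topic:search", "type:paper", "year:2008"},
    "d2": {"topic:search", "type:blog",  "year:2008"},
    "d3": {"topic:hcir",   "type:paper", "year:2007"},
    "d4": {"topic:search", "type:paper", "year:2007"},
    "d5": {"topic:hcir",   "type:blog",  "year:2008"},
}

def matches(query):
    """Set retrieval: a document matches iff it carries every query tag."""
    return {d for d, tags in DOCS.items() if query <= tags}

def suggest_refinements(query, k=3):
    """Propose the next turn of the dialogue: tags that are frequent
    in the current result set but would actually narrow it if added."""
    hits = matches(query)
    counts = Counter(t for d in hits for t in DOCS[d] - query)
    narrowing = [t for t, c in counts.items() if c < len(hits)]
    return sorted(narrowing, key=lambda t: (-counts[t], t))[:k]
```

Because each refinement is a set operation, the user can predict exactly what accepting a suggestion will do--that is the transparency doing its work. Choosing the refinements well, of course, is the hard part.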
But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...
Monday, September 1, 2008
Quick Bites: E-Discovery and Transparency
One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.
I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.
I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.
Labels:
e-Discovery,
Relevance,
Search,
transparency
Wednesday, August 27, 2008
Transparency in Information Retrieval
It's been hard to find time to write another post while keeping up with the comment stream on my previous post about set retrieval! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.
Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.
The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.
Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems?). What frustrates users most, though, is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.
What does this have to do with set retrieval vs. ranked retrieval? Plenty!
Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.
The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
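A few lines of Python illustrate the "what you ask is what you get" property--every Boolean operator is just a set operation on postings lists (the postings here are invented):

```python
# A toy inverted index mapping terms to the documents containing them.
POSTINGS = {
    "noisy":   {1, 2, 5},
    "channel": {2, 5, 7},
    "blog":    {2, 3},
}
ALL_DOCS = set().union(*POSTINGS.values())

def AND(*terms):
    """Documents containing every term: pure set intersection."""
    return set.intersection(*(POSTINGS.get(t, set()) for t in terms))

def OR(*terms):
    """Documents containing any term: pure set union."""
    return set().union(*(POSTINGS.get(t, set()) for t in terms))

def NOT(term):
    """Documents lacking the term, relative to the whole collection."""
    return ALL_DOCS - POSTINGS.get(term, set())
```

Nothing is ranked and nothing is hidden: the result set is fully determined by the query, which is precisely the transparency set retrieval offers.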
In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.
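For contrast, here is a toy ranked-retrieval sketch using plain tf-idf (the documents are invented, and real engines use far more elaborate models): the user types bare keywords, and a score they never see decides the order.

```python
import math
from collections import Counter

# Three toy documents (contents invented).
DOCS = {
    1: "the noisy channel model in information theory",
    2: "the noisy channel a blog about search",
    3: "ranked retrieval versus boolean set retrieval",
}

def ranked(query):
    """Order all documents by a tf-idf score. The user sees only the
    resulting order, not the scores -- hence the opacity."""
    tf = {d: Counter(text.split()) for d, text in DOCS.items()}
    df = Counter(t for counts in tf.values() for t in counts)
    def score(d):
        return sum(tf[d][t] * math.log(len(DOCS) / df[t])
                   for t in query.split() if t in df)
    return sorted(DOCS, key=score, reverse=True)
```

Even in this tiny example, a user has no way to tell from the output why document 2 outranks document 1--the weighting is invisible.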
But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.
Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.
If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.
But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?
To be continued...
Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an increasingly popular term these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.
The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.
Some of you might find this description too anthropomorphic. But a recent study reported that most users expect search engines to read their minds--never mind that the general case goes beyond AI-complete (should we create a new class of ESP-complete problems?). What frustrates users most, though, is when a search engine not only fails to read their minds, but gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.
What does this have to do with set retrieval vs. ranked retrieval? Plenty!
Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.
The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.
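To make the transparency point concrete, here is a minimal sketch of Boolean set retrieval over a toy inverted index (the index contents are invented for illustration, not drawn from any real engine): a document either satisfies the query or it doesn't, so the result set is exactly what the user asked for.

```python
# Boolean set retrieval over a toy inverted index: term -> set of doc ids.
# Transparent by construction: the result is a well-defined, unordered set.
from typing import Dict, Set

index: Dict[str, Set[int]] = {
    "jaguar": {1, 2, 5},
    "car": {2, 3, 5},
    "animal": {1, 4},
}

def boolean_and(index: Dict[str, Set[int]], *terms: str) -> Set[int]:
    """Documents containing every term: 'what you ask is what you get.'"""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index: Dict[str, Set[int]], *terms: str) -> Set[int]:
    """Documents containing at least one of the terms."""
    result: Set[int] = set()
    for t in terms:
        result |= index.get(t, set())
    return result

# "jaguar AND car" matches exactly the documents containing both terms.
print(sorted(boolean_and(index, "jaguar", "car")))  # [2, 5]
```

Nothing here estimates relevance; the user can see precisely why each document is in (or out of) the result set.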
In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, known-item search), a state-of-the-art implementation of ranked retrieval yields results that are good enough.
But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be SIGIR regulars. At worst, they employ secret, proprietary models, either to protect their competitive differentiation or to thwart spammers.
Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.
If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to satisfice. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.
But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which was how ranked retrieval became so popular in the first place despite its lack of transparency. How do we resolve this dilemma?
To be continued...
Sunday, August 24, 2008
Set Retrieval vs. Ranked Retrieval
After last week's post about a racially targeted web search engine, you'd think I'd avoid controversy for a while. To the contrary, I now feel bold enough to bring up what I have found to be my most controversial position within the information retrieval community: my preference for set retrieval over ranked retrieval.
This will be the first of several posts along this theme, so I'll start by introducing the terms.
- In a ranked retrieval approach, the system responds to a search query by ranking all documents in the corpus based on its estimate of their relevance to the query.
- In a set retrieval approach, the system partitions the corpus into two subsets of documents: those it considers relevant to the search query, and those it does not.
What is set retrieval in practice? In my view, a set retrieval approach satisfies two expectations:
- The number of documents reported to match my search should be meaningful--or at least should be a meaningful estimate. More generally, any summary information reported about this set should be useful.
- Displaying a random subset of the set of matching documents to the user should be a plausible behavior, even if it is not as good as displaying the top-ranked matches. In other words, relevance ranking should help distinguish more relevant results from less relevant results, rather than distinguishing relevant results from irrelevant results.
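The two expectations can be sketched in a few lines of Python; the toy corpus and the term-counting score are illustrative assumptions, not any particular engine's model. The point is that the matching set is computed first, and ranking only orders documents within it:

```python
# Sketch of set retrieval's two expectations: partition first, rank second.
# Corpus and scoring are invented for illustration.
corpus = {
    1: "set retrieval partitions the corpus",
    2: "ranked retrieval orders the whole corpus",
    3: "relevance ranking inside the matching set",
    4: "an unrelated document about cooking",
}

def matching_set(corpus, query_terms):
    """Expectation 1: the matching set (and hence its size) is meaningful."""
    return {doc_id for doc_id, text in corpus.items()
            if all(t in text.split() for t in query_terms)}

def rank_within(corpus, doc_ids, query_terms):
    """Expectation 2: ranking distinguishes more relevant from less relevant
    results, but only among documents already judged to match."""
    score = lambda d: sum(corpus[d].split().count(t) for t in query_terms)
    return sorted(doc_ids, key=score, reverse=True)

matches = matching_set(corpus, ["retrieval", "corpus"])
ranked = rank_within(corpus, matches, ["retrieval", "corpus"])
```

Under this contract, showing the user "2 matches" is an honest statement, and even a random sample of `matches` would be a plausible (if suboptimal) display.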
Labels:
Information Retrieval,
Relevance,
Search
Saturday, August 16, 2008
Thinking Outside the Black Box
I was reading Techmeme today, and I noticed an LA Times article about RushmoreDrive, described on its About Us page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was dumb and racist. In fact, it took some work to find positive commentary about RushmoreDrive.
But I've learned from the way the blogosphere handled the Cuil launch not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at Amy Ruth's and the service was as gracious as the chicken and waffles were delicious; I decided I'd try my luck on a search engine not targeted at my racial profile.
The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of Ask.com, a corporate sibling of IAC-owned RushmoreDrive. Like Ask.com, RushmoreDrive emphasizes query refinement through narrowing and broadening suggestions.
What I find ironic is that the whole controversy about racial bias in relevance ranking reveals the much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I criticized Amit Singhal for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.
I don't know how much information race provides as a prior to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.
Labels:
exploratory search,
Google,
Relevance,
Search
Wednesday, August 13, 2008
David Huynh's Freebase Parallax
One of the perks of working in HCIR is that you get to meet some of the coolest people in academic and industrial research. I met David Huynh a few years ago, while he was a graduate student at MIT, working in the Haystack group and on the Simile project. You've probably seen some of his work: his Timeline project has been deployed all over the web.
Despite efforts by me and others to persuade David to stay in the Northeast, he went out west a few months ago to join Metaweb, a company with ambitions "to build a better infrastructure for the Web." While I (and others) am not persuaded by Freebase, Metaweb's "open database of the world's information," I am happy to see that David is still doing great work.
I encourage you to check out David's latest project: Freebase Parallax. In it, he does something I've never seen outside Endeca (excepting David's earlier work on a Nested Faceted Browser): he allows you to navigate using the facets of multiple entity types, joining between sets of entities through their relationships. At Endeca, we call this "record relationship navigation"--we presented it at HCIR '07, showing how it can enable social navigation.
David includes a video where he eloquently demonstrates how Parallax works, and the interface is quite compelling. I'm not sure how well it scales with large data sets, but David's focus has been on interfaces rather than systems. My biggest complaint--which isn't David's fault--is that the Freebase content is a bit sparse. But his interface strikes me as a great fit for exploratory search.
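For readers who haven't seen faceted navigation across entity types, here is a minimal sketch of the idea, with invented book/author data (this is not Parallax's or Endeca's actual API): select a set of one entity type by a facet, join through a relationship to a second type, and then facet the joined set.

```python
# Hypothetical data illustrating navigation across related entity types.
authors = {
    "a1": {"name": "Ann", "nationality": "US"},
    "a2": {"name": "Bo", "nationality": "UK"},
}
books = {
    "b1": {"title": "Sets", "author": "a1", "genre": "math"},
    "b2": {"title": "Ranks", "author": "a1", "genre": "math"},
    "b3": {"title": "Facets", "author": "a2", "genre": "hci"},
}

def select(entities, field, value):
    """Facet selection: the subset of entities with the given field value."""
    return {eid for eid, e in entities.items() if e[field] == value}

def join_books_by_author(author_ids):
    """Relationship join: books whose author is in the selected author set."""
    return {bid for bid, b in books.items() if b["author"] in author_ids}

us_authors = select(authors, "nationality", "US")   # {"a1"}
their_books = join_books_by_author(us_authors)      # {"b1", "b2"}
genres = {books[b]["genre"] for b in their_books}   # facet the joined set
print(genres)  # {'math'}
```

The key move is that each navigation step produces a set, so facet counts and further refinements remain meaningful after the join.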
Conversation with Seth Grimes
I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there's an upside to writing critical commentary on Google's aspirations in the enterprise!
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios, distinct from web search, in which an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly as a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than a solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting the same interface to work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
Sunday, August 10, 2008
Why Enterprise Search Will Never Be Google-y
As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.
The first is Google's announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.
The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.
First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA--the claim that it works "out of the box"--is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.
Second, the Chris Sherman piece. Here is an excerpt:
Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon.... Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information.... Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.
I highly recommend you read the whole article (it's only two pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.
The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.
But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.
If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.
Labels:
Enterprise Search,
exploratory search,
Google,
Search
Thursday, August 7, 2008
Where Google Isn't Good Enough
My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.
- Shopping. Google Product Search (fka Froogle) is not one of Google's crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison with any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.
- Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
- Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
- Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.
Tuesday, August 5, 2008
Is Google Good Enough?
As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out my slide show, Is Search Broken?.
But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, enterprise search is different.
1) Google does well enough on result quality, enough of the time.
While Google doesn't publish statistics about user satisfaction, it's commonplace that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.
2) Google doesn't support exploratory search. But it often leads you to a tool that does.
The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.
3) Google is a benign monopoly that mitigates choice overload.
Many people, myself included, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it's all "free"--at least insofar as ad-supported services can be said to be free.
In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.
Monday, July 28, 2008
Not as Cuil as I Expected
Today's big tech news is the launch of Cuil, the latest challenger to Google's hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.
My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I'm not taking it personally--after all, their own site doesn't show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strike me as revolutionary.
Perhaps I'm expecting too much on day 1. But they're not just trying to beat Gigablast; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when it's clear that at least some of the problem is that the most relevant documents simply aren't in their index. I'm also surprised to not see Wikipedia documents showing up much for my searches--particularly for searches when I'm quite sure the most relevant document is in Wikipedia. Again, it's hard to tell if this is an indexing or results quality issue.
I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.
Sunday, July 13, 2008
Small is Beautiful
Today's New York Times has an article by John Markoff called On a Small Screen, Just the Salient Stuff. It argues that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant.
Of course, on a blog entitled The Noisy Channel, I can't help praising approaches that strive to improve the signal-to-noise ratio in information seeking applications. And I'm glad to see them quoting Ben Shneiderman, a colleague of mine at the University of Maryland who has spent much of his career focusing on HCIR issues.
Still, I think they could have taken the idea much further. Their discussion of more efficient or ergonomic use of real estate boils down to stripping extraneous content (a good idea, but hardly novel), and making sites vertically oriented (i.e., no horizontal scrolling). They don't consider the question of what information is best to present in the limited space--which, in my mind, is the most important question to consider as we optimize interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice.
Perhaps I am asking too much to expect them to call out the extreme inefficiency of ranked lists, compared to summarization-oriented approaches. Certainly the mobile space opens great opportunities for someone to get this right on the web.
Labels:
HCIR,
Information technology,
Search,
Usability
Sunday, June 29, 2008
Back from ISSS Workshop
My apologies for the sparsity of posts lately; it's been a busy week!
I just came back from the Information Seeking Support Systems Workshop, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.

We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.

I'll let folks know as more information is released from the workshop.
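As a rough illustration of what such a combined measure might look like, here is a toy scoring function. The weights, the time budget, and the confidence-calibration scheme are entirely my own invention for the sake of the sketch, not anything proposed at the workshop:

```python
def existential_search_score(correct, confidence, elapsed_seconds,
                             time_budget=300.0,
                             w_correct=0.6, w_confidence=0.2, w_speed=0.2):
    """Toy composite metric for an 'existential' search task: did the
    seeker correctly determine whether the sought object exists?

    correct         -- True if the seeker's yes/no answer was right
    confidence      -- seeker's self-reported confidence in [0, 1]
    elapsed_seconds -- time taken to reach the answer
    time_budget     -- time at which the efficiency component reaches 0
    """
    correctness = 1.0 if correct else 0.0
    # Reward confidence only in a correct outcome; penalize being
    # confidently wrong by crediting (1 - confidence) instead.
    calibrated_confidence = confidence if correct else (1.0 - confidence)
    # Linear decay of the efficiency credit over the time budget.
    speed = max(0.0, 1.0 - elapsed_seconds / time_budget)
    return (w_correct * correctness
            + w_confidence * calibrated_confidence
            + w_speed * speed)

# A fast, confident, correct answer scores close to 1.0;
# a slow, confidently wrong one scores close to 0.
fast_and_right = existential_search_score(True, 0.9, 30.0)
slow_and_wrong = existential_search_score(False, 0.9, 290.0)
```

The point of the sketch is only that all three components belong in the same number: a measure that ignored confidence would treat a lucky guess and a well-grounded conclusion identically.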
Tuesday, June 24, 2008
What is (not) Exploratory Search?
One of the recurring topics at The Noisy Channel is exploratory search. Indeed, one of our readers recently took the initiative to upgrade the Wikipedia entry on exploratory search.
In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).
But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.
Should we conclude then that exploratory search is, in fact, a fringe use case?
According to Ryen White, Gary Marchionini, and Gheorghe Muresan:
Let me offer the following characterization of non-exploratory search:
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).

If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
Let me offer the following characterization of non-exploratory search:
- You know exactly what you want.
- You know exactly how to ask for it.
- You expect a search query to yield one of two responses:
- Success: you are presented with the object of your search.
- Failure: you learn that the object of your search is unavailable.
There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. It would be nice to see more solutions that explicitly support the process of exploration.
Friday, June 20, 2008
Enterprise Search Done Right
A recent study from AIIM (the Association for Information and Image Management, also known as the Enterprise Content Management Association) reports that enterprise search frustrates and disappoints users. Specifically, 49% of survey respondents “agreed” or “strongly agreed” that it is a difficult and time consuming process to find the information they need to do their job.
Given that I work for a leading enterprise search provider, you might think I'd find these results disconcerting, even if the report points the blame at clients rather than vendors:
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.

As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set the expectation that enterprise workers will have to help shape the solution by supplying their proprietary knowledge and information needs, and to make that process as painless as possible.
Enterprise search, done right, is a serious investment. But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.
Labels:
Endeca,
Enterprise Search,
Information technology,
Search
Monday, June 16, 2008
A Game to Evaluate Browsing Interfaces?
I've mused a fair amount about how to apply the concept of the Phetch human computation game to evaluate browsing-based information retrieval interfaces. I'd love to be able to better evaluate faceted navigation and clustering approaches, relative to conventional search as well as relative to one another.
Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.
As a Shopper, you are presented with a shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.
As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.
Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I'd think the interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.
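To make the rules concrete, here is a minimal simulation skeleton for the game with an algorithmic Shopping Assistant and a simulated Shopper. The paging strategy, page size, and round budget are placeholder assumptions of mine, not part of the game design above; the interesting research would be in replacing `present_page` with a real faceted-navigation or clustering strategy:

```python
import random

class ShoppingAssistant:
    """Algorithmic Assistant: knows the full inventory but not the
    Shopper's list. Each round it presents a fixed-size 'page' of links."""

    def __init__(self, inventory, page_size=5):
        self.inventory = list(inventory)
        self.page_size = page_size
        self.offset = 0

    def present_page(self):
        # Deliberately naive baseline strategy: page through the
        # inventory in order. A smarter Assistant would adapt to the
        # Shopper's selections on previous rounds.
        page = self.inventory[self.offset:self.offset + self.page_size]
        self.offset += self.page_size
        return page

def play_game(shopping_list, assistant, max_rounds=10):
    """Co-operative game loop. The simulated Shopper selects any shown
    item that is on its list; the score is the fraction of the list
    found within the round budget (a stand-in for the time limit)."""
    remaining = set(shopping_list)
    found = set()
    for _ in range(max_rounds):
        if not remaining:
            break
        page = assistant.present_page()
        if not page:
            break
        selected = remaining & set(page)
        found |= selected
        remaining -= selected
    return len(found) / len(shopping_list)

inventory = [f"item-{i}" for i in range(40)]
shopping_list = random.sample(inventory, 8)
score = play_game(shopping_list, ShoppingAssistant(inventory))
```

Comparing two Assistants would then amount to running many simulated games and comparing score distributions, which is exactly the kind of head-to-head evaluation of browsing interfaces I have in mind.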
Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be some other task that seems suitable for browsing-oriented interfaces.
Labels:
Evaluation,
faceted navigation,
HCIR,
Search
Subscribe to:
Posts (Atom)