Showing posts with label Google. Show all posts
Tuesday, September 16, 2008
Quick Bites: Search Evaluation at Google
Original post is here; Jeff's commentary is here. Not surprisingly, my reaction is that Google should consider a richer notion of "results" than an ordering of matching pages, perhaps a faceted approach that reflects the "several dimensions to 'good' results."
Monday, September 15, 2008
Information Accountability
The recent United Airlines stock fiasco triggered an expected wave of finger pointing. For those who didn't follow the event, here is the executive summary:
In the wee hours of Sunday, September 7th, the South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legit, but the linked article, which dated from 2002, didn't carry its publication date on the page. Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results of a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.

For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.
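As a thought experiment, the failure mode is easy to sketch. The following is a hypothetical fallback rule, not Google's actual pipeline: if a crawler can't extract a publication date from a page, it stamps the article with the crawl date instead.

```python
from datetime import date
from typing import Optional

def assign_article_date(extracted_date: Optional[date], crawl_date: date) -> date:
    """Pick a publication date for an indexed article.

    If the page itself carries no date, fall back to the crawl date --
    the kind of rule that turns a 2002 story into breaking news.
    """
    return extracted_date if extracted_date is not None else crawl_date

# A 2002 article that exposes no publication date on the page
# gets dated to the day the bot found it:
undated = assign_article_date(None, date(2008, 9, 7))
```

Under this rule, six-year-old news becomes indistinguishable from today's news, and every downstream consumer (alerts, news search) inherits the error.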
I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
- The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.
- The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil", our choice of what we believe to be information sources is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.
But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn the FBI may have been correctly doing its job, given the information they had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.
It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.
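To put a number on the telephone-game intuition: if each retelling independently corrupts a message with some small probability, the chance the message survives a chain intact decays exponentially with the chain's length. A toy model (the per-hop error rate is an illustrative assumption, not data):

```python
def intact_probability(p_error_per_hop: float, hops: int) -> float:
    """Probability a message survives `hops` retellings unchanged,
    assuming each hop independently corrupts it with probability p."""
    return (1.0 - p_error_per_hop) ** hops

# Even a modest 5% per-hop error rate degrades quickly as chains lengthen:
for k in (1, 5, 10, 20):
    print(k, intact_probability(0.05, k))
```

With twenty hops, the odds of an intact message fall below forty percent--and that's before anyone in the chain has a motive to distort.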
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without such a skeptical eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
Labels:
Google,
Information technology,
social media,
transparency
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Tuesday, September 2, 2008
Quick Bites: Google Chrome
For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released comic book hailing Google Chrome, Google's long rumored entry into browser wars. By the time you are reading this, the (Windows only) beta may even be available for download. The official Google announcement is here.
If the software lives up to the promise of the comic book, then Google may have a real shot at taking market share from IE and Firefox. More significantly, if they can supplant the operating system with the browser, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.
Interestingly, even though all of the search blogs are reporting about Chrome, I haven't seen any analysis on what this might mean for web search.
Saturday, August 16, 2008
Thinking Outside the Black Box
I was reading Techmeme today, and I noticed an LA Times article about RushmoreDrive, described on its About Us page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was dumb and racist. In fact, it took some work to find positive commentary about RushmoreDrive.
But I've learned from the way the blogosphere handled the Cuil launch not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at Amy Ruth's, and the service was as gracious as the chicken and waffles were delicious; I decided I'd try my luck on a search engine not targeted at my racial profile.
The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of Ask.com, a corporate sibling of IAC-owned RushmoreDrive. Like Ask.com, RushmoreDrive emphasizes search refinement through narrowing and broadening refinements.
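For readers unfamiliar with the mechanics, here's a minimal sketch of what narrowing refinements involve under the hood (illustrative code, not RushmoreDrive's or Ask.com's actual implementation): the engine summarizes how the current result set distributes over a facet, and lets the user filter by a chosen value.

```python
from collections import Counter

def facet_counts(results: list[dict], facet: str) -> Counter:
    """Count how many matching documents carry each value of a facet --
    the summary a UI needs in order to offer narrowing refinements."""
    return Counter(doc[facet] for doc in results if facet in doc)

def narrow(results: list[dict], facet: str, value: str) -> list[dict]:
    """Narrow the result set to documents with the chosen facet value."""
    return [doc for doc in results if doc.get(facet) == value]

docs = [
    {"title": "a", "topic": "music"},
    {"title": "b", "topic": "music"},
    {"title": "c", "topic": "news"},
]
# facet_counts(docs, "topic") tallies 2 music, 1 news;
# narrow(docs, "topic", "news") keeps only document "c".
```

Broadening is just the inverse: relaxing a previously applied filter to recover the larger set.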
What I find ironic is that the whole controversy about racial bias in relevance ranking reveals the much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I criticized Amit Singhal for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.
I don't know how much information race provides as a prior to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.
Labels:
exploratory search,
Google,
Relevance,
Search
Wednesday, August 13, 2008
Conversation with Seth Grimes
I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there's an upside to writing critical commentary on Google's aspirations in the enterprise!
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
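To see why merging ranked lists is such a low bar, consider the crudest possible federation strategy--round-robin interleaving--sketched below (an illustrative toy, not any vendor's API). It makes no attempt to compare relevance across sources; it just alternates.

```python
from itertools import zip_longest

def interleave(*ranked_lists):
    """Merge ranked lists from multiple sources by round-robin
    interleaving, skipping duplicates. This is about the crudest
    form of federation: no cross-source score comparison at all."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

# Two sources, one overlapping result:
# interleave(["a", "b", "c"], ["b", "d"]) yields ["a", "b", "d", "c"]
```

Going "beyond merging ranked lists" means, at minimum, exposing the structure of each source--facets, sort orders, metadata--so the federating layer can do something smarter than alternate blindly.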
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
Sunday, August 10, 2008
Why Enterprise Search Will Never Be Google-y
As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.
The first is Google's announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.
The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.
First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA, their claim that it works "out of the box", is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.
Second, the Chris Sherman piece. Here is an excerpt:
Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon....Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information....Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.

I highly recommend you read the whole article (it's only 2 pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.
The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.
But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.
If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.
Labels:
Enterprise Search,
exploratory search,
Google,
Search
Thursday, August 7, 2008
Where Google Isn't Good Enough
My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.
- Shopping. Google Product Search (fka Froogle) is not one of Google's crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison to any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.
- Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
- Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
- Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.
Tuesday, August 5, 2008
Is Google Good Enough?
As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out this slide show, Is Search Broken?
But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, enterprise search is different.
1) Google does well enough on result quality, enough of the time.
While Google doesn't publish statistics about user satisfaction, it's commonplace that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.
2) Google doesn't support exploratory search. But it often leads you to a tool that does.
The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.
3) Google is a benign monopoly that mitigates choice overload.
Many people, myself included, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it's all "free"--at least in so far as ad-supported services can be said to be free.
In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.
Monday, July 28, 2008
Not as Cuil as I Expected
Today's big tech news is the launch of Cuil, the latest challenger to Google's hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.
My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I'm not taking it personally--after all, their own site doesn't show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strike me as revolutionary.
Perhaps I'm expecting too much on day 1. But they're not just trying to beat Gigablast; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when it's clear that at least some of the problem is that the most relevant documents simply aren't in their index. I'm also surprised to not see Wikipedia documents showing up much for my searches--particularly for searches when I'm quite sure the most relevant document is in Wikipedia. Again, it's hard to tell if this is an indexing or results quality issue.
I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.
Wednesday, July 23, 2008
Knol: Google takes on Wikipedia
Just a few days ago, I was commenting on a New York Times article about Wikipedia's new approval system that the biggest problem with Wikipedia is anonymous authorship. By synchronous coincidence, Google unveiled Knol today, which is something of a cross between Wikipedia and Squidoo. Its most salient feature is that each entry will have a clearly identified author. They even allow authors to verify their identities using credit cards or phone directories.
It's a nice idea, since anonymous authorship is a major factor in the adversarial nature of information retrieval on the web. Not only does the accountability of authorship inhibit vandalism and edit wars, but it also allows readers to decide for themselves whom to trust--at least to the extent that readers are able and willing to obtain reliable information about the authors. Without question, they are addressing Wikipedia's biggest weakness.
But it's too little, too late. Wikipedia is already there. And, despite complaints about its inaccuracy and bias, Wikipedia is a fantastic, highly utilized resource. The only way I see for Knol to supplant Wikipedia in a reasonable time frame is through a massive cut-and-paste to make up for the huge difference in content.
Interestingly, Wikipedia does not seem to place any onerous restrictions on verbatim copying. However, unless a single author is 100% responsible for authoring a Wikipedia entry, it isn't clear that anyone can simply copy the entry into Knol.
I know that it's dangerous to bet against Google. But I'm really skeptical about this latest effort. It's a pity, because I think their emphasis is the right one. But for once I wish they'd been a bit more humble and accepted that they aren't going to build a better Wikipedia from scratch.
Wednesday, June 11, 2008
How Google Measures Search Quality
Thanks to Jon Elsas for calling my attention to a great post at Datawocky today on how Google measures search quality, written by Anand Rajaraman based on his conversation with Google Director of Research Peter Norvig.
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
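For readers unfamiliar with the Cranfield-style setup, here is a minimal sketch (in Python, with invented queries, documents, and judgments) of how manual relevance ratings can drive an offline measure like mean average precision:

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: mean of precision@k at each relevant hit."""
    hits = 0
    precisions = []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, judgments):
    """runs: query -> ranked doc ids; judgments: query -> set of relevant doc ids."""
    scores = [average_precision(runs[q], judgments[q]) for q in judgments]
    return sum(scores) / len(scores)

# Hypothetical rater judgments for two queries and one ranking algorithm.
judgments = {"q1": {"d1", "d3"}, "q2": {"d2"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d2"]}
print(mean_average_precision(runs, judgments))  # → 0.6666666666666666
```

The key point is that the judgments come from human raters, not clicks; swapping in a different ranking algorithm just changes `runs`, and the same judgments score it.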
More questions for Amit. :)
Labels:
Amit Singhal,
Cranfield,
Evaluation,
Google,
Information Retrieval,
Relevance,
Search
Saturday, April 12, 2008
Can Search be a Utility?
A recent lecture at the New York CTO club inspired a heated discussion on what is wrong with enterprise search solutions. Specifically, Jon Williams asked why search can't be a utility.
It's unfortunate when such a simple question calls for a complicated answer, but I'll try to tackle it.
On the web, almost all attempts to deviate even slightly from the venerable ranked-list paradigm have been resounding flops. More sophisticated interfaces, such as Clusty, receive favorable press coverage, but users don't vote for them with their virtual feet. And web search users seem reasonably satisfied with their experience.
Conversely, in the enterprise, there is widespread dissatisfaction with enterprise search solutions. A number of my colleagues have said that they installed a Google Search Appliance and "it didn't work." (Full disclosure: Google competes with Endeca in the enterprise).
While the GSA does have some significant technical limitations, I don't think the failures were primarily for technical reasons. Rather, I believe there was a failure of expectations. I believe the problem comes down to the question of whether relevance is subjective.
On the web, we get away with pretending that relevance is objective because there is so much agreement among users--particularly in the restricted class of queries that web search handles well, and that hence constitute the majority of actual searches.
In the enterprise, however, we not only lack the redundant and highly social structure of the web. We also tend to have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly, particularly when there is no Wikipedia page that addresses our needs.
It seems we can go in two directions.
The first is to make enterprise search more like web search by reducing the enterprise search problem to one that is user-independent and does not rely on the social generation of enterprise data. Such a problem encompasses such mundane but important tasks as finding documents by title or finding department home pages. The challenges here are fundamentally ones of infrastructure, reflecting the heterogeneous content repositories in enterprises and the controls mandated by business processes and regulatory compliance. Solving these problems is no cakewalk, but I think all of the major enterprise search vendors understand the framework for solving them.
The second is to embrace the difference between enterprise knowledge workers and casual web users, and to abandon the quest for an objective relevance measure. Such an approach requires admitting that there is no free lunch--that you can't just plug in a box and expect it to solve an enterprise's knowledge management problem. Rather, enterprise workers need to help shape the solution by supplying their proprietary knowledge and information needs. The main challenges for information access vendors are to make this process as painless as possible for enterprises, and to demonstrate the return so that enterprises make the necessary investment.
Labels:
Enterprise Search,
Google,
Relevance,
Wikipedia
Tuesday, April 8, 2008
Q&A with Amit Singhal
Amit Singhal, who is head of search quality at Google, gave a very entertaining keynote at ECIR '08 that focused on the adversarial aspects of Web IR. Specifically, he discussed some of the techniques used in the arms race to game Google's ranking algorithms. Perhaps he revealed more than he intended!
During the question and answer session, I reminded Amit of the admonition against security through obscurity that is well accepted in the security and cryptography communities. I questioned whether his team is pursuing the wrong strategy by failing to respect this maxim. Amit replied that a relevance analog to security by design was an interesting challenge (which he delegated to the audience), but he appealed to the subjectivity of relevance as a reason for it being harder to make relevance as transparent as security.
While I accept the difficulty of this challenge, I reject the suggestion that subjectivity makes it harder. To begin with, Google and other web search engines rank results objectively, rather than based on user-specific considerations. Furthermore, the subjectivity of relevance should make the adversarial problem easier rather than harder, as has been observed in the security industry.
But the challenge is indeed a daunting one. Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?
At Endeca, we emphasize the transparency of our engine as a core value of our offering to enterprises. Granted, our clients generally do not have an adversarial relationship with their data. Still, I am convinced that the same approach not only can work on the web, but will be the only way to end the arms race between spammers and Amit's army of tweakers.
Labels:
Amit Singhal,
ECIR,
Google,
Information Retrieval,
Relevance
Monday, September 15, 2008
Information Accountability
The recent United Airlines stock fiasco triggered an expected wave of finger pointing. For those who didn't follow the event, here is the executive summary:
In the wee hours of Sunday, September 7th, The South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legit, but the linked article, which dated back to 2002, didn't carry its publication date. Then Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results for a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.

For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.
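The failure mode is easy to reproduce. Here's a minimal sketch (in Python; the function and page structure are hypothetical illustrations, not Google's actual pipeline) of how a news crawler that falls back to fetch time silently re-dates old content:

```python
from datetime import datetime, timezone

def infer_publication_date(page_metadata, fetch_time):
    """Prefer an explicit publication date; otherwise fall back to fetch time."""
    explicit = page_metadata.get("published")  # e.g., parsed from a meta tag
    if explicit is not None:
        return explicit
    # Dangerous fallback: the fetch time masquerades as the publication time,
    # which is how a six-year-old story can surface as breaking news.
    return fetch_time

crawl_time = datetime(2008, 9, 7, tzinfo=timezone.utc)
undated_2002_story = {"title": "UAL Files for Bankruptcy"}  # no "published" field
print(infer_publication_date(undated_2002_story, crawl_time))  # looks current
```

Once the bogus date is attached, everything downstream--alerts, news search, republication--treats freshness as fact.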
I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
- The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.
- The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil", our choice of what we believe to be information sources is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.
But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn, the FBI may have been correctly doing its job, given the information they had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.
It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without so skeptical an eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
Labels:
Google,
Information technology,
social media,
transparency
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience, on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Tuesday, September 2, 2008
Quick Bites: Google Chrome
For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released comic book hailing Google Chrome, Google's long-rumored entry into the browser wars. By the time you are reading this, the (Windows-only) beta may even be available for download. The official Google announcement is here.
If the software lives up to the promise of the comic book, then Google may have a real shot of taking market share from IE and Firefox. More significantly, if they can supplant the operating system with the browser, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.
Interestingly, even though all of the search blogs are reporting about Chrome, I haven't seen any analysis on what this might mean for web search.
Saturday, August 16, 2008
Thinking Outside the Black Box
I was reading Techmeme today, and I noticed an LA Times article about RushmoreDrive, described on its About Us page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was dumb and racist. In fact, it took some work to find positive commentary about RushmoreDrive.
But I've learned from the way the blogosphere handled the Cuil launch not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at Amy Ruth's, and the service was as gracious as the chicken and waffles were delicious; I decided I'd try my luck on a search engine not targeted at my racial profile.
The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of Ask.com, a corporate sibling of IAC-owned RushmoreDrive. Like Ask.com, RushmoreDrive emphasizes search refinement through narrowing and broadening refinements.
What I find ironic is that the whole controversy about racial bias in relevance ranking reveals the much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I criticized Amit Singhal for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.
I don't know how much information race provides as a prior to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.
Labels:
exploratory search,
Google,
Relevance,
Search
Wednesday, August 13, 2008
Conversation with Seth Grimes
I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there's an upside to writing critical commentary on Google's aspirations in the enterprise!
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than a solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
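For concreteness, here is a minimal sketch (in Python, with invented document ids and scores) of the baseline I'd like federation to move beyond: min-max normalization plus CombSUM fusion of ranked lists from two sources:

```python
def minmax(scores):
    """Rescale one source's raw scores to [0, 1] so sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def combsum(*source_scores):
    """CombSUM fusion: sum each document's normalized scores across sources."""
    merged = {}
    for scores in source_scores:
        for doc, s in minmax(scores).items():
            merged[doc] = merged.get(doc, 0.0) + s
    return sorted(merged, key=merged.get, reverse=True)

web_results = {"a": 9.0, "b": 4.0, "c": 1.0}   # hypothetical web API scores
intranet_results = {"b": 0.9, "d": 0.3}        # hypothetical intranet scores
print(combsum(web_results, intranet_results))  # "b" wins by appearing in both
```

Fusion like this treats every source as a black-box ranked list; richer federation would let the client control sort order, facets, and metadata from each source rather than just blending scores.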
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw , I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
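To make the merging problem concrete, here is a minimal sketch of reciprocal rank fusion, one common way to combine ranked lists from sources whose scores aren't comparable. The source names and document ids are invented; a real federated system would need far more than this:

```python
# Sketch of reciprocal rank fusion (RRF) over made-up ranked lists.
# Each source contributes 1/(k + rank) for every document it returns,
# so documents ranked highly by several sources float to the top.

def rrf_merge(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

web = ["a", "b", "c"]     # hypothetical web search results
news = ["b", "d", "a"]    # hypothetical news results
blogs = ["c", "b", "e"]   # hypothetical blog results

print(rrf_merge([web, news, blogs]))  # "b" comes first: every source ranks it highly
```

The point stands either way: this kind of fusion treats each source as a black box, which is precisely the limitation that richer APIs could remove.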
Sunday, August 10, 2008
Why Enterprise Search Will Never Be Google-y
As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.
The first is Google's announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.
The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.
First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA, the claim that it works "out of the box", is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.
Second, the Chris Sherman piece. Here is an excerpt:
Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon.... Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information.... Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.
I highly recommend you read the whole article (it's only 2 pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.
The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.
But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.
If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.
Labels:
Enterprise Search,
exploratory search,
Google,
Search
Thursday, August 7, 2008
Where Google Isn't Good Enough
My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.
- Shopping. Google Product Search (fka Froogle) is not one of Google's crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison to any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.
- Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
- Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
- Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.
Tuesday, August 5, 2008
Is Google Good Enough?
As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out my slide show, Is Search Broken?
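For readers who haven't seen faceted navigation reduced to its essentials, it is filtering plus counts over document metadata. Here is a toy sketch in Python; the product catalog is invented for illustration:

```python
# Toy sketch of faceted navigation over an invented product catalog.
from collections import Counter

products = [
    {"name": "trail runner", "brand": "Acme", "color": "red",  "price_band": "$50-100"},
    {"name": "road racer",   "brand": "Acme", "color": "blue", "price_band": "$100-150"},
    {"name": "hiking boot",  "brand": "Peak", "color": "red",  "price_band": "$100-150"},
    {"name": "sandal",       "brand": "Peak", "color": "blue", "price_band": "$0-50"},
]

def refine(items, **selections):
    """Narrow the result set by the facet values chosen so far."""
    return [p for p in items if all(p[f] == v for f, v in selections.items())]

def facet_counts(items, facet):
    """Counts shown next to each remaining refinement option."""
    return Counter(p[facet] for p in items)

results = refine(products, color="red")
print([p["name"] for p in results])    # the two red shoes remain
print(facet_counts(results, "brand"))  # one match left under each brand
```

The counts are what distinguish this from a plain filter: they tell the user, before clicking, which refinements are available and how many results each would leave.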
But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, enterprise search is different.
1) Google does well enough on result quality, enough of the time.
While Google doesn't publish statistics about user satisfaction, it's generally accepted that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.
2) Google doesn't support exploratory search. But it often leads you to a tool that does.
The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.
3) Google is a benign monopoly that mitigates choice overload.
Many people, myself included, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it's all "free"--at least insofar as ad-supported services can be said to be free.
In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.
Monday, July 28, 2008
Not as Cuil as I Expected
Today's big tech news is the launch of Cuil, the latest challenger to Google's hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.
My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I'm not taking it personally--after all, their own site doesn't show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strike me as revolutionary.
Perhaps I'm expecting too much on day 1. But they're not just trying to beat Gigablast; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when it's clear that at least some of the problem is that the most relevant documents simply aren't in their index. I'm also surprised to not see Wikipedia documents showing up much for my searches--particularly for searches when I'm quite sure the most relevant document is in Wikipedia. Again, it's hard to tell if this is an indexing or results quality issue.
I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.
Wednesday, July 23, 2008
Knol: Google takes on Wikipedia
Just a few days ago, I was commenting on a New York Times article about Wikipedia's new approval system that the biggest problem with Wikipedia is anonymous authorship. By synchronous coincidence, Google unveiled Knol today, which is something of a cross between Wikipedia and Squidoo. Its most salient feature is that each entry will have a clearly identified author. They even allow authors to verify their identities using credit cards or phone directories.
It's a nice idea, since anonymous authorship is a major factor in the adversarial nature of information retrieval on the web. Not only does the accountability of authorship inhibit vandalism and edit wars, but it also allows readers to decide for themselves whom to trust--at least to the extent that readers are able and willing to obtain reliable information about the authors. Without question, they are addressing Wikipedia's biggest weakness.
But it's too little, too late. Wikipedia is already there. And, despite complaints about its inaccuracy and bias, Wikipedia is a fantastic, highly utilized resource. The only way I see for Knol to supplant Wikipedia in a reasonable time frame is through a massive cut-and-paste to make up for the huge difference in content.
Interestingly, Wikipedia does not seem to place any onerous restrictions on verbatim copying. However, unless a single author is 100% responsible for authoring a Wikipedia entry, it isn't clear that anyone can simply copy the entry into Knol.
I know that it's dangerous to bet against Google. But I'm really skeptical about this latest effort. It's a pity, because I think their emphasis is the right one. But for once I wish they'd been a bit more humble and accepted that they aren't going to build a better Wikipedia from scratch.
Wednesday, June 11, 2008
How Google Measures Search Quality
Thanks to Jon Elsas for calling my attention to a great post at Datawocky today on how Google measures search quality, written by Anand Rajaraman based on his conversation with Google Director of Research Peter Norvig.
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
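For context, mean average precision is simple to state in code. Here is a minimal sketch over binary relevance judgments; the judgments themselves are invented:

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance[i] is True if the i-th result is relevant."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / i  # precision at each relevant result's rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: average AP across a set of judged queries."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Invented judgments for two queries:
q1 = [True, False, True]  # AP = (1/1 + 2/3) / 2 = 5/6
q2 = [False, True]        # AP = (1/2) / 1 = 1/2
print(mean_average_precision([q1, q2]))  # ≈ 0.667
```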
More questions for Amit. :)
Labels:
Amit Singhal,
Cranfield,
Evaluation,
Google,
Information Retrieval,
Relevance,
Search
Saturday, April 12, 2008
Can Search be a Utility?
A recent lecture at the New York CTO club inspired a heated discussion on what is wrong with enterprise search solutions. Specifically, Jon Williams asked why search can't be a utility.
It's unfortunate when such a simple question calls for a complicated answer, but I'll try to tackle it.
On the web, almost all attempts to deviate even slightly from the venerable ranked-list paradigm have been resounding flops. More sophisticated interfaces, such as Clusty, receive favorable press coverage, but users don't vote for them with their virtual feet. And web search users seem reasonably satisfied with their experience.
Conversely, in the enterprise, there is widespread dissatisfaction with enterprise search solutions. A number of my colleagues have said that they installed a Google Search Appliance and "it didn't work." (Full disclosure: Google competes with Endeca in the enterprise).
While the GSA does have some significant technical limitations, I don't think the failures were primarily for technical reasons. Rather, I believe there was a failure of expectations. I believe the problem comes down to the question of whether relevance is subjective.
On the web, we get away with pretending that relevance is objective because there is so much agreement among users--particularly in the restricted class of queries that web search handles well, and that hence constitute the majority of actual searches.
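That claim about agreement can be made measurable. Here is a quick sketch of Cohen's kappa, a standard chance-corrected agreement statistic, computed over two users' binary relevance judgments (the judgments are invented):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' binary relevance judgments (1 = relevant)."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # rater A's rate of judging "relevant"
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Invented judgments over eight results:
user1 = [1, 1, 1, 0, 0, 1, 0, 0]
user2 = [1, 1, 0, 0, 0, 1, 0, 1]
print(cohens_kappa(user1, user2))  # 0.5
```

Kappa near 1 would justify treating relevance as objective; the lower it falls, the more a one-size-fits-all ranking leaves on the table.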
In the enterprise, however, we not only lack the redundant and highly social structure of the web. We also tend to have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly, particularly when there is no Wikipedia page that addresses our needs.
It seems we can go in two directions.
The first is to make enterprise search more like web search by reducing the enterprise search problem to one that is user-independent and does not rely on the social generation of enterprise data. Such a problem encompasses such mundane but important tasks as finding documents by title or finding department home pages. The challenges here are fundamentally ones of infrastructure, reflecting the heterogeneous content repositories in enterprises and the controls mandated by business processes and regulatory compliance. Solving these problems is no cakewalk, but I think all of the major enterprise search vendors understand the framework for solving them.
The second is to embrace the difference between enterprise knowledge workers and casual web users, and to abandon the quest for an objective relevance measure. Such an approach requires admitting that there is no free lunch--that you can't just plug in a box and expect it to solve an enterprise's knowledge management problem. Rather, enterprise workers need to help shape the solution by supplying their proprietary knowledge and information needs. The main challenges for information access vendors are to make this process as painless as possible for enterprises, and to demonstrate the return so that enterprises make the necessary investment.
Labels:
Enterprise Search,
Google,
Relevance,
Wikipedia
Tuesday, April 8, 2008
Q&A with Amit Singhal
Amit Singhal, who is head of search quality at Google, gave a very entertaining keynote at ECIR '08 that focused on the adversarial aspects of Web IR. Specifically, he discussed some of the techniques used in the arms race to game Google's ranking algorithms. Perhaps he revealed more than he intended!
During the question and answer session, I reminded Amit of the admonition against security through obscurity that is well accepted in the security and cryptography communities. I questioned whether his team is pursuing the wrong strategy by failing to respect this maxim. Amit replied that a relevance analog to security by design was an interesting challenge (which he delegated to the audience), but he appealed to the subjectivity of relevance as a reason for it being harder to make relevance as transparent as security.
While I accept the difficulty of this challenge, I reject the suggestion that subjectivity makes it harder. To begin with, Google and other web search engines rank results objectively, rather than based on user-specific considerations. Furthermore, the subjectivity of relevance should make the adversarial problem easier rather than harder, as has been observed in the security industry.
But the challenge is indeed a daunting one. Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?
At Endeca, we emphasize the transparency of our engine as a core value of our offering to enterprises. Granted, our clients generally do not have an adversarial relationship with their data. Still, I am convinced that the same approach not only can work on the web, but will be the only way to end the arms race between spammers and Amit's army of tweakers.
Labels:
Amit Singhal,
ECIR,
Google,
Information Retrieval,
Relevance
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge our ability to consume information without such a skeptical eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
In the wee hours of Sunday, September 7th, The South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legitimate, but the linked article dated from 2002 and did not display its original publication date. Google's news bot then picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results of a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.

For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.
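The failure mode at the heart of this story is easy to reproduce. As a sketch (hypothetical code, not Google's actual pipeline), consider any crawler that falls back to fetch time when a page carries no machine-readable publication date:

```python
from datetime import datetime, timezone

def infer_publication_date(page_metadata, fetch_time):
    """Naive date inference: trust a declared publication date if the page
    provides one; otherwise fall back to crawl time. That fallback is what
    can resurrect an old, undated story as 'today's news'."""
    declared = page_metadata.get("published")
    return declared if declared else fetch_time

# An undated 2002 article fetched in 2008 gets stamped with the fetch time.
crawl_time = datetime(2008, 9, 7, tzinfo=timezone.utc)
misdated = infer_publication_date({}, crawl_time)
```

Once the wrong date is attached, everything downstream (alerts, news search rankings) treats the story as current.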
I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
- The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.
- The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil", our choice of what we believe to be information sources is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.
But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn the FBI may have been correctly doing its job, given the information they had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.
It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessary acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.
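To make the telephone-game intuition concrete, here is a toy simulation (entirely my own illustration, not a model of the UAL incident): even a small per-hop error rate compounds quickly as a message passes through a chain of re-tellers.

```python
import random

def telephone(message_bits, hops, flip_prob, rng):
    """Pass a message down a chain of `hops` re-tellers; each re-teller
    independently flips each bit with probability flip_prob."""
    msg = list(message_bits)
    for _ in range(hops):
        msg = [bit ^ 1 if rng.random() < flip_prob else bit for bit in msg]
    return msg

# With per-hop error p, a bit still matches the original after n hops with
# probability (1 + (1 - 2p)**n) / 2, which decays toward 1/2 (a coin flip).
rng = random.Random(0)
original = [0, 1, 0, 1] * 250  # a 1000-bit "message"
garbled = telephone(original, hops=20, flip_prob=0.05, rng=rng)
fidelity = sum(a == b for a, b in zip(original, garbled)) / len(original)
```

With a mere 5% per-hop error rate, twenty hops leave the message barely better than noise, and no single re-teller is obviously "to blame."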
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without applying so skeptical an eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
Labels:
Google,
Information technology,
social media,
transparency
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience, on the occasion of Google's 10th birthday:

Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Tuesday, September 2, 2008
Quick Bites: Google Chrome
For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released comic book hailing Google Chrome, Google's long-rumored entry into the browser wars. By the time you read this, the (Windows-only) beta may even be available for download. The official Google announcement is here.
If the software lives up to the promise of the comic book, then Google may have a real shot of taking market share from IE and Firefox. More significantly, if they can supplant the operating system with the browser, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.
Interestingly, even though all of the search blogs are reporting about Chrome, I haven't seen any analysis on what this might mean for web search.
Saturday, August 16, 2008
Thinking Outside the Black Box
I was reading Techmeme today, and I noticed an LA Times article about RushmoreDrive, described on its About Us page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was dumb and racist. In fact, it took some work to find positive commentary about RushmoreDrive.
But I've learned from the way the blogosphere handled the Cuil launch not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at Amy Ruth's and the service was as gracious as the chicken and waffles were delicious; I decided I'd try my luck on a search engine not targeted at my racial profile.
The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of Ask.com, a corporate sibling of IAC-owned RushmoreDrive. Like Ask.com, RushmoreDrive emphasizes query refinement through narrowing and broadening suggestions.
What I find ironic is that the whole controversy about racial bias in relevance ranking reveals the much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I criticized Amit Singhal for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.
I don't know how much information race provides as a prior to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.
Labels:
exploratory search,
Google,
Relevance,
Search
Wednesday, August 13, 2008
Conversation with Seth Grimes
I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there's an upside to writing critical commentary on Google's aspirations in the enterprise!
One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.
Perhaps a better way to think about enterprise search is as a problem rather than a solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting the same interface to work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn't.
I've blogged about this subject from several different perspectives over the past weeks, so I'll refer recent readers to earlier posts on the subject rather than bore the regulars.
But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.
I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.
Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.
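Since today's web search APIs expose no comparable relevance scores (nor even a choice of sort order), the crudest workable form of federation is simple interleaving. Here is a minimal sketch (my own illustration of the baseline that "federated" information seeking would need to surpass):

```python
from itertools import zip_longest

def round_robin_merge(result_lists):
    """Interleave ranked result lists from several sources, keeping only
    the first occurrence of each document. A crude stand-in for real
    federated ranking, which would need scores the APIs don't expose."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for doc in tier:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

Round-robin at least respects each source's internal ordering, but it treats all sources as equally authoritative, which is exactly the kind of blunt assumption better APIs would let us move past.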
Sunday, August 10, 2008
Why Enterprise Search Will Never Be Google-y
As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.
The first is Google's announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.
The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.
First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA, the claim that it works "out of the box", is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.
Second, the Chris Sherman piece. Here is an excerpt:
Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon....Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information....Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.

I highly recommend that you read the whole article (it's only 2 pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.
The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.
But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.
If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.
Labels:
Enterprise Search,
exploratory search,
Google,
Search
Thursday, August 7, 2008
Where Google Isn't Good Enough
My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.
- Shopping. Google Product Search (fka Froogle) is not one of Google's crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison to any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.
- Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
- Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
- Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.
Tuesday, August 5, 2008
Is Google Good Enough?
As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out this slide show, Is Search Broken?
But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, enterprise search is different.
1) Google does well enough on result quality, enough of the time.
While Google doesn't publish statistics about user satisfaction, it is widely accepted that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.
2) Google doesn't support exploratory search. But it often leads you to a tool that does.
The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.
3) Google is a benign monopoly that mitigates choice overload.
Many people, myself included, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it's all "free"--at least insofar as ad-supported services can be said to be free.
In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.
Monday, July 28, 2008
Not as Cuil as I Expected
Today's big tech news is the launch of Cuil, the latest challenger to Google's hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.
My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I'm not taking it personally--after all, their own site doesn't show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strike me as revolutionary.
Perhaps I'm expecting too much on day 1. But they're not just trying to beat Gigablast; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when it's clear that at least some of the problem is that the most relevant documents simply aren't in their index. I'm also surprised to not see Wikipedia documents showing up much for my searches--particularly for searches when I'm quite sure the most relevant document is in Wikipedia. Again, it's hard to tell if this is an indexing or results quality issue.
I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.
Wednesday, July 23, 2008
Knol: Google takes on Wikipedia
Just a few days ago, I was commenting on a New York Times article about Wikipedia's new approval system that the biggest problem with Wikipedia is anonymous authorship. By synchronous coincidence, Google unveiled Knol today, which is something of a cross between Wikipedia and Squidoo. Its most salient feature is that each entry will have a clearly identified author. They even allow authors to verify their identities using credit cards or phone directories.
It's a nice idea, since anonymous authorship is a major factor in the adversarial nature of information retrieval on the web. Not only does the accountability of authorship inhibit vandalism and edit wars, but it also allows readers to decide for themselves whom to trust--at least to the extent that readers are able and willing to obtain reliable information about the authors. Without question, they are addressing Wikipedia's biggest weakness.
But it's too little, too late. Wikipedia is already there. And, despite complaints about its inaccuracy and bias, Wikipedia is a fantastic, highly utilized resource. The only way I see for Knol to supplant Wikipedia in a reasonable time frame is through a massive cut-and-paste to make up for the huge difference in content.
Interestingly, Wikipedia does not seem to place any onerous restrictions on verbatim copying. However, unless a single author is 100% responsible for authoring a Wikipedia entry, it isn't clear that anyone can simply copy the entry into Knol.
I know that it's dangerous to bet against Google. But I'm really skeptical about this latest effort. It's a pity, because I think their emphasis is the right one. But for once I wish they'd been a bit more humble and accepted that they aren't going to build a better Wikipedia from scratch.
Wednesday, June 11, 2008
How Google Measures Search Quality
Thanks to Jon Elsas for calling my attention to a great post at Datawocky today on how Google measures search quality, written by Anand Rajaraman based on his conversation with Google Director of Research Peter Norvig.
The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.
I'm intrigued that Google seems to wholeheartedly embrace the Cranfield paradigm. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than mean average precision.
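For readers unfamiliar with Cranfield-style evaluation, here is a minimal sketch of how mean average precision (MAP) is computed from rater judgments. The queries, documents, and judgments are invented for illustration; real evaluations like TREC use pooled judgments over thousands of queries.

```python
# Cranfield-style evaluation sketch: human raters supply per-query relevance
# judgments, and each ranking algorithm's output is scored against them.

def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant doc ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs, judgments):
    """MAP: runs maps query -> ranked doc list; judgments maps query -> relevant set."""
    aps = [average_precision(runs[q], judgments[q]) for q in runs]
    return sum(aps) / len(aps)

runs = {
    "q1": ["d1", "d2", "d3", "d4"],
    "q2": ["d5", "d6", "d7"],
}
judgments = {
    "q1": {"d1", "d3"},  # relevant docs retrieved at ranks 1 and 3
    "q2": {"d7"},        # relevant doc retrieved at rank 3
}
print(mean_average_precision(runs, judgments))  # 0.5833...
```

Swapping in a different ranking algorithm just means swapping in a different `runs` dictionary and comparing scores, which is presumably the spirit of what Google's raters enable.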
More questions for Amit. :)
Labels:
Amit Singhal,
Cranfield,
Evaluation,
Google,
Information Retrieval,
Relevance,
Search
Saturday, April 12, 2008
Can Search be a Utility?
A recent lecture at the New York CTO club inspired a heated discussion on what is wrong with enterprise search solutions. Specifically, Jon Williams asked why search can't be a utility.
It's unfortunate when such a simple question calls for a complicated answer, but I'll try to tackle it.
On the web, almost all attempts to deviate even slightly from the venerable ranked-list paradigm have been resounding flops. More sophisticated interfaces, such as Clusty, receive favorable press coverage, but users don't vote for them with their virtual feet. And web search users seem reasonably satisfied with their experience.
Conversely, in the enterprise, there is widespread dissatisfaction with enterprise search solutions. A number of my colleagues have said that they installed a Google Search Appliance and "it didn't work." (Full disclosure: Google competes with Endeca in the enterprise).
While the GSA does have some significant technical limitations, I don't think the failures were primarily for technical reasons. Rather, I believe there was a failure of expectations. I believe the problem comes down to the question of whether relevance is subjective.
On the web, we get away with pretending that relevance is objective because there is so much agreement among users--particularly in the restricted class of queries that web search handles well, and that hence constitute the majority of actual searches.
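That "agreement among users" can be made precise: inter-rater agreement statistics like Cohen's kappa measure how often two raters' relevance judgments coincide beyond what chance would predict. The judgment vectors below are invented for illustration.

```python
# Cohen's kappa for two raters' binary relevance judgments on the same documents.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: fraction of documents judged identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# 1 = relevant, 0 = not relevant, for ten documents
rater_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.583
```

High kappa on typical web queries is what lets a search engine treat relevance as effectively objective; for the sophisticated informational queries discussed below, agreement (and hence kappa) tends to be much lower.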
In the enterprise, however, we not only lack the redundant and highly social structure of the web. We also tend to have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly, particularly when there is no Wikipedia page that addresses our needs.
It seems we can go in two directions.
The first is to make enterprise search more like web search by reducing the enterprise search problem to one that is user-independent and does not rely on the social generation of enterprise data. Such a problem encompasses such mundane but important tasks as finding documents by title or finding department home pages. The challenges here are fundamentally ones of infrastructure, reflecting the heterogeneous content repositories in enterprises and the controls mandated by business processes and regulatory compliance. Solving these problems is no cakewalk, but I think all of the major enterprise search vendors understand the framework for solving them.
The second is to embrace the difference between enterprise knowledge workers and casual web users, and to abandon the quest for an objective relevance measure. Such an approach requires admitting that there is no free lunch--that you can't just plug in a box and expect it to solve an enterprise's knowledge management problem. Rather, enterprise workers need to help shape the solution by supplying their proprietary knowledge and information needs. The main challenges for information access vendors are to make this process as painless as possible for enterprises, and to demonstrate the return so that enterprises make the necessary investment.
Labels:
Enterprise Search,
Google,
Relevance,
Wikipedia
Tuesday, April 8, 2008
Q&A with Amit Singhal
Amit Singhal, who is head of search quality at Google, gave a very entertaining keynote at ECIR '08 that focused on the adversarial aspects of Web IR. Specifically, he discussed some of the techniques used in the arms race to game Google's ranking algorithms. Perhaps he revealed more than he intended!
During the question and answer session, I reminded Amit of the admonition against security through obscurity that is well accepted in the security and cryptography communities. I questioned whether his team is pursuing the wrong strategy by failing to respect this maxim. Amit replied that a relevance analog to security by design was an interesting challenge (which he delegated to the audience), but he appealed to the subjectivity of relevance as a reason for it being harder to make relevance as transparent as security.
While I accept the difficulty of this challenge, I reject the suggestion that subjectivity makes it harder. To begin with, Google and other web search engines rank results objectively, rather than based on user-specific considerations. Furthermore, the subjectivity of relevance should make the adversarial problem easier rather than harder, as has been observed in the security industry.
But the challenge is indeed a daunting one. Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?
At Endeca, we emphasize the transparency of our engine as a core value of our offering to enterprises. Granted, our clients generally do not have an adversarial relationship with their data. Still, I am convinced that the same approach not only can work on the web, but will be the only way to end the arms race between spammers and Amit's army of tweakers.
Labels:
Amit Singhal,
ECIR,
Google,
Information Retrieval,
Relevance
Subscribe to:
Posts (Atom)