Tuesday, September 16, 2008
We've Moved!
Please redirect your readers to http://thenoisychannel.com! The RSS feed is available at http://thenoisychannel.com/?feed=rss2.
See you all there...
Migrating Tonight!
At long last, this blog will migrate over to a hosted WordPress platform at http://thenoisychannel.com/. Thanks to Andy Milk (and to Endeca for lending me his services) and especially to Noisy Channel regular David Fauth for making this promised migration a reality!
As of midnight EST, please visit the new site. My goal is to redirect all incoming Blogger traffic to the new hosted site. This will be the last post here at Blogger.
p.s. Please note that I'll be manually migrating any content (posts and comments) from the past 5 days, i.e., since I performed an import on September 12th. My apologies if anything is lost in translation.
Quick Bites: Search Evaluation at Google
Quick Bites: Is Wikipedia Production Slowing Down?
Thanks to Sérgio for tweeting this post by Peter Pirolli at PARC: Is Wikipedia Production Slowing Down?
Here's the picture showing the reduction of growth in the number of Wikipedia editors over time:
Interesting material and commentary at Augmented Social Cognition and Peter Pirolli's blog. Are people running out of things to write about?
Monday, September 15, 2008
Information Accountability
The recent United Airlines stock fiasco triggered a predictable wave of finger-pointing. For those who didn't follow the event, here is the executive summary:
In the wee hours of Sunday, September 7th, the South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legitimate, but the linked article, originally published in 2002, didn't carry its publication date. Then Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results for a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.
For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.
I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
- The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.
- The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil", our choice of which information sources to trust is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.
But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn, the FBI may have been correctly doing its job, given the information it had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.
It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without such a skeptical eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
Labels:
Google,
Information technology,
social media,
transparency
Sunday, September 14, 2008
Is Blog Search Different?
Alerted by Jeff and Iadh, I recently read What Should Blog Search Look Like?, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.
The position paper suggests focusing on three kinds of search tasks:
- Find out what are people thinking or feeling about X over time.
- Find good blogs/authors to read.
- Find useful information that was published in blogs sometime in the past.
But I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and by their discussion, based on work by Mishne and de Rijke, of how blog search queries differ substantially from web search queries. I don't doubt the data they've collected, but I'm curious whether their results account for the rapid proliferation and mainstreaming of blogs. The lines between blogs, news articles, and informational web pages seem increasingly blurred.
So I'd like to turn the question around: what should blog search look like that is not applicable to search in general?
Saturday, September 13, 2008
Progress on the Migration
Please check out http://thenoisychannel.com/ to see the future of The Noisy Channel in progress. I'm using WordPress hosted on GoDaddy and did the minimum work to port all posts and comments (not including this one).
Here is my current list of tasks that I'd like to get done before we move.
- Design! I'm currently using the default WordPress theme, which is pretty lame. I'm inclined to use a clean but stylish two-column theme that is widget-friendly. Maybe Cutline. In any case, I'd like the new site to be a tad less spartan before we move into it.
- Internal Links. My habit of linking back to previous posts now means I have to map those links to the new posts. I suspect I'll do it manually, since I don't see an easy way to automate it.
- Redirects. Unfortunately I don't think I can actually get Blogger to redirect traffic automatically. So my plan is to post signage throughout this blog making it clear that the blog has moved.
Friday, September 12, 2008
Quick Bites: Probably Irrelevant. (Not!)
Thanks to Jeff Dalton for spreading the word about a new information retrieval blog: Probably Irrelevant. It's a group blog, currently listing Fernando Diaz and Jon Elsas as contributors. Given the authors and the blog name's anagram of "Re-plan IR revolt, baby!", I expect great things!
Wednesday, September 10, 2008
Fun with Twitter
I recently joined Twitter and asked the twitterverse for opinions about DreamHost vs. GoDaddy as a platform to host this blog on WordPress. I was shocked when I noticed today that I'd gotten this response from the President / COO of GoDaddy (or perhaps a sales rep posing as such).
Seems like a lot of work for customer acquisition!
Quick Bites: Email becomes a Dangerous Distraction
Just read this article citing a number of studies to the effect that email is a major productivity drain. Nothing surprising to me--a lot of us have learned the hard way that the only way to be productive is to not check email constantly.
But I am curious whether anyone has made progress on tools that alert you to emails that do call for immediate attention. I'm personally a fan of approaches based on attention bonds, but I imagine that the machine learning folks have at least thought about this as a sort of inverse spam filtering problem.
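To make the "inverse spam filtering" framing concrete, here is a minimal sketch of a multinomial naive Bayes classifier that labels messages as urgent or not: the same technique classic spam filters use, with the labels swapped. The class name `UrgencyFilter`, the whitespace tokenizer, and the training messages are all hypothetical; a real tool would need far more data and better features.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class UrgencyFilter:
    """Multinomial naive Bayes with add-one smoothing over two labels."""

    def __init__(self):
        self.word_counts = {"urgent": Counter(), "later": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        # Log prior plus the sum of smoothed log likelihoods of each token.
        vocab = set(self.word_counts["urgent"]) | set(self.word_counts["later"])
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_urgent(self, text):
        return self.log_score(text, "urgent") > self.log_score(text, "later")


# Hypothetical training messages.
f = UrgencyFilter()
f.train("server down need fix now", "urgent")
f.train("client meeting moved to today urgent", "urgent")
f.train("weekly newsletter and lunch menu", "later")
f.train("team offsite photos and newsletter", "later")
print(f.is_urgent("server down now"))
```

With equal priors, a message that shares words with past urgent mail scores higher under the urgent model; add-one smoothing keeps an unseen word from zeroing out either score.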
Tuesday, September 9, 2008
Quick Bites: The Clickwheel Must Die
As someone who's long felt that the iPod's clickwheel violates Fitts's law, I was delighted to read this Gizmodo article asserting that the iPod's clickwheel must die. My choice quote:
Quite simply, the clickwheel hasn't scaled to handle the long, modern day menus in powerful iPods.
Fortunately, Apple recognized its mistake on this one and fixed the problem in its touch interface. Though, to be clear, the problem was not inherent in the choice of a wheel interface, but rather in the requirement to make gratuitously precise selections.
Now I'm waiting to see someone fix the tiny minimize/maximize/close buttons in the upper right corner on Windows, which I suspect have become the textbook example of violating Fitts's law.
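Fitts's law predicts that the time to hit a target grows with the logarithm of the ratio of distance to target width. A minimal sketch using the common Shannon formulation, MT = a + b * log2(D/W + 1), shows why tiny targets such as those window buttons are costly. The constants `a` and `b` are device-specific and have to be fit empirically; the defaults below are placeholders, not measured values.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts's index of difficulty, in bits."""
    return math.log2(distance / width + 1)


def movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds; a and b are placeholder constants."""
    return a + b * index_of_difficulty(distance, width)


# Same 800-pixel reach, two target widths.
print(index_of_difficulty(800, 64), index_of_difficulty(800, 16))
print(movement_time(800, 64), movement_time(800, 16))
```

At a distance of 800 pixels, shrinking the target from 64 to 16 pixels adds roughly two bits of difficulty, and the predicted time rises accordingly, whatever the fitted constants turn out to be.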
Monday, September 8, 2008
Incentives for Active Users
Some of the most successful web sites today are social networks, such as Facebook and LinkedIn. These are not only popular web sites; they are also remarkably effective people search tools. For example, I can use LinkedIn to find the 163 people in my network who mention "information retrieval" in their profiles and live within 50 miles of my ZIP code (I can't promise you'll see the same results!).
A couple of observations about social networking sites (I'll focus on LinkedIn) are in order.
First, this functionality is a very big deal, and it's something Google, Yahoo, and Microsoft have not managed to provide, even though their own technology is largely built on a social network--citation ranking.
Second, the "secret sauce" for sites like LinkedIn is hardly their technology (a search engine built on Lucene and a good implementation of breadth-first search), but rather the way they have incented users to be active participants, in everything from virally marketing the site to their peers to inputting high-quality semi-structured profiles that make the site useful. In other words, active users ensure both the quantity and quality of information on the site.
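The breadth-first search part is what makes a query like my "information retrieval within 50 miles" example possible: walk the network outward from the searcher one degree at a time, then filter the profiles encountered. Here is a minimal sketch; the adjacency-list graph, the `profiles` dictionary, the precomputed `miles` field, and the three-degree cutoff are all assumptions for illustration, not LinkedIn's actual implementation.

```python
from collections import deque


def people_search(graph, profiles, me, phrase, max_miles, max_degree=3):
    """Breadth-first search out to max_degree hops, then filter profiles.

    graph: person -> list of connections (hypothetical adjacency lists).
    profiles: person -> {"text": str, "miles": distance from my ZIP code}.
    Returns (person, degree) pairs in BFS order.
    """
    phrase = phrase.lower()
    seen = {me}
    queue = deque([(me, 0)])
    matches = []
    while queue:
        person, degree = queue.popleft()
        if degree > 0:
            profile = profiles[person]
            if phrase in profile["text"].lower() and profile["miles"] <= max_miles:
                matches.append((person, degree))
        if degree == max_degree:
            continue  # do not expand beyond the degree cutoff
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, degree + 1))
    return matches


graph = {"me": ["ann", "bob"], "ann": ["me", "cat"], "bob": ["me"],
         "cat": ["ann", "dan"], "dan": ["cat", "eve"], "eve": ["dan"]}
profiles = {"ann": {"text": "Information Retrieval engineer", "miles": 10},
            "bob": {"text": "Sales", "miles": 5},
            "cat": {"text": "information retrieval researcher", "miles": 120},
            "dan": {"text": "information retrieval", "miles": 20},
            "eve": {"text": "information retrieval", "miles": 1}}
print(people_search(graph, profiles, "me", "information retrieval", 50))
```

Note that "eve" never appears despite matching: she is four degrees away. The technology is simple; the value comes from users keeping those profiles accurate.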
Many people have noted the network effect that drove the runaway success of Microsoft Office and eBay. But I think that social networking sites are taking this idea further, because users not only flock to the crowds but also become personally invested, both in the success of the site generally and especially in the quality and accuracy of their personal information.
Enterprises need to learn from these consumer-oriented success stories. Some have already. For example, a couple of years ago, IBM established a Professional Marketplace, powered by Endeca, to maintain a skills and availability inventory of IBM employees. This effort was a runaway success, saving IBM $500M in its first year. But there's more: IBM employees have reacted to the success of the system by being more active in maintaining their own profiles. I spent the day with folks at the ACM, and they're seeing great uptake in their author profile pages.
I've argued before that there's no free lunch when it comes to enterprise search and information access. The good news, however, is that, if you create the right incentives, you can get other folks to happily pay for lunch.
Quick Bites: Taxonomy Directed Folksonomies
Props to Gwen Harris at Taxonomy Watch for posting a paper by Sarah Hayman and Nick Lothian on Taxonomy Directed Folksonomies.
The paper asks whether folksonomies and formal taxonomy can be used together and answers in the affirmative. The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.
I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.
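One simple way to let a vocabulary direct free tagging is to map each user tag onto a preferred term when it matches a known variant (addressing inconsistency), and to add broader terms so that sparsely tagged items still land in useful categories (addressing sparsity). This is a minimal sketch of that idea, not the method from the Hayman and Lothian paper or from Endeca's work; the `VOCABULARY` and `BROADER` tables are hypothetical.

```python
# Hypothetical vocabulary: preferred term -> variants users might type.
VOCABULARY = {
    "information retrieval": {"ir", "info retrieval", "search engines"},
    "folksonomy": {"folksonomies", "social tagging", "tagging"},
}
# Hypothetical broader-term links, used to enrich sparse tag sets.
BROADER = {
    "folksonomy": "knowledge organization",
    "information retrieval": "computer science",
}


def normalize(tag):
    """Lowercase, treat hyphens/underscores as spaces, collapse whitespace."""
    return " ".join(tag.lower().replace("_", " ").replace("-", " ").split())


def direct_tags(raw_tags):
    """Map free-form tags onto preferred terms; keep unknown tags as-is."""
    variant_to_term = {}
    for term, variants in VOCABULARY.items():
        variant_to_term[term] = term
        for variant in variants:
            variant_to_term[variant] = term
    result = set()
    for tag in raw_tags:
        clean = normalize(tag)
        term = variant_to_term.get(clean, clean)
        result.add(term)
        if term in BROADER:
            result.add(BROADER[term])
    return result


print(sorted(direct_tags(["IR", "Social-Tagging", "endeca"])))
```

Unrecognized tags such as "endeca" pass through untouched, so the folksonomy keeps its open-ended character while the vocabulary absorbs inconsistent variants.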
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.
I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Quick Bites: Applying Turing's Ideas to Search
A colleague of mine at Endeca recently pointed me to a post by John Ferrara at Boxes and Arrows entitled Applying Turing's Ideas to Search.
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.
While I'm not convinced that search engine designers should be aspiring to pass the Turing test, I agree wholeheartedly with the vision John puts forward:
It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.
It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. Or, as we like to call it around here, HCIR.
Saturday, September 6, 2008
Migrating Soon
Just another reminder that I expect to migrate this blog to a hosted WordPress platform in the next few days. If you have opinions about hosting platforms, please let me know by commenting here. Right now, I'm debating between DreamHost and GoDaddy, but I'm very open to suggestions.
I will do everything in my power to minimize disruption--not sure how easy Blogger will make it to redirect users to the new site. I'll probably post here for a while after the move to try to direct traffic.
I do expect the new site to be under a domain name I've already reserved: http://thenoisychannel.com. It currently forwards to Blogger.
Back from the Endeca Government Summit
I spent Thursday at the Endeca Government Summit, where I had the privilege to chat face-to-face with some Noisy Channel readers. Mostly, I was there to learn more about the sorts of information seeking problems people are facing in the public sector in general, and in the intelligence agencies in particular.
While I can't go into much detail, the key concern was exploration of information availability. This problem is the antithesis of known-item search: rather than trying to retrieve information you know exists (and which you know how to specify), you are trying to determine whether there is information available that would help you with a particular task.
Despite being lost in a sea of TLAs, I came away with a deepened appreciation of both the problems the intelligence agencies are trying to address and the relevance of exploratory search approaches to those problems.
Labels:
Endeca,
exploratory search,
intelligence analysis
Thursday, September 4, 2008
Query Elaboration as a Dialogue
I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?
The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.
Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.
That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones.
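Here is a minimal sketch of what one transparent, set-based dialogue step might look like, assuming documents are represented as sets of terms and a query is a set of terms that every result must contain. Each candidate refinement is shown with the exact number of results it would leave, so the user can predict the effect of a choice before making it. The function names and toy documents are hypothetical, and the sketch deliberately sidesteps which refinements are worth showing, which is the harder question.

```python
from collections import Counter


def set_retrieve(docs, query):
    """Return ids of documents whose term sets contain every query term."""
    return {doc_id for doc_id, terms in docs.items() if query <= terms}


def refinements(docs, query):
    """For each candidate term, count how many current results it would keep."""
    results = set_retrieve(docs, query)
    counts = Counter()
    for doc_id in results:
        counts.update(docs[doc_id] - query)
    # Drop terms present in every result: adding them would change nothing.
    return {term: n for term, n in counts.items() if n < len(results)}


docs = {
    1: {"search", "ranking", "web"},
    2: {"search", "facets", "enterprise"},
    3: {"search", "facets", "web"},
    4: {"email", "spam"},
}
print(set_retrieve(docs, {"search"}))
print(refinements(docs, {"search"}))
```

Because the model is a plain conjunction, the count next to each suggestion is exactly the size of the result set the user will see after choosing it, which is the kind of predictability that lets users make informed choices.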
But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...
Tuesday, September 2, 2008
Migrating to WordPress
Just a quick note to let folks know that I'll be migrating to WordPress in the next few days. I'll make every effort to make the move seamless. I have secured the domain name http://thenoisychannel.com, which currently forwards to Blogger but will shift to wherever the blog is hosted. I apologize in advance for any disruption.
Quick Bites: Google Chrome
For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released comic book hailing Google Chrome, Google's long-rumored entry into the browser wars. By the time you are reading this, the (Windows only) beta may even be available for download. The official Google announcement is here.
If the software lives up to the promise of the comic book, then Google may have a real shot at taking market share from IE and Firefox. More significantly, if they can supplant the operating system with the browser, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.
Interestingly, even though all of the search blogs are reporting about Chrome, I haven't seen any analysis on what this might mean for web search.
Monday, September 1, 2008
Quick Bites: E-Discovery and Transparency
One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.
I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.
I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.
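For contrast with ranked retrieval, here is a minimal sketch of Boolean set retrieval over an inverted index. Every document either satisfies the query or it doesn't, which is what makes the model easy to explain to a court; the hard part, as noted above, is composing the right query in the first place. The `must`/`any_of`/`must_not` interface and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_query(index, all_ids, must=(), any_of=(), must_not=()):
    """Evaluate (AND of must) AND (OR of any_of) AND NOT (OR of must_not)."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if any_of:
        result &= set().union(*(index.get(term, set()) for term in any_of))
    for term in must_not:
        result -= index.get(term, set())
    return result


docs = {
    "a": "merger contract draft",
    "b": "merger memo privileged",
    "c": "contract renewal",
    "d": "lunch order",
}
index = build_index(docs)
print(boolean_query(index, docs.keys(), must=["contract"], any_of=["merger", "renewal"]))
print(boolean_query(index, docs.keys(), must=["merger"], must_not=["privileged"]))
```

Using `index.get` rather than indexing the `defaultdict` directly avoids silently adding empty entries for query terms that never occur.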
Labels:
e-Discovery,
Relevance,
Search,
transparency
POLL: Blogging Platform
I've gotten a fair amount of feedback suggesting that I switch blogging platforms. Since I plan to make such changes infrequently, I'd like to get input from readers before doing so, especially since migration may have hiccups.
I've just posted a poll on the home page to ask if folks here have a preference as to which blogging platform I use. Please vote this week, and feel free to post comments here.
Subscribe to:
Posts (Atom)
Tuesday, September 16, 2008
We've Moved!
Please redirect your readers to http://thenoisychannel.com! The RSS feed is available at http://thenoisychannel.com/?feed=rss2.
See you all there...
See you all there...
Migrating Tonight!
At long last, this blog will migrate over to a hosted WordPress platform at http://thenoisychannel.com/. Thanks to Andy Milk (and to Endeca for lending me his services) and especially to Noisy Channel regular David Fauth for making this promised migration a reality!
As of midnight EST, please visit the new site. My goal is to redirect all incoming Blogger traffic to the new hosted site. This will be the last post here at Blogger.
p.s. Please note that I'll be manually migrating any content (posts and comments) from the past 5 days, i.e., since I performed an import on September 12th. My apologies if anything is lost in translation.
Quick Bites: Search Evaluation at Google
Quick Bites: Is Wikipedia Production Slowing Down?
Thanks to Sérgio for tweeting this post by Peter Pirolli at PARC: Is Wikipedia Production Slowing Down?
Here's the picture showing the reduction of growth in the number of Wikipedia editors over time:
Interesting material and commentary at Augmented Social Cognition and Peter Pirolli's blog. Are people running out of things to write about?
Monday, September 15, 2008
Information Accountability
The recent United Airlines stock fiasco triggered an expected wave of finger pointing. For those who didn't follow the event, here is the executive summary:
In the wee hours of Sunday, September 7th, The South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legit, but the linked article, originally published in 2002, didn't carry its publication date. Then Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results for a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.
For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.
I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
- The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.
- The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil," our choice of which information sources to trust is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.
But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn, the FBI may have been doing its job correctly, given the information it had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.
It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without so skeptical an eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
Labels:
Google,
Information technology,
social media,
transparency
Sunday, September 14, 2008
Is Blog Search Different?
Alerted by Jeff and Iadh, I recently read What Should Blog Search Look Like?, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.
The position paper suggests focusing on three kinds of search tasks:
- Find out what are people thinking or feeling about X over time.
- Find good blogs/authors to read.
- Find useful information that was published in blogs sometime in the past.
But I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and by their argument, based on work by Mishne and de Rijke, that blog search queries differ substantially from web search queries. I don't doubt the data they've collected, but I'm curious whether their results account for the rapid proliferation and mainstreaming of blogs. The lines between blogs, news articles, and informational web pages seem increasingly blurred.
So I'd like to turn the question around: what should blog search look like that is not applicable to search in general?
Saturday, September 13, 2008
Progress on the Migration
Please check out http://thenoisychannel.com/ to see the future of The Noisy Channel in progress. I'm using WordPress hosted on GoDaddy and did the minimum work to port all posts and comments (not including this one).
Here is my current list of tasks that I'd like to get done before we move.
- Design! I'm currently using the default WordPress theme, which is pretty lame. I'm inclined to use a clean but stylish two-column theme that is widget-friendly. Maybe Cutline. In any case, I'd like the new site to be a tad less spartan before we move into it.
- Internal Links. My habit of linking back to previous posts now means I have to map those links to the new posts. I suspect I'll do it manually, since I don't see an easy way to automate it.
- Redirects. Unfortunately I don't think I can actually get Blogger to redirect traffic automatically. So my plan is to post signage throughout this blog making it clear that the blog has moved.
Friday, September 12, 2008
Quick Bites: Probably Irrelevant. (Not!)
Thanks to Jeff Dalton for spreading the word about a new information retrieval blog: Probably Irrelevant. It's a group blog, currently listing Fernando Diaz and Jon Elsas as contributors. Given the authors and the blog name's anagram of "Re-plan IR revolt, baby!", I expect great things!
Wednesday, September 10, 2008
Fun with Twitter
I recently joined Twitter and asked the twitterverse for opinions about DreamHost vs. GoDaddy as a platform to host this blog on WordPress. I was shocked when I noticed today that I'd gotten this response from the President / COO of GoDaddy (or perhaps a sales rep posing as such).
Seems like a lot of work for customer acquisition!
Quick Bites: Email becomes a Dangerous Distraction
Just read this article citing a number of studies to the effect that email is a major productivity drain. Nothing surprising to me--a lot of us have learned the hard way that the only way to be productive is to not check email constantly.
But I am curious whether anyone has made progress on tools that alert you to emails that do call for immediate attention. I'm personally a fan of attention bond approaches, but I imagine that the machine learning folks have at least thought about this as a sort of inverse spam filtering problem.
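To make the "inverse spam filtering" framing concrete, here is a minimal sketch that reuses the classic spam-filter technique, multinomial naive Bayes, with the positive class flipped to "needs attention now." It assumes plain-text emails that someone has hand-labeled as "urgent" or "later"; the class name, labels, and training strings are hypothetical, and a real system would need better tokenization and far more data.

```python
import math
from collections import Counter


class UrgencyFilter:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing.

    Hypothetical labels: 'urgent' (alert me now) and 'later' (can wait).
    """

    LABELS = ("urgent", "later")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text, label):
        # Smoothed class prior, so a label with no training docs doesn't hit log(0).
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        for word in text.lower().split():
            score += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
        return score

    def is_urgent(self, text):
        return self.log_score(text, "urgent") > self.log_score(text, "later")
```

Usage, with toy training data: train on a few labeled messages, then call `is_urgent` on new ones. Only messages it flags would interrupt you, which is the inverse of a spam filter's job of hiding messages.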
Tuesday, September 9, 2008
Quick Bites: The Clickwheel Must Die
As someone who's long felt that the iPod's clickwheel violates Fitts's law, I was delighted to read this Gizmodo article asserting that the iPod's clickwheel must die. My choice quote:
Quite simply, the clickwheel hasn't scaled to handle the long, modern day menus in powerful iPods.
Fortunately Apple recognized its mistake on this one and fixed the problem in its touch interface. Though, to be clear, the problem was not inherent in the choice of a wheel interface, but rather in the requirement to make gratuitously precise selections.
Now I'm waiting to see someone fix the tiny minimize/maximize/close buttons in the upper right corner on Windows, which I suspect have become the textbook example of violating Fitts's law.
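For readers who haven't met Fitts's law, it predicts that the time to hit a target grows with the logarithm of the ratio of distance to target width, so shrinking a target (or demanding a more precise selection) costs time. A minimal sketch using the common Shannon formulation, MT = a + b * log2(D/W + 1); the constants a and b are device- and user-specific, and the defaults below are placeholders, not measured values.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts's index of difficulty in bits (Shannon formulation)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width > 0")
    return math.log2(distance / width + 1)


def movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds: MT = a + b * ID.

    a and b are placeholder constants; real values are fit per device and user.
    """
    return a + b * index_of_difficulty(distance, width)


if __name__ == "__main__":
    # Same 800-pixel reach, two target widths: the narrower target has a higher ID.
    for width in (64, 16):
        print(width, round(index_of_difficulty(800, width), 2), "bits")
```

Quartering the target width here adds roughly two bits of difficulty, which is the kind of cost tiny buttons and long, precise menu scrolls impose.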
Monday, September 8, 2008
Incentives for Active Users
Some of the most successful web sites today are social networks, such as Facebook and LinkedIn. These are not only popular web sites; they are also remarkably effective people search tools. For example, I can use LinkedIn to find the 163 people in my network who mention "information retrieval" in their profiles and live within 50 miles of my ZIP code (I can't promise you'll see the same results!).
A couple of observations about social networking sites (I'll focus on LinkedIn) are in order.
First, this functionality is a very big deal, and it's something Google, Yahoo, and Microsoft have not managed to provide, even though their own technology is largely built on a social network--citation ranking.
Second, the "secret sauce" for sites like LinkedIn is hardly their technology (a search engine built on Lucene and a good implementation of breadth-first search), but rather the way they have incented users to be active participants, in everything from virally marketing the site to their peers to inputting high-quality semi-structured profiles that make the site useful. In other words, active users ensure both the quantity and quality of information on the site.
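The "good implementation of breadth-first search" half of that description can be sketched in a few lines. This is a minimal illustration, not LinkedIn's implementation: it assumes a hypothetical in-memory adjacency list and free-text profiles, uses a plain substring match as a stand-in for a Lucene query, and omits the location filter.

```python
from collections import deque


def people_search(graph, profiles, start, keyword, max_degree=3):
    """Return (person, degree) pairs within max_degree hops whose profile mentions keyword."""
    seen = {start: 0}
    queue = deque([start])
    matches = []
    while queue:
        person = queue.popleft()
        degree = seen[person]
        # Substring match stands in for a real full-text query.
        if person != start and keyword.lower() in profiles.get(person, "").lower():
            matches.append((person, degree))
        if degree == max_degree:
            continue  # don't expand beyond the network boundary
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = degree + 1
                queue.append(friend)
    return matches
```

Because BFS visits people in order of distance, results come out grouped by degree of connection, which is a natural default ordering for "people in my network."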
Many people have noted the network effect that drove the runaway success of Microsoft Office and eBay. But I think that social networking sites are taking this idea further, because users not only flock to the crowds but also become personally invested, both in the success of the site generally and especially in the quality and accuracy of their personal information.
Enterprises need to learn from these consumer-oriented success stories. Some have already. For example, a couple of years ago, IBM established a Professional Marketplace, powered by Endeca, to maintain a skills and availability inventory of IBM employees. This effort was a runaway success, saving IBM $500M in its first year. But there's more: IBM employees have reacted to the success of the system by being more active in maintaining their own profiles. I spent the day with folks at the ACM, and they're seeing great uptake in their author profile pages.
I've argued before that there's no free lunch when it comes to enterprise search and information access. The good news, however, is that, if you create the right incentives, you can get other folks to happily pay for lunch.
Quick Bites: Taxonomy Directed Folksonomies
Props to Gwen Harris at Taxonomy Watch for posting a paper by Sarah Hayman and Nick Lothian on Taxonomy Directed Folksonomies.
The paper asks whether folksonomies and formal taxonomy can be used together and answers in the affirmative. The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.
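One simple way to picture bootstrapping tags from a vocabulary is to map free-form tags onto a vocabulary's preferred terms, which addresses tag inconsistency (though not sparsity). A minimal sketch, assuming a hypothetical vocabulary that lists each preferred term with its known variants; it is neither the method from the Hayman and Lothian paper nor Endeca's implementation.

```python
def normalize_tags(tags, vocabulary):
    """Map free-form tags onto preferred vocabulary terms; keep unmatched tags as-is.

    vocabulary: dict of preferred term -> iterable of variant spellings (hypothetical).
    """
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    normalized = []
    for tag in tags:
        cleaned = tag.strip()
        term = lookup.get(cleaned.lower(), cleaned)
        if term not in normalized:  # collapse duplicates created by the mapping
            normalized.append(term)
    return normalized
```

Unmatched tags survive untouched, so the folksonomy can still introduce terms the vocabulary lacks, which fits the idea that the two approaches need not be mutually exclusive.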
I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.
I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Quick Bites: Applying Turing's Ideas to Search
A colleague of mine at Endeca recently pointed me to a post by John Ferrara at Boxes and Arrows entitled Applying Turing's Ideas to Search.
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.
While I'm not convinced that search engine designers should be aspiring to pass the Turing test, I agree wholeheartedly with the vision John puts forward:
It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.
It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. Or, as we like to call it around here, HCIR.
Saturday, September 6, 2008
Migrating Soon
Just another reminder that I expect to migrate this blog to a hosted WordPress platform in the next few days. If you have opinions about hosting platforms, please let me know by commenting here. Right now, I'm debating between DreamHost and GoDaddy, but I'm very open to suggestions.
I will do everything in my power to minimize disruption, though I'm not sure how easy Blogger will make it to redirect users to the new site. I'll probably post here for a while after the move to try to direct traffic.
I do expect the new site to be under a domain name I've already reserved: http://thenoisychannel.com. It currently forwards to Blogger.
Back from the Endeca Government Summit
I spent Thursday at the Endeca Government Summit, where I had the privilege to chat face-to-face with some Noisy Channel readers. Mostly, I was there to learn more about the sorts of information seeking problems people are facing in the public sector in general, and in the intelligence agencies in particular.
While I can't go into much detail, the key concern was exploring information availability. This problem is the antithesis of known-item search: rather than trying to retrieve information you know exists (and know how to specify), you are trying to determine whether there is information available that would help you with a particular task.
Despite being lost in a sea of TLAs, I came away with a deepened appreciation of both the problems the intelligence agencies are trying to address and the relevance of exploratory search approaches to those problems.
Labels:
Endeca,
exploratory search,
intelligence analysis
Thursday, September 4, 2008
Query Elaboration as a Dialogue
I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?
The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.
Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.
That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones.
But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...
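As a purely illustrative sketch of a set-retrieval dialogue (not the answer this post promises), here is conjunctive Boolean retrieval over documents represented as term sets, plus one naive refinement heuristic: offer terms that would narrow the current result set, each with the exact number of results it would leave. The document representation, function names, and ranking of suggestions are all assumptions for illustration.

```python
from collections import Counter


def set_retrieve(docs, query_terms):
    """Conjunctive set retrieval: ids of every doc containing all query terms."""
    return {doc_id for doc_id, terms in docs.items() if query_terms <= terms}


def suggest_refinements(docs, query_terms, k=3):
    """Offer up to k terms that would narrow the current results, with resulting counts."""
    results = set_retrieve(docs, query_terms)
    counts = Counter()
    for doc_id in results:
        counts.update(docs[doc_id] - query_terms)
    # A term present in every result wouldn't change the set, so it isn't a refinement.
    candidates = [(term, n) for term, n in counts.items() if n < len(results)]
    # Naive ordering: prefer refinements that keep more results; break ties alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k]
```

Because each suggestion states exactly how many documents it would leave, the user can predict the effect of a refinement before choosing it. That predictability is the transparency property argued for above; deciding which refinements are actually worth offering remains the open question.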
Tuesday, September 2, 2008
Migrating to WordPress
Just a quick note to let folks know that I'll be migrating to WordPress in the next few days. I'll make every effort to make the move seamless. I have secured the domain name http://thenoisychannel.com, which currently forwards to Blogger but will shift to wherever the blog is hosted. I apologize in advance for any disruption.
Quick Bites: Google Chrome
For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released comic book hailing Google Chrome, Google's long-rumored entry into the browser wars. By the time you are reading this, the (Windows-only) beta may even be available for download. The official Google announcement is here.
If the software lives up to the promise of the comic book, then Google may have a real shot at taking market share from IE and Firefox. More significantly, if they can supplant the operating system with the browser, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.
Interestingly, even though all of the search blogs are reporting about Chrome, I haven't seen any analysis on what this might mean for web search.
Monday, September 1, 2008
Quick Bites: E-Discovery and Transparency
One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.
I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.
I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.
Labels:
e-Discovery,
Relevance,
Search,
transparency
POLL: Blogging Platform
I've gotten a fair amount of feedback suggesting that I switch blogging platforms. Since I plan to make such changes infrequently, I'd like to get input from readers before doing so, especially since migration may have hiccups.
I've just posted a poll on the home page to ask if folks here have a preference as to which blogging platform I use. Please vote this week, and feel free to post comments here.
Tuesday, September 16, 2008
We've Moved!
Please redirect your readers to http://thenoisychannel.com! The RSS feed is available at http://thenoisychannel.com/?feed=rss2.
See you all there...
See you all there...
Migrating Tonight!
At long last, this blog will migrate over to a hosted WordPress platform at http://thenoisychannel.com/. Thanks to Andy Milk (and to Endeca for lending me his services) and especially to Noisy Channel regular David Fauth for making this promised migration a reality!
As of midnight EST, please visit the new site. My goal is to redirect all incoming Blogger traffic to the new hosted site. This will be the last post here at Blogger.
p.s. Please note that I'll be manually migrating any content (posts and comments) from the past 5 days, i.e., since I performed an import on September 12th. My apologies if anything is lost in translation.
As of midnight EST, please visit the new site. My goal is to redirect all incoming Blogger traffic to the new hosted site. This will be the last post here at Blogger.
p.s. Please note that I'll be manually migrating any content (posts and comments) from the past 5 days, i.e., since I performed an import on September 12th. My apologies if anything is lost in translation.
Quick Bites: Search Evaluation at Google
Quick Bites: Is Wikipedia Production Slowing Down?
Thanks to Sérgio for tweeting this post by Peter Pirolli at PARC: Is Wikipedia Production Slowing Down?
Here's the picture showing the reduction of growth in the number of Wikipedia editors over time:
Interesting material and commentary at Augmented Social Cognition and Peter Pirolli's blog. Are people running out of things to write about?
Monday, September 15, 2008
Information Accountability
The recent United Airlines stock fiasco triggered an expected wave of finger pointing. For those who didn't follow the event, here is the executive summary:
In the wee hours of Sunday, September 7th, the South Florida Sun-Sentinel (a subsidiary of the Tribune Company) included a link to an article entitled "UAL Files for Bankruptcy." The link was legit, but the linked article, originally published in 2002, didn't carry its publication date. Then Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results of a Google News search and sent it out on Bloomberg. The results are in the picture below, courtesy of Bloomberg by way of the New York Times.
For anyone who wants all of the gory details, Google's version of the story is here; the Tribune Company's version is here.
I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:
- The first was a piece in BBC News about a speech by Sir Tim Berners-Lee expressing concern that the internet needs a way to help people separate rumor from real science. His examples included the fears about the Large Hadron Collider at CERN creating a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's Angels and Demons), and rumors that a vaccine given to children in Britain was harmful.
- The second was a column in the New York Times about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."
Whether we're worrying about Google bombing, Google bowling, or what Gartner analyst Whit Andrews calls "denial-of-insight" attacks, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is "all the news that's fit to print" or "don't be evil," our choice of what we believe to be reliable information sources is a necessary heuristic to avoid subjecting everything we read to endless skeptical inquiry.
But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. When newspapers reported that the FBI was treating Richard Jewell as a "person of interest" in the Centennial Olympic Park bombing (cf. "Olympic Park Bomber" Eric Robert Rudolph), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn, the FBI may have been correctly doing its job, given the information it had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.
It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of telephone knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.
The simplest answer is that we are accountable for how we consume information: caveat lector. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without being so skeptical that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?
There are no easy answers here. But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.
Labels:
Google,
Information technology,
social media,
transparency
Sunday, September 14, 2008
Is Blog Search Different?
Alerted by Jeff and Iadh, I recently read What Should Blog Search Look Like?, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.
The position paper suggests focusing on three kinds of search tasks:
- Find out what are people thinking or feeling about X over time.
- Find good blogs/authors to read.
- Find useful information that was published in blogs sometime in the past.
But I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and by their argument, based on work by Mishne and de Rijke, that blog search queries differ substantially from web search queries. I don't doubt the data they've collected, but I'm curious whether their results account for the rapid proliferation and mainstreaming of blogs. The lines between blogs, news articles, and informational web pages seem increasingly blurred.
So I'd like to turn the question around: what should blog search look like that is not applicable to search in general?
Saturday, September 13, 2008
Progress on the Migration
Please check out http://thenoisychannel.com/ to see the future of The Noisy Channel in progress. I'm using WordPress hosted on GoDaddy and did the minimum work to port all posts and comments (not including this one).
Here is my current list of tasks that I'd like to get done before we move.
- Design! I'm currently using the default WordPress theme, which is pretty lame. I'm inclined to use a clean but stylish two-column theme that is widget-friendly. Maybe Cutline. In any case, I'd like the new site to be a tad less spartan before we move into it.
- Internal Links. My habit of linking back to previous posts now means I have to map those links to the new posts. I suspect I'll do it manually, since I don't see an easy way to automate it.
- Redirects. Unfortunately I don't think I can actually get Blogger to redirect traffic automatically. So my plan is to post signage throughout this blog making it clear that the blog has moved.
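For the internal-links chore, a small script can do most of the work once a table of old-to-new URLs exists. Here is a minimal sketch in Python, assuming the old posts live on a hypothetical example.blogspot.com host and that the WordPress post IDs in LINK_MAP are placeholders; it rewrites only the links it knows about and reports the rest for manual fixing.

```python
import re

# Hypothetical mapping from old Blogger permalinks to new WordPress permalinks.
# The blogspot host and the ?p= post ID are placeholders, not real URLs.
LINK_MAP = {
    "http://example.blogspot.com/2008/09/query-elaboration-as-dialogue.html":
        "http://thenoisychannel.com/?p=101",
}

# Matches href="..." attributes that point at the (hypothetical) old Blogger host.
OLD_LINK = re.compile(r'href="(http://example\.blogspot\.com/[^"]+)"')


def rewrite_internal_links(html, link_map):
    """Rewrite links found in link_map; return (new_html, links still needing manual fixes)."""
    unmapped = []

    def replace(match):
        old_url = match.group(1)
        new_url = link_map.get(old_url)
        if new_url is None:
            unmapped.append(old_url)  # leave the link untouched and report it
            return match.group(0)
        return f'href="{new_url}"'

    return OLD_LINK.sub(replace, html), unmapped


post = ('<a href="http://example.blogspot.com/2008/09/query-elaboration-as-dialogue.html">dialogue</a> '
        '<a href="http://example.blogspot.com/2008/07/some-older-post.html">older</a>')
new_html, todo = rewrite_internal_links(post, LINK_MAP)
print(new_html)
print("Fix by hand:", todo)
```

Building LINK_MAP is still the manual part; the script just turns it into a one-time table rather than a per-post edit.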
Friday, September 12, 2008
Quick Bites: Probably Irrelevant. (Not!)
Thanks to Jeff Dalton for spreading the word about a new information retrieval blog: Probably Irrelevant. It's a group blog, currently listing Fernando Diaz and Jon Elsas as contributors. Given the authors and the blog name's anagram of "Re-plan IR revolt, baby!", I expect great things!
Wednesday, September 10, 2008
Fun with Twitter
I recently joined Twitter and asked the twitterverse for opinions about DreamHost vs. GoDaddy as a platform to host this blog on WordPress. I was shocked when I noticed today that I'd gotten this response from the President / COO of GoDaddy (or perhaps a sales rep posing as such).
Seems like a lot of work for customer acquisition!
Quick Bites: Email becomes a Dangerous Distraction
Just read this article citing a number of studies to the effect that email is a major productivity drain. Nothing surprising to me--a lot of us have learned the hard way that the only way to be productive is to not check email constantly.
But I am curious if anyone has made progress on tools that alert you to emails that do call for immediate attention. I'm personally a fan of attention bonds approaches, but I imagine that the machine learning folks have at least thought about this as a sort of inverse spam filtering problem.
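To make the inverse-spam-filtering idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier (the same technique many spam filters use) trained to flag urgent mail instead of junk. The labels, toy training messages, and function names are hypothetical; this is an illustration, not a description of any existing tool.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior, using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-labeled toy examples; a real filter would learn from a user's own mail.
TRAINING = [
    ("server down need help now", "urgent"),
    ("customer escalation please call today", "urgent"),
    ("weekly newsletter digest", "later"),
    ("lunch menu for friday", "later"),
]

model = train(TRAINING)
print(classify("server escalation today", model))
print(classify("weekly lunch digest", model))
```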
Tuesday, September 9, 2008
Quick Bites: The Clickwheel Must Die
As someone who's long felt that the iPod's clickwheel violates Fitts's law, I was delighted to read this Gizmodo article asserting that the iPod's clickwheel must die. My choice quote:
Quite simply, the clickwheel hasn't scaled to handle the long, modern day menus in powerful iPods.
Fortunately Apple recognized its mistake on this one and fixed the problem in its touch interface. Though, to be clear, the problem was not inherent in the choice of a wheel interface, but rather in the requirement to make gratuitously precise selections.
Now I'm waiting to see someone fix the tiny minimize/maximize/close buttons in the upper right corner on Windows, which I suspect have become the textbook example of violating Fitts's law.
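For readers who haven't met Fitts's law: it predicts that the time to hit a target grows with the logarithm of the ratio of distance to target width, so shrinking a target makes it measurably slower to hit. Here is a minimal sketch in Python using the common Shannon formulation, MT = a + b * log2(D/W + 1); the constants a and b are hypothetical placeholders, since in practice they are fitted per device and user.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts's index of difficulty in bits (Shannon formulation): log2(D/W + 1)."""
    return math.log2(distance / width + 1)


def movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds: MT = a + b * ID.

    a and b are device- and user-specific constants fitted from experiments;
    the defaults here are hypothetical placeholders.
    """
    return a + b * index_of_difficulty(distance, width)


# The same 400-pixel reach to a small button versus a target four times wider.
for width in (16, 64):
    print(width, round(index_of_difficulty(400, width), 2), round(movement_time(400, width), 3))
```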
Monday, September 8, 2008
Incentives for Active Users
Some of the most successful web sites today are social networks, such as Facebook and LinkedIn. These are not only popular web sites; they are also remarkably effective people search tools. For example, I can use LinkedIn to find the 163 people in my network who mention "information retrieval" in their profiles and live within 50 miles of my ZIP code (I can't promise you'll see the same results!).
A couple of observations about social networking sites (I'll focus on LinkedIn) are in order.
First, this functionality is a very big deal, and it's something Google, Yahoo, and Microsoft have not managed to provide, even though their own technology is largely built on a social network--citation ranking.
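Citation ranking in its best-known web form is PageRank, where a page's score comes from the scores of the pages linking to it. Here is a textbook power-iteration sketch in Python on a hypothetical three-node graph; it illustrates the idea only and makes no claim about how any search engine computes its rankings.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    links maps each node to the nodes it cites (links to). Nodes that cite
    nothing ("dangling" nodes) spread their rank evenly over all nodes.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node first gets the "random jump" share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b both cite c; c cites a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```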
Second, the "secret sauce" for sites like LinkedIn is hardly their technology (a search engine built on Lucene and a good implementation of breadth-first search), but rather the way they have incented users to be active participants, in everything from virally marketing the site to their peers to inputting high-quality semi-structured profiles that make the site useful. In other words, active users ensure both the quantity and quality of information on the site.
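The people-search feature described above amounts to a keyword filter applied to a breadth-first traversal of the social graph. Here is a minimal sketch, assuming a hypothetical in-memory graph and profile dictionary; a plain substring match stands in for the Lucene index a real site would use, and nothing here reflects LinkedIn's actual implementation.

```python
from collections import deque


def people_search(graph, profiles, start, keyword, max_degree=3):
    """Breadth-first search out to max_degree hops from start.

    Returns (person, degree) pairs, nearest first, whose profile text
    mentions keyword (a case-insensitive substring match).
    """
    keyword = keyword.lower()
    seen = {start}
    queue = deque([(start, 0)])
    matches = []
    while queue:
        person, degree = queue.popleft()
        if person != start and keyword in profiles.get(person, "").lower():
            matches.append((person, degree))
        if degree == max_degree:
            continue  # don't expand beyond the degree limit
        for contact in graph.get(person, ()):
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, degree + 1))
    return matches


# Hypothetical toy network and profiles.
GRAPH = {
    "me": ["alice", "bob"],
    "alice": ["me", "carol"],
    "bob": ["me"],
    "carol": ["alice", "dave"],
    "dave": ["carol", "erin"],
    "erin": ["dave"],
}
PROFILES = {
    "alice": "Researcher in information retrieval",
    "bob": "Enterprise sales",
    "carol": "Search and Information Retrieval engineer",
    "dave": "Information retrieval consultant",
    "erin": "Information retrieval professor",
}
print(people_search(GRAPH, PROFILES, "me", "information retrieval"))
```

Breadth-first order is what makes the degree limit cheap to enforce: every person at degree k is dequeued before anyone at degree k+1, so the search simply stops expanding at the limit (erin, four hops away, is never reached).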
Many people have noted the network effect that drove the runaway success of Microsoft Office and eBay. But I think that social networking sites are taking this idea further, because users not only flock to the crowds but also become personally invested in the success of the site generally, and especially in the quality and accuracy of their personal information.
Enterprises need to learn from these consumer-oriented success stories. Some have already. For example, a couple of years ago, IBM established a Professional Marketplace, powered by Endeca, to maintain a skills and availability inventory of IBM employees. This effort was a runaway success, saving IBM $500M in its first year. But there's more: IBM employees have reacted to the success of the system by being more active in maintaining their own profiles. I spent the day with folks at the ACM, and they're seeing great uptake in their author profile pages.
I've argued before that there's no free lunch when it comes to enterprise search and information access. The good news, however, is that, if you create the right incentives, you can get other folks to happily pay for lunch.
Quick Bites: Taxonomy Directed Folksonomies
Props to Gwen Harris at Taxonomy Watch for posting a paper by Sarah Hayman and Nick Lothian on Taxonomy Directed Folksonomies.
The paper asks whether folksonomies and formal taxonomies can be used together and answers in the affirmative. The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.
I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.
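One simple way to let a vocabulary direct a folksonomy is to map each free-form tag onto a preferred taxonomy term through a table of variants, and to surface the tags that don't match as candidates for extending the vocabulary. The sketch below assumes a hypothetical two-term vocabulary; it is neither the method in the Hayman and Lothian paper nor Endeca's, just an illustration of how a vocabulary can absorb tagging inconsistency.

```python
# Hypothetical vocabulary: preferred taxonomy term -> variant tags users might type.
VOCABULARY = {
    "Information retrieval": {"ir", "information retrieval", "search"},
    "Folksonomy": {"folksonomy", "folksonomies", "social tagging"},
}


def normalize_tags(tags, vocabulary):
    """Map free-form tags onto taxonomy terms; return (terms, tags with no match)."""
    variant_to_term = {
        variant: term for term, variants in vocabulary.items() for variant in variants
    }
    terms, unmatched = set(), []
    for tag in tags:
        term = variant_to_term.get(tag.strip().lower())
        if term is None:
            unmatched.append(tag)  # candidate for a new vocabulary entry
        else:
            terms.add(term)
    return terms, unmatched


terms, unmatched = normalize_tags(["IR", "Search", " social tagging", "fun"], VOCABULARY)
print(terms, unmatched)
```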
Sunday, September 7, 2008
Quick Bites: Is Search Really 90% Solved?
Props to Michael Arrington for calling out this snippet in an interview with Marissa Mayer, Google Vice President of Search Product and User Experience, on the occasion of Google's 10th birthday:
Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.
I agree with Michael that search isn't even close to being solved yet. I've criticized the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. But there is no lack of open problems in search for those ambitious enough to tackle them.
Quick Bites: Applying Turing's Ideas to Search
A colleague of mine at Endeca recently pointed me to a post by John Ferrara at Boxes and Arrows entitled Applying Turing's Ideas to Search.
One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.
While I'm not convinced that search engine designers should be aspiring to pass the Turing test, I agree wholeheartedly with the vision John puts forward:
It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.
It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. Or, as we like to call it around here, HCIR.
Saturday, September 6, 2008
Migrating Soon
Just another reminder that I expect to migrate this blog to a hosted WordPress platform in the next few days. If you have opinions about hosting platforms, please let me know by commenting here. Right now, I'm debating between DreamHost and GoDaddy, but I'm very open to suggestions.
I will do everything in my power to minimize disruption--I'm not sure how easy Blogger will make it to redirect users to the new site. I'll probably post here for a while after the move to try to direct traffic.
I do expect the new site to be under a domain name I've already reserved: http://thenoisychannel.com. It currently forwards to Blogger.
Back from the Endeca Government Summit
I spent Thursday at the Endeca Government Summit, where I had the privilege to chat face-to-face with some Noisy Channel readers. Mostly, I was there to learn more about the sorts of information seeking problems people are facing in the public sector in general, and in the intelligence agencies in particular.
While I can't go into much detail, the key concern was exploration of information availability. This problem is the antithesis of known-item search: rather than trying to retrieve information you know exists (and which you know how to specify), you are trying to determine whether there is information available that would help you with a particular task.
Despite being lost in a sea of TLAs, I came away with a deepened appreciation of both the problems the intelligence agencies are trying to address and the relevance of exploratory search approaches to those problems.
Labels:
Endeca,
exploratory search,
intelligence analysis
Thursday, September 4, 2008
Query Elaboration as a Dialogue
I ended my post on transparency in information retrieval with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?
The answer is that the system needs to help the user elaborate the query. Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.
Those of you who have been reading this blog for a while or who are familiar with what I do at Endeca shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.
That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones.
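To illustrate why set retrieval lends itself to this kind of dialogue, here is a minimal sketch of conjunctive Boolean retrieval in which the system previews how many results each candidate refinement would leave. The toy collection and function names are hypothetical, and the candidate refinements are simply taken as given, since deciding which ones to offer is the open question.

```python
def tokens(text):
    return set(text.lower().split())


def retrieve(docs, terms):
    """Conjunctive (Boolean AND) set retrieval: ids of docs containing every term."""
    wanted = {t.lower() for t in terms}
    return {doc_id for doc_id, text in docs.items() if wanted <= tokens(text)}


def preview_refinements(docs, terms, candidates):
    """For each candidate term, the size of the result set if it were added to the query.

    Only refinements that narrow the current set without emptying it are kept.
    How the candidates are chosen is deliberately left open.
    """
    current = retrieve(docs, terms)
    preview = {}
    for term in candidates:
        size = sum(1 for doc_id in current if term.lower() in tokens(docs[doc_id]))
        if 0 < size < len(current):
            preview[term] = size
    return preview


# Hypothetical toy collection.
DOCS = {
    1: "blog search engines and feeds",
    2: "web search ranking",
    3: "blog authors and opinions",
    4: "blog search for opinions",
}
print(retrieve(DOCS, ["blog"]))
print(preview_refinements(DOCS, ["blog"], ["search", "opinions", "ranking"]))
```

Because adding a term to a conjunctive query can only shrink the result set, the preview counts tell the user exactly what each step will do, which is the transparency property at stake here.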
But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? Stay tuned...
Tuesday, September 2, 2008
Migrating to WordPress
Just a quick note to let folks know that I'll be migrating to WordPress in the next few days. I'll make every effort to make the move seamless. I have secured the domain name http://thenoisychannel.com, which currently forwards to Blogger but will shift to wherever the blog is hosted. I apologize in advance for any disruption.
Quick Bites: Google Chrome
For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released comic book hailing Google Chrome, Google's long-rumored entry into the browser wars. By the time you are reading this, the (Windows-only) beta may even be available for download. The official Google announcement is here.
If the software lives up to the promise of the comic book, then Google may have a real shot at taking market share from IE and Firefox. More significantly, if they can supplant the operating system with the browser, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.
Interestingly, even though all of the search blogs are reporting about Chrome, I haven't seen any analysis on what this might mean for web search.
Monday, September 1, 2008
Quick Bites: E-Discovery and Transparency
One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.
I just read an article on how courts will tolerate search inaccuracies in e-Discovery by way of Curt Monash. It reminded me of our recent discussion of transparency in information retrieval. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. But that's because those algorithms aren't sufficiently intuitive for those explanations to be meaningful except in a theoretical sense to an information retrieval researcher.
I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.