Monday, September 8, 2008
Incentives for Active Users
A couple of observations about social networking sites (I'll focus on LinkedIn) are in order.
First, this functionality is a very big deal, and it's something Google, Yahoo, and Microsoft have not managed to provide, even though their own technology is largely built on a social network--citation ranking.
Second, the "secret sauce" for sites like LinkedIn is hardly their technology (a search engine built on Lucene and a good implementation of breadth-first search), but rather the way they have incented users to be active participants, in everything from virally marketing the site to their peers to inputting high-quality semi-structured profiles that make the site useful. In other words, active users ensure both the quantity and quality of information on the site.
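To make the point about technology concrete, here is a minimal sketch of the kind of breadth-first search that can compute how many hops separate two members of a network. This is not LinkedIn's code: the function name `degrees_of_separation`, the adjacency-list representation, and the toy network are all assumptions made up for illustration.

```python
from collections import deque
from typing import Dict, List, Optional


def degrees_of_separation(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[int]:
    """Return the number of hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (person, distance from start)
    while queue:
        person, dist = queue.popleft()
        for contact in graph.get(person, []):
            if contact == goal:
                return dist + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, dist + 1))
    return None


# Hypothetical toy network: each person maps to a list of direct contacts.
network = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],
}
hops = degrees_of_separation(network, "ann", "dan")
unreachable = degrees_of_separation(network, "ann", "eve")
```

In the toy network, `hops` is 3 and `unreachable` is `None`. The algorithm is textbook material, which is rather the point: the hard part is getting people to supply the graph.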
Many people have noted the network effect that drove the runaway success of Microsoft Office and eBay. But I think that social networking sites are taking this idea further: users not only flock to the crowds but also become personally invested, both in the success of the site generally and, especially, in the quality and accuracy of their personal information.
Enterprises need to learn from these consumer-oriented success stories. Some have already. For example, a couple of years ago, IBM established a Professional Marketplace, powered by Endeca, to maintain a skills and availability inventory of IBM employees. This effort was a runaway success, saving IBM $500M in its first year. But there's more: IBM employees have reacted to the success of the system by being more active in maintaining their own profiles. I spent the day with folks at the ACM, and they're seeing great uptake in their author profile pages.
I've argued before that there's no free lunch when it comes to enterprise search and information access. The good news, however, is that, if you create the right incentives, you can get other folks to happily pay for lunch.
Quick Bites: Taxonomy Directed Folksonomies
The paper asks whether folksonomies and formal taxonomy can be used together and answers in the affirmative. The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.
I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.
Saturday, May 24, 2008
Games With an HCIR Purpose?
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in Phetch, a game that never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web. One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Read the description provided by the Describer and use it to compose a search.
- Scan the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
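The simplified Seeker's loop can be sketched in a few lines of code. This is only an illustration under the assumptions stated above (one Seeker, a static description), not the actual Phetch implementation. The names `seek`, `search`, `looks_right`, and `patience` are hypothetical, and the fixed `patience` cutoff is a crude stand-in for the Seeker's judgment about when to stop scanning and reformulate.

```python
from typing import Callable, Iterable, List, Optional


def seek(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
    looks_right: Callable[[str], bool],
    patience: int = 5,
) -> Optional[str]:
    """Try each query in turn, scanning at most `patience` results per query."""
    for query in queries:
        # Scan the results sequentially, but only the first `patience` of them.
        for image in search(query)[:patience]:
            if looks_right(image):
                return image  # interrupt the scan to make a guess
        # Nothing convincing so far: give up on this query and reformulate.
    return None


# Hypothetical toy index standing in for the ESP Game-based search engine.
index = {
    "michael jackson": ["mj_stage.jpg", "mj_glove.jpg", "mj_portrait.jpg"],
    "michael jackson sailor hat": ["mj_sailor.jpg"],
}
found = seek(
    queries=["michael jackson", "michael jackson sailor hat"],
    search=lambda q: index.get(q, []),
    looks_right=lambda image: image == "mj_sailor.jpg",
    patience=2,
)
```

Here the first query returns nothing convincing within two results, so the Seeker reformulates and finds `mj_sailor.jpg` on the second query. A real Seeker would of course choose the next query based on what the previous results looked like, and would pay a penalty for wrong guesses. Both are left out of this sketch.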
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.
Wednesday, April 23, 2008
The Efficiency of Social Tagging
After seeing this and the TagMaps work at Yahoo Research Berkeley, I feel that the IR and HCI communities should join forces to understand social tagging in general terms that relate information, knowledge representation, and human beings. These concerns are hardly specific to the web or to what is now called "social media"--after all, media is social by definition. Indeed, there is no reason to confine this approach to human-tagged collections--why not consider automated tagging systems on the same playing field?