A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in a game called Phetch, which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web. One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
- The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
- A Seeker loses points (I can't recall how many) for wrong guesses.
- The game has a time limit (hence the "Quick!").
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
- Read the description provided by the Describer and use it to compose a search.
- Scan the results sequentially, interrupting either to make a guess or to reformulate the search.
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
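Under these simplifying assumptions, the Seeker's behavior can be sketched as a small search loop. This is only an illustration of the interactive pattern, not anything from the actual Phetch implementation; `search`, `matches_description`, and `reformulate` are hypothetical helpers standing in for the search engine, the Seeker's judgment, and the Seeker's query-rewriting skill.

```python
def seek(description, search, matches_description, reformulate,
         scan_limit=20, max_reformulations=5):
    """Return an image to guess, or None if the Seeker gives up.

    A toy model of the Seeker's loop: scan results sequentially,
    interrupting either to guess or to reformulate the search.
    """
    query = description  # initial search composed from the Describer's text
    for _ in range(max_reformulations):
        results = search(query)
        # Scan sequentially, interrupting to guess as soon as a match appears.
        for image in results[:scan_limit]:
            if matches_description(image, description):
                return image
        # Nothing matched within the scan budget: a good Seeker recognizes
        # that it is better to reformulate than to keep scanning.
        query = reformulate(query, results)
    return None  # abandon the search
```

The `scan_limit` and `max_reformulations` parameters are where the Seeker's skill lives: choosing when to stop scanning and when to stop reformulating is exactly the interactive judgment the game rewards.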
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.
2 comments:
What do you suppose a seeker does to decide whether to continue serial examination of search results, to reformulate the search, or to abandon the search? One way for sure is when the seeker identifies an unwanted cluster of results that can be easily eliminated in the query language. But what else? (You abandon the search when you can't find a pattern in the results or can't figure out how to refine the query to eliminate unwanted clusters).
Stefano, an excellent question! While every game and every user is unique, here is how I believe a typical Seeker plays.
The Seeker reads the description and tries to come up with a search that accurately represents the query, favoring the topical words in the description.
For example, if the Describer has entered "Michael Jackson wearing a funny-looking sailor hat", then the Seeker's initial search might be michael jackson sailor hat.
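Favoring topical words can be illustrated as simple stopword removal over the Describer's text. The stopword list below is an assumption made for the example, not anything taken from Phetch.

```python
# Toy illustration of favoring topical words: drop common function words
# from the Describer's text to form the initial search. The stopword set
# is an assumption for this example, not part of the actual game.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "wearing"}

def initial_query(description):
    words = description.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)

initial_query("Michael Jackson wearing a funny-looking sailor hat")
# → "michael jackson funny-looking sailor hat"
```

A skilled Seeker might go further and hold back modifiers like funny-looking at first, keeping them in reserve to tighten precision later if the initial search returns too much.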
Now comes the fun part. The Seeker, making a quick scan through the list, finds zero, one, or multiple images that match the description. The speed of visual scan makes it likely that the Seeker will look at least a little way past the last deliberately scanned image. Let's consider the three cases.
If the Seeker finds exactly one matching image, score! No guarantee of a correct guess, but this is the time to try.
If the Seeker finds no images, he or she reformulates the query, either to favor precision (e.g., adding funny-looking) or to favor recall (e.g., replacing sailor hat with just hat).
But what if the Seeker finds multiple images that match the description? Here it's worth dropping one of the simplifying assumptions I made--namely, that the description is static. In fact, the Describer can add information asynchronously, and is likely to do so if the Seeker is taking too long. This information can be used, as we have already seen, to help make better queries. But it can also be used to pinpoint a specific result. For example, a detail like "the picture seems slightly distorted on the left side" might not help the Seeker formulate a better search, but might help the Seeker identify the correct image among the results.
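The three cases amount to a small decision rule. Here is a hedged sketch of that rule; the action labels are mine, invented for illustration, and not part of the actual game.

```python
def next_action(num_matches, description_grew):
    """Decide what a Seeker does after scanning one page of results.

    A toy summary of the three cases discussed above; the returned
    labels are illustrative, not part of Phetch itself.
    """
    if num_matches == 1:
        return "guess"          # exactly one match: time to try it
    if num_matches == 0:
        return "reformulate"    # favor precision or recall, as above
    # Multiple matches: a fresh detail from the Describer can pinpoint
    # the right one; failing that, keep scanning (at the risk of a
    # wrong guess costing points).
    return "use-new-detail" if description_grew else "keep-scanning"
```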
As I said, I'm speculating, but I believe this is at least close to how most Seekers play. Not all that different from hunting down a reference about which you can only remember some random details. :)