<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8016696494330504473</id><updated>2011-04-23T12:31:34.217-04:00</updated><category term='Amit Singhal'/><category term='Usability'/><category term='Information technology'/><category term='Evaluation'/><category term='Knowledge representation'/><category term='collaborative filtering'/><category term='University of Glasgow'/><category term='Accessibility'/><category term='PARC'/><category term='Information Retrieval'/><category term='HCIR'/><category term='Ellen Voorhees'/><category term='uncertainty'/><category term='Powerset'/><category term='e-Discovery'/><category term='psychology'/><category term='Wikipedia'/><category term='Library and Information Science'/><category term='Discover &apos;08'/><category term='social navigation'/><category term='Natural language processing'/><category term='LinkedIn'/><category term='Privacy'/><category term='H. V. 
Jagadish'/><category term='Gian-Carlo Rota'/><category term='probability'/><category term='exploratory search'/><category term='blogs'/><category term='knowledge management'/><category term='hakia'/><category term='Visualization'/><category term='intelligence analysis'/><category term='Leif Azzopardi'/><category term='Endeca'/><category term='Fitts&apos;s Law'/><category term='HCI'/><category term='Columbia University'/><category term='Business intelligence'/><category term='XML'/><category term='Search'/><category term='Google'/><category term='faceted navigation'/><category term='Enterprise Search'/><category term='TREC'/><category term='Information Theory'/><category term='Yahoo Research'/><category term='transparency'/><category term='Collaborative tagging'/><category term='Database'/><category term='Jeff Naughton'/><category term='Nick Belkin'/><category term='Tefko Saracevic'/><category term='Relevance'/><category term='ECIR'/><category term='social media'/><category term='machine learning'/><category term='Cranfield'/><category term='Dagstuhl'/><category term='Database Usability'/><title type='text'>Redirecting to http://thenoisychannel.com...</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>93</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8422933853113111590</id><published>2008-09-16T17:19:00.001-04:00</published><updated>2008-09-16T17:21:11.253-04:00</updated><title type='text'>We've Moved!</title><content type='html'>Please redirect your readers to &lt;a href="http://thenoisychannel.com/"&gt;http://thenoisychannel.com&lt;/a&gt;! The RSS feed is available at &lt;a href="http://thenoisychannel.com/?feed=rss2"&gt;http://thenoisychannel.com/?feed=rss2&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;See you all there...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8422933853113111590?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8422933853113111590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8422933853113111590' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8422933853113111590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8422933853113111590'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/weve-moved.html' title='We&apos;ve Moved!'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4565799435089103496</id><published>2008-09-16T15:24:00.004-04:00</published><updated>2008-09-16T15:33:07.771-04:00</updated><title type='text'>Migrating Tonight!</title><content type='html'>At long last, this blog will migrate over to a hosted WordPress platform at &lt;a href="http://thenoisychannel.com/"&gt;http://thenoisychannel.com/&lt;/a&gt;. Thanks to &lt;a href="http://thenoisychannel.com/"&gt;Andy Milk&lt;/a&gt; (and to &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt; for lending me his services) and especially to Noisy Channel regular &lt;a href="http://iwspaces.blogspot.com/"&gt;David Fauth&lt;/a&gt; for making this promised migration a reality!&lt;br /&gt;&lt;br /&gt;As of midnight EST, please visit the new site. My goal is to redirect all incoming Blogger traffic to the new hosted site. This will be the last post here at Blogger.&lt;br /&gt;&lt;br /&gt;p.s. Please note that I'll be manually migrating any content (posts and comments) from the past 5 days, i.e., since I performed an import on September 12th. 
My apologies if anything is lost in translation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4565799435089103496?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4565799435089103496/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4565799435089103496' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4565799435089103496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4565799435089103496'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/migrating-tonight.html' title='Migrating Tonight!'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1542220055264262058</id><published>2008-09-16T13:30:00.002-04:00</published><updated>2008-09-16T13:35:28.418-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><title type='text'>Quick Bites: Search Evaluation at Google</title><content type='html'>Original post is &lt;a href="http://googleblog.blogspot.com/2008/09/search-evaluation-at-google.html"&gt;here&lt;/a&gt;; Jeff's commentary is &lt;a 
href="http://www.searchenginecaffe.com/2008/09/beyond-relevance.html"&gt;here&lt;/a&gt;. Not surprisingly, my reaction is that Google should consider a richer notion of "results" than an ordering of matching pages, perhaps a faceted approach that reflects the "several dimensions to 'good' results."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1542220055264262058?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1542220055264262058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1542220055264262058' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1542220055264262058'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1542220055264262058'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-search-evaluation-at-google.html' title='Quick Bites: Search Evaluation at Google'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-9207304146702167077</id><published>2008-09-16T10:27:00.005-04:00</published><updated>2008-09-16T10:43:37.816-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><title type='text'>Quick Bites: Is Wikipedia Production Slowing Down?</title><content type='html'>Thanks to &lt;span class="fn"&gt;&lt;a 
href="http://sergionunes.com/"&gt;Sérgio&lt;/a&gt; &lt;/span&gt;for tweeting this post by Peter Pirolli at PARC: &lt;a href="http://asc-parc.blogspot.com/2008/09/is-wikipedia-production-slowing-down.html"&gt;Is Wikipedia Production Slowing Down?&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here's the picture showing the reduction of growth in the number of Wikipedia editors over time:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://web.mac.com/peter.pirolli/Professional/Blog/Entries/2008/9/9_Is_Wikipedia_becoming_less_productive.html"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_Y0SVT3VxV1E/SM_EzVO2aBI/AAAAAAAAABs/cpGLSZxmpcM/s320/Picture+2.png" alt="" id="BLOGGER_PHOTO_ID_5246628477061720082" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Interesting material and commentary at &lt;a href="http://asc-parc.blogspot.com/2008/09/is-wikipedia-production-slowing-down.html"&gt;Augmented Social Cognition&lt;/a&gt; and &lt;a href="http://web.mac.com/peter.pirolli/Professional/Blog/Entries/2008/9/9_Is_Wikipedia_becoming_less_productive.html"&gt;Peter Pirolli's blog&lt;/a&gt;. 
Are people running out of things to write about?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-9207304146702167077?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/9207304146702167077/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=9207304146702167077' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9207304146702167077'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9207304146702167077'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-is-wikipedia-production.html' title='Quick Bites: Is Wikipedia Production Slowing Down?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_Y0SVT3VxV1E/SM_EzVO2aBI/AAAAAAAAABs/cpGLSZxmpcM/s72-c/Picture+2.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3738886663785186149</id><published>2008-09-15T16:38:00.011-04:00</published><updated>2008-09-15T17:36:03.668-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='transparency'/><category scheme='http://www.blogger.com/atom/ns#' term='Information technology'/>
<title type='text'>Information Accountability</title><content type='html'>The recent &lt;a href="http://www.nytimes.com/2008/09/15/technology/15google.html"&gt;United Airlines stock fiasco&lt;/a&gt; triggered a predictable wave of finger-pointing. For those who didn't follow the event, here is the executive summary:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;In the wee hours of Sunday, September 7th, the South Florida Sun-Sentinel (a subsidiary of the Tribune Company) &lt;a href="http://1.bp.blogspot.com/_gLGYheTX5nY/SMXV-qLb4aI/AAAAAAAAABE/H96ogTHUI1E/s1600-h/Sentinel_business_section_blog.jpg"&gt;included a link&lt;/a&gt; to an article entitled "UAL Files for Bankruptcy." The link was legit, but the linked article, originally published in 2002, didn't carry its publication date. Then Google's news bot picked up the article and automatically assigned it a current date. Furthermore, Google sent the link to anyone with an alert set up for news about United. Then, on Monday, September 8th, someone at Income Security Advisors saw the article in the results for a Google News search and sent it out on Bloomberg. 
The results are shown in the chart below, courtesy of Bloomberg by way of the New York Times.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nytimes.com/imagepages/2008/09/15/business/15google.graf02.ready.html"&gt;&lt;img style="cursor: pointer; width: 290px; height: 193px;" src="http://graphics8.nytimes.com/images/2008/09/15/business/15google-graf02.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For anyone who wants all of the gory details, Google's version of the story is &lt;a href="http://googlenewsblog.blogspot.com/2008/09/update-on-united-airlines-story.html"&gt;here&lt;/a&gt;; the Tribune Company's version is &lt;a href="http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&amp;amp;STORY=/www/story/09-09-2008/0004882072&amp;amp;EDATE="&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;I've spent the past week wondering about this event from an information access perspective. And then today I saw two interesting articles:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The first was a &lt;a href="http://news.bbc.co.uk/1/hi/technology/7613201.stm"&gt;piece in BBC News&lt;/a&gt; about a &lt;a href="http://www.webfoundation.org/donations/knight2008/tbl-speech"&gt;speech by Sir Tim Berners-Lee&lt;/a&gt; expressing concern that the internet needs a way to help people separate rumor from real science. 
His examples included fears that the Large Hadron Collider at CERN would create a black hole that would swallow up the earth (which isn't quite the premise of Dan Brown's &lt;a href="http://en.wikipedia.org/wiki/Angels_and_Demons"&gt;Angels and Demons&lt;/a&gt;), and rumors that a vaccine given to children in Britain was harmful.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The second was a &lt;a href="http://www.nytimes.com/2008/09/16/us/politics/16web-nagourney.html"&gt;column in the New York Times&lt;/a&gt; about the dynamics of the US presidential campaign, where Adam Nagourney notes that "senior campaign aides say they are no longer sure what works, as they stumble through what has become a daily campaign fog, struggling to figure out what voters are paying attention to and, not incidentally, what they are even believing."&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;I see a common thread here that I'd like to call "information accountability." 
I don't mean this term in the sense of a &lt;a href="http://portal.acm.org/citation.cfm?id=1349043"&gt;recent CACM article&lt;/a&gt; about information privacy and sensitivity, but rather in the sense of information &lt;a href="http://en.wikipedia.org/wiki/Provenance"&gt;provenance&lt;/a&gt; and responsibility.&lt;br /&gt;&lt;br /&gt;Whether we're worrying about &lt;a href="http://en.wikipedia.org/wiki/Google_bomb"&gt;Google bombing&lt;/a&gt;, &lt;a href="http://www.webpronews.com/expertarticles/2005/10/27/google-bowling-how-competitors-can-sabotage-you-what-google-should-do-about-it"&gt;Google bowling&lt;/a&gt;, or what Gartner analyst Whit Andrews calls &lt;a href="http://www.internetnews.com/ec-news/article.php/3643341"&gt;"denial-of-insight" attacks&lt;/a&gt;, our concern is that information often arrives with implicit authority. Despite the aphorism telling us "don't believe everything you read," most of us select news and information sources with some hope that they will be authoritative. Whether the motto is &lt;a href="http://en.wikipedia.org/wiki/The_New_York_Times"&gt;"all the news that's fit to print"&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Don%27t_be_evil"&gt;"don't be evil"&lt;/a&gt;, our choice of which information sources to trust is a necessary heuristic that spares us from subjecting everything we read to endless skeptical inquiry.&lt;br /&gt;&lt;br /&gt;But sometimes the most reputable news sources get it wrong. Or perhaps "wrong" is the wrong word. 
When newspapers reported that the FBI was treating &lt;a href="http://en.wikipedia.org/wiki/Richard_Jewell"&gt;Richard Jewell&lt;/a&gt; as a "person of interest" in the &lt;a href="http://en.wikipedia.org/wiki/Centennial_Olympic_Park_bombing"&gt;Centennial Olympic Park bombing&lt;/a&gt; (cf. "Olympic Park Bomber" &lt;a href="http://en.wikipedia.org/wiki/Eric_Robert_Rudolph"&gt;Eric Robert Rudolph&lt;/a&gt;), they weren't lying, but rather were communicating information from what they believed to be a reliable source. And, in turn, the FBI may have been correctly doing its job, given the information it had. But there's no question that Jewell suffered tremendously from his "trial by media" before his name was ultimately cleared.&lt;br /&gt;&lt;br /&gt;It's tempting to react to these information breakdowns with finger-pointing, to figure out who is accountable and, in as litigious a society as the United States, bring on the lawyers. Moreover, there clearly are cases where willful misinformation constitutes criminal defamation or fraud. But I think we need to be careful, especially in a world where information flows in a highly connected--and not necessarily acyclic--social graph. Anyone who has played the children's game of &lt;a href="http://en.wikipedia.org/wiki/Chinese_whispers"&gt;telephone&lt;/a&gt; knows that small communication errors can blow up rapidly, and that it's difficult to partition blame fairly.&lt;br /&gt;&lt;br /&gt;The simplest answer is that we are accountable for how we consume information: &lt;a href="http://en.wikipedia.org/wiki/Caveat_lector"&gt;caveat lector&lt;/a&gt;. But this model seems overly simplistic, since our daily lives hinge on our ability to consume information without so skeptical an eye that we can accept nothing at face value. Besides, shouldn't we hold information providers responsible for living up to the reputations they cultivate and promote?&lt;br /&gt;&lt;br /&gt;There are no easy answers here. 
But the bad news is that we cannot ignore the questions of information accountability. If terms like "social media" and "web 2.0" mean anything, they surely tell us that the game of telephone will only grow in the number of participants and in the complexity of the communication chains. As a society, we will have to learn to live with and mitigate the fallout.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3738886663785186149?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3738886663785186149/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3738886663785186149' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3738886663785186149'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3738886663785186149'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/information-accountability.html' title='Information Accountability'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8621105808477674565</id><published>2008-09-14T17:51:00.004-04:00</published><updated>2008-09-14T19:21:21.574-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='blogs'/><category scheme='http://www.blogger.com/atom/ns#' 
term='faceted navigation'/><title type='text'>Is Blog Search Different?</title><content type='html'>Alerted by &lt;a href="http://www.searchenginecaffe.com/2008/09/trec-blog-search-2008-and-beyond.html"&gt;Jeff&lt;/a&gt; and &lt;a href="http://terrierteam.blogspot.com/2008/09/about-blog-search-tasks.html"&gt;Iadh&lt;/a&gt;, I recently read &lt;a href="http://people.ischool.berkeley.edu/%7Ehearst/papers/blogsearch08.pdf"&gt;What Should Blog Search Look Like?&lt;/a&gt;, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.&lt;br /&gt;&lt;br /&gt;The position paper suggests focusing on three kinds of search tasks:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Find out what are people thinking or feeling about X over time.&lt;/li&gt;&lt;li&gt;Find good blogs/authors to read.&lt;/li&gt;&lt;li&gt;Find useful information that was published in blogs sometime in the past.&lt;/li&gt;&lt;/ol&gt;The authors generally recommend the use of &lt;a href="http://en.wikipedia.org/wiki/Faceted_classification"&gt;faceted&lt;/a&gt; navigation interfaces--something I'd hope would be uncontroversial by now for search in general.&lt;br /&gt;&lt;br /&gt;But I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and by their discussion, based on &lt;a href="http://staff.science.uva.nl/%7Egilad/pubs/ecir06-blogsearch.pdf"&gt;work by Mishne and de Rijke&lt;/a&gt;, of how blog search queries differ substantially from web search queries. I don't doubt the data they've collected, but I'm curious whether their results account for the rapid proliferation and mainstreaming of blogs. 
The lines between blogs, news articles, and informational web pages seem increasingly blurred.&lt;br /&gt;&lt;br /&gt;So I'd like to turn the question around: what should blog search look like that is not applicable to search in general?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8621105808477674565?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8621105808477674565/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8621105808477674565' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8621105808477674565'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8621105808477674565'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/is-blog-search-different.html' title='Is Blog Search Different?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5313245715794704920</id><published>2008-09-13T10:08:00.002-04:00</published><updated>2008-09-13T10:22:48.226-04:00</updated><title type='text'>Progress on the Migration</title><content type='html'>Please check out &lt;a href="http://thenoisychannel.com/"&gt;http://thenoisychannel.com/&lt;/a&gt; to see the future of The Noisy Channel in progress. 
I'm using WordPress hosted on GoDaddy and have done the minimum work needed to port all posts and comments (not including this one).&lt;br /&gt;&lt;br /&gt;Here is my current list of tasks that I'd like to get done before we move.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Design!&lt;/span&gt; I'm currently using the default WordPress theme, which is pretty lame. I'm inclined to use a clean but stylish two-column theme that is widget-friendly. Maybe &lt;a href="http://cutline.tubetorial.com/"&gt;Cutline&lt;/a&gt;. In any case, I'd like the new site to be a tad less spartan before we move into it.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Internal Links.&lt;/span&gt; My habit of linking back to previous posts now means I have to map those links to the new posts. I suspect I'll do it manually, since I don't see an easy way to automate it.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Redirects.&lt;/span&gt; Unfortunately I don't think I can actually get Blogger to redirect traffic automatically. So my plan is to post signage throughout this blog making it clear that the blog has moved.&lt;/li&gt;&lt;/ul&gt;I'd love help, particularly in the form of advice on the design side. And I'll happily give administration access to anyone who has the cycles to help implement any of these or other ideas. 
Please let me know by posting here or by emailing me: dtunkelang@{endeca,gmail}.com.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5313245715794704920?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5313245715794704920/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5313245715794704920' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5313245715794704920'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5313245715794704920'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/progress-on-migration.html' title='Progress on the Migration'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5699963373581913666</id><published>2008-09-12T09:54:00.003-04:00</published><updated>2008-09-12T10:01:10.124-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Quick Bites: Probably Irrelevant. (Not!)</title><content type='html'>Thanks to &lt;a href="http://www.searchenginecaffe.com/2008/09/new-information-retrieval-group-blog.html"&gt;Jeff Dalton&lt;/a&gt; for spreading the word about a new information retrieval  blog: &lt;a href="http://probablyirrelevant.org/"&gt;Probably Irrelevant&lt;/a&gt;. 
It's a group blog, currently listing &lt;a href="http://ciir.cs.umass.edu/%7Efdiaz/"&gt;Fernando Diaz&lt;/a&gt; and &lt;a href="http://www.cs.cmu.edu/%7Ejelsas/"&gt;Jon Elsas&lt;/a&gt; as contributors. Given the authors and the blog name's anagram of "&lt;a href="http://probablyirrelevant.org/about/"&gt;Re-plan IR revolt, baby!&lt;/a&gt;", I expect great things!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5699963373581913666?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5699963373581913666/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5699963373581913666' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5699963373581913666'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5699963373581913666'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-probably-irrelevant-not.html' title='Quick Bites: Probably Irrelevant. 
(Not!)'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-622144566471757259</id><published>2008-09-10T23:03:00.002-04:00</published><updated>2008-09-10T23:09:23.839-04:00</updated><title type='text'>Fun with Twitter</title><content type='html'>I recently joined Twitter and &lt;a href="http://twitter.com/dtunkelang/statuses/913257696"&gt;asked the twitterverse&lt;/a&gt; for opinions about DreamHost vs. GoDaddy as a platform to host this blog on WordPress. I was shocked when I noticed today that I'd gotten &lt;a href="http://twitter.com/asocialcontract/statuses/913336687"&gt;this response&lt;/a&gt; from the President / COO of GoDaddy (or perhaps a sales rep posing as such).&lt;br /&gt;&lt;br /&gt;Seems like a lot of work for customer acquisition!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-622144566471757259?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/622144566471757259/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=622144566471757259' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/622144566471757259'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/622144566471757259'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/fun-with-twitter.html' title='Fun with 
Twitter'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6309499338511604315</id><published>2008-09-10T21:53:00.002-04:00</published><updated>2008-09-10T22:04:11.905-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Quick Bites: Email becomes a Dangerous Distraction</title><content type='html'>Just read &lt;a href="http://www.smh.com.au/news/biztech/youve-got-interruptions/2008/09/08/1220857455459.html"&gt;this article&lt;/a&gt; citing a number of studies to the effect that email is a major productivity drain. Nothing surprising to me--a lot of us have learned the hard way that the only way to be productive is to not check email constantly.&lt;br /&gt;&lt;br /&gt;But I am curious if anyone has made progress on tools that alert you to emails that do call for immediate attention. 
I'm personally a fan of &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.7827"&gt;attention bonds&lt;/a&gt; approaches, but I imagine that the machine learning folks have at least thought about this as a sort of inverse spam filtering problem.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6309499338511604315?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6309499338511604315/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6309499338511604315' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6309499338511604315'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6309499338511604315'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-email-becomes-dangerous.html' title='Quick Bites: Email becomes a Dangerous Distraction'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4720478448976598213</id><published>2008-09-09T01:04:00.005-04:00</published><updated>2008-09-09T01:13:49.634-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Fitts&apos;s Law'/><category scheme='http://www.blogger.com/atom/ns#' term='HCI'/><title type='text'>Quick Bites: The Clickwheel Must Die</title><content type='html'>As someone who's long felt that 
the iPod's clickwheel violates &lt;a href="http://en.wikipedia.org/wiki/Fitts%27s_law"&gt;Fitts's law&lt;/a&gt;, I was delighted to read this Gizmodo article asserting that &lt;a href="http://gizmodo.com/5042072/a-sad-fact-the-ipods-clickwheel-must-die"&gt;the iPod's clickwheel must die&lt;/a&gt;. My choice quote:&lt;br /&gt;&lt;blockquote&gt;Quite simply, the clickwheel hasn't scaled to handle the long, modern day menus in powerful iPods.&lt;/blockquote&gt;Fortunately Apple recognized its mistake on this one and fixed the problem in its touch interface. Though, to be clear, the problem was not inherent in the choice of a wheel interface, but rather in the requirement to make gratuitously precise selections.&lt;br /&gt;&lt;br /&gt;Now I'm waiting to see someone fix the tiny minimize/maximize/close buttons in the upper right corner on Windows, which I suspect have become the textbook example of violating Fitts's law.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4720478448976598213?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4720478448976598213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4720478448976598213' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4720478448976598213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4720478448976598213'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-clickwheel-must-die.html' title='Quick Bites: The Clickwheel Must Die'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4932870601512659432</id><published>2008-09-08T23:16:00.004-04:00</published><updated>2008-09-08T23:23:24.962-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Collaborative tagging'/><category scheme='http://www.blogger.com/atom/ns#' term='LinkedIn'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='knowledge management'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>Incentives for Active Users</title><content type='html'>Some of the most successful web sites today are social networks, such as Facebook and LinkedIn. These are not only popular web sites; they are also remarkably effective people search tools. 
For example, I can use LinkedIn to find the &lt;a href="http://www.linkedin.com/search?search=&amp;amp;keywords=%22information+retrieval%22&amp;amp;searchLocationType=I&amp;amp;countryCode=us&amp;amp;postalCode=11201&amp;amp;distance=50&amp;amp;sortCriteria=4"&gt;163 people&lt;/a&gt; in my network who mention "information retrieval" in their profiles and live within 50 miles of my ZIP code (I can't promise you'll see the same results!).&lt;br /&gt;&lt;br /&gt;A couple of observations about social networking sites (I'll focus on LinkedIn) are in order.&lt;br /&gt;&lt;br /&gt;First, this functionality is a very big deal, and it's something Google, Yahoo, and Microsoft have not managed to provide, even though their own technology is largely built on a social network--citation ranking.&lt;br /&gt;&lt;br /&gt;Second, the "secret sauce" for sites like LinkedIn is hardly their technology (a search engine built on Lucene and a good implementation of breadth-first search), but rather the way they have incented users to be active participants, in everything from virally marketing the site to their peers to inputting high-quality semi-structured profiles that make the site useful. In other words, active users ensure both the quantity and quality of information on the site.&lt;br /&gt;&lt;br /&gt;Many people have noted the &lt;a href="http://en.wikipedia.org/wiki/Network_effect"&gt;network effect&lt;/a&gt; that drove the runaway success of Microsoft Office and eBay. But I think that social networking sites are taking this idea further, because users not only flock to the crowds but also become personally invested, not just in the success of the site generally, but especially in the quality and accuracy of their personal information.&lt;br /&gt;&lt;br /&gt;Enterprises need to learn from these consumer-oriented success stories. Some have already.
For example, a couple of years ago, IBM established a &lt;a href="http://endeca.com/corporate-info/press-room/nr/n_072005_wsj.html"&gt;Professional Marketplace&lt;/a&gt;, powered by &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;, to maintain a skills and availability inventory of IBM employees. This effort was a runaway success, &lt;a href="http://www-01.ibm.com/software/success/cssdb.nsf/CS/LJKS-6RMJZS"&gt;saving IBM $500M&lt;/a&gt; in its first year. But there's more: IBM employees have reacted to the success of the system by being more active in maintaining their own profiles. I spent the day with folks at the &lt;a href="http://www.acm.org/"&gt;ACM&lt;/a&gt;, and they're seeing great uptake in their &lt;a href="http://portal.acm.org/author_page.cfm"&gt;author profile pages&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I've argued before that there's no free lunch when it comes to enterprise search and information access. The good news, however, is that, if you create the right incentives, you can get other folks to happily pay for lunch.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4932870601512659432?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4932870601512659432/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4932870601512659432' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4932870601512659432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4932870601512659432'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/incentives-for-active-users.html' title='Incentives 
for Active Users'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5882619379845084217</id><published>2008-09-08T23:06:00.002-04:00</published><updated>2008-09-08T23:16:03.122-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Collaborative tagging'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><title type='text'>Quick Bites: Taxonomy Directed Folksonomies</title><content type='html'>Props to &lt;a href="http://taxonomy2watch.blogspot.com/2008/09/taxonomy-directed-folksonomies.html"&gt;Gwen Harris&lt;/a&gt; at Taxonomy Watch for posting a paper by Sarah Hayman and Nick Lothian on &lt;a href="http://www.ifla.org/IV/ifla73/papers/157-Hayman_Lothian-en.pdf"&gt;Taxonomy Directed Folksonomies&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The paper asks whether folksonomies and formal taxonomy can be used together and answers in the affirmative. 
The work is in the spirit of some of our recent work at Endeca to bootstrap from vocabularies (though not necessarily controlled vocabularies) to address the inconsistency and sparsity of tagging in folksonomies.&lt;br /&gt;&lt;br /&gt;I'm personally excited to see the walls coming down between the two approaches, which many people seem to think of as mutually exclusive approaches to the tagging problem.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5882619379845084217?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5882619379845084217/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5882619379845084217' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5882619379845084217'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5882619379845084217'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-taxonomy-directed.html' title='Quick Bites: Taxonomy Directed Folksonomies'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8085838487673818132</id><published>2008-09-07T19:56:00.003-04:00</published><updated>2008-09-07T20:27:00.518-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Google'/><title type='text'>Quick Bites: Is Search Really 90% Solved?</title><content type='html'>Props to Michael Arrington for &lt;a href="http://www.techcrunch.com/2008/09/07/is-search-really-90-solved/"&gt;calling out&lt;/a&gt; this snippet in an &lt;a href="http://latimesblogs.latimes.com/technology/2008/09/marissa-mayer-t.html"&gt;interview with Marissa Mayer&lt;/a&gt;, Google Vice President of Search Product and User Experience, on the occasion of Google's 10th birthday:&lt;br /&gt;&lt;blockquote&gt;Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.&lt;/blockquote&gt;I agree with Michael that search isn't even close to being solved yet. I've &lt;a href="http://thenoisychannel.blogspot.com/2008/08/is-google-good-enough_05.html"&gt;criticized&lt;/a&gt; the way many web search start-ups--and even the giants Yahoo and Microsoft--are going about trying to dethrone Google through incremental improvements or technologies that don't address any need that Google does not already adequately (if not optimally) address. 
But there is no lack of &lt;a href="http://thenoisychannel.blogspot.com/2008/08/where-google-isnt-good-enough.html"&gt;open problems&lt;/a&gt; in search for those ambitious enough to tackle them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8085838487673818132?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8085838487673818132/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8085838487673818132' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8085838487673818132'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8085838487673818132'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-is-search-really-90-solved.html' title='Quick Bites: Is Search Really 90% Solved?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3421828675909821375</id><published>2008-09-07T11:11:00.003-04:00</published><updated>2008-09-07T11:21:44.581-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><title type='text'>Quick Bites: Applying Turing's Ideas to Search</title><content type='html'>A colleague of mine at &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt; recently pointed me to a post by 
John Ferrara at Boxes and Arrows entitled &lt;a href="http://www.boxesandarrows.com/view/applying-turings"&gt;Applying Turing's Ideas to Search&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One of the points he makes echoes the "computers aren't mind readers" theme I've been hammering at for a while:&lt;br /&gt;&lt;blockquote&gt;If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.&lt;/blockquote&gt;While I'm not convinced that search engine designers should be aspiring to pass the &lt;a href="http://en.wikipedia.org/wiki/Turing_test"&gt;Turing test&lt;/a&gt;, I agree wholeheartedly with the vision John puts forward:&lt;br /&gt;&lt;blockquote&gt;It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.&lt;/blockquote&gt;It's not about the search engine convincing the user that another human being is producing the answers, but rather engaging users in a conversation that helps them articulate and elaborate their information needs. 
Or, as we like to call it around here, &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3421828675909821375?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3421828675909821375/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3421828675909821375' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3421828675909821375'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3421828675909821375'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-applying-turings-ideas-to.html' title='Quick Bites: Applying Turing&apos;s Ideas to Search'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2007242846394759544</id><published>2008-09-06T20:58:00.003-04:00</published><updated>2008-09-07T11:23:26.290-04:00</updated><title type='text'>Migrating Soon</title><content type='html'>Just another reminder that I expect to migrate this blog to a hosted WordPress platform in the next few days. If you have opinions about hosting platforms, please let me know by commenting here. 
Right now, I'm debating between DreamHost and GoDaddy, but I'm very open to suggestions.&lt;br /&gt;&lt;br /&gt;I will do everything in my power to minimize disruption--not sure how easy Blogger will make it to redirect users to the new site. I'll probably post here for a while after the move to try to direct traffic.&lt;br /&gt;&lt;br /&gt;I do expect the new site to be under a domain name I've already reserved: &lt;a href="http://thenoisychannel.com/"&gt;http://thenoisychannel.com&lt;/a&gt;. It currently forwards to Blogger.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2007242846394759544?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2007242846394759544/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2007242846394759544' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2007242846394759544'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2007242846394759544'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/migrating-soon.html' title='Migrating Soon'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-952778772566536203</id><published>2008-09-06T20:43:00.003-04:00</published><updated>2008-09-06T20:58:02.580-04:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='intelligence analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Back from the Endeca Government Summit</title><content type='html'>I spent Thursday at the &lt;a href="http://endeca.com/government-summit/index.html"&gt;Endeca Government Summit&lt;/a&gt;, where I had the privilege to chat face-to-face with some Noisy Channel readers. Mostly, I was there to learn more about the sorts of information-seeking problems people are facing in the public sector in general, and in the intelligence agencies in particular.&lt;br /&gt;&lt;br /&gt;While I can't go into much detail, the key concern was exploration of information availability. This problem is the antithesis of &lt;a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm"&gt;known-item search&lt;/a&gt;: rather than trying to retrieve information you know exists (and know how to specify), you are trying to determine whether there is information available that would help you with a particular task.&lt;br /&gt;&lt;br /&gt;Despite being lost in a sea of &lt;a href="http://en.wikipedia.org/wiki/Three-letter_acronym"&gt;TLAs&lt;/a&gt;, I came away with a deepened appreciation of both the problems the intelligence agencies are trying to address and the relevance of exploratory search approaches to those problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-952778772566536203?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/952778772566536203/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=952778772566536203' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/952778772566536203'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/952778772566536203'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/back-from-endeca-government-summit.html' title='Back from the Endeca Government Summit'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7443446282369767154</id><published>2008-09-04T01:05:00.001-04:00</published><updated>2008-09-04T01:07:16.362-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='transparency'/><title type='text'>Query Elaboration as a Dialogue</title><content type='html'>&lt;div&gt;I ended my post on &lt;a href="http://thenoisychannel.blogspot.com/2008/08/transparency-in-information-retrieval.html"&gt;transparency in information retrieval&lt;/a&gt; with a teaser: if users aren't great at composing queries for set retrieval, which I argue is more transparent than ranked retrieval, then how will we ever deliver an information retrieval system that offers both usefulness and transparency?&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The answer is that the system needs to help the user elaborate the query. 
Specifically, the process of composing a query should be a dialogue between the user and the system that allows the user to progressively articulate and explore an information need.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Those of you who have been reading this blog for a while or who are familiar with what I do at &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt; shouldn't be surprised to see dialogue as the punch line. But I want to emphasize that the dialogue I'm describing isn't just a back-and-forth between the user and the system. After all, there are query suggestion mechanisms that operate in the context of ranked retrieval algorithms--algorithms which do not offer the user transparency. While such mechanisms sometimes work, they risk doing more harm than good. Any interactive approach requires the user to do more work; if this added work does not result in added effectiveness, users will be frustrated.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That is why the dialogue has to be based on a transparent retrieval model--one where the system responds to queries in a way that is intuitive to users. Then, as users navigate in query space, transparency ensures that they can make informed choices about query refinement and thus make progress. I'm partial to set retrieval models, though I'm open to probabilistic ones. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But of course we've just shifted the problem. How do we decide what query refinements to offer to a user in order to support this progressive refinement process? 
Stay tuned...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7443446282369767154?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7443446282369767154/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7443446282369767154' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7443446282369767154'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7443446282369767154'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/query-elaboration-as-dialogue.html' title='Query Elaboration as a Dialogue'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7838439081764754401</id><published>2008-09-02T20:40:00.003-04:00</published><updated>2008-09-02T20:43:04.171-04:00</updated><title type='text'>Migrating to WordPress</title><content type='html'>Just a quick note to let folks know that I'll be migrating to WordPress in the next few days. I'll make every effort to make the move seamless. I have secured the domain name &lt;a href="http://thenoisychannel.com"&gt;http://thenoisychannel.com&lt;/a&gt;, which currently forwards to Blogger but will shift to wherever the blog is hosted. 
I apologize in advance for any disruption.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7838439081764754401?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7838439081764754401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7838439081764754401' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7838439081764754401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7838439081764754401'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/migrating-to-wordpress.html' title='Migrating to WordPress'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1482711048472325351</id><published>2008-09-02T02:16:00.003-04:00</published><updated>2008-09-02T02:31:36.503-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>Quick Bites: Google Chrome</title><content type='html'>For those of you who thought that no major technology news would come out during the Labor Day weekend, check out the prematurely released &lt;a href="http://books.google.com/books?id=8UsqHohwwVYC&amp;amp;prin"&gt;comic book&lt;/a&gt; hailing Google Chrome, Google's long-rumored entry into the browser wars. 
By the time you are reading this, the (Windows-only) beta may even be available for download. The official Google announcement is &lt;a href="http://googleblog.blogspot.com/2008/09/fresh-take-on-browser.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If the software lives up to the promise of the comic book, then Google may have a real shot at taking market share from IE and Firefox. More significantly, if they can &lt;a href="http://furrier.org/2008/09/01/google-chrome-what-does-it-mean-its-official-the-search-wars-just-turned-into-operating-system-war/"&gt;supplant the operating system with the browser&lt;/a&gt;, then they'll have a much more credible opportunity to take on desktop software with their web-based applications.&lt;br /&gt;&lt;br /&gt;Interestingly, even though all of the search blogs are reporting on Chrome, I haven't seen any analysis of what this might mean for web search.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1482711048472325351?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1482711048472325351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1482711048472325351' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1482711048472325351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1482711048472325351'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-google-chrome.html' title='Quick Bites: Google Chrome'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7737492433471169850</id><published>2008-09-01T22:03:00.004-04:00</published><updated>2008-09-01T22:48:01.656-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='e-Discovery'/><category scheme='http://www.blogger.com/atom/ns#' term='transparency'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><title type='text'>Quick Bites: E-Discovery and Transparency</title><content type='html'>One change I'm thinking of making to this blog is to introduce "quick bites" as a way of mentioning interesting sites or articles I've come across without going into deep analysis. Here's a first one to give you a flavor of the concept. Let me know what you think.&lt;br /&gt;&lt;br /&gt;I just read an article on how &lt;a href="http://www.ferris.com/2008/07/22/courts-will-tolerate-search-inaccuracies/"&gt;courts will tolerate search inaccuracies&lt;/a&gt; in e-Discovery by way of &lt;a href="http://www.texttechnologies.com/2008/09/01/how-good-does-e-discovery-search-need-to-be/"&gt;Curt Monash&lt;/a&gt;. It reminded me of our recent discussion of &lt;a href="http://thenoisychannel.blogspot.com/2008/08/transparency-in-information-retrieval.html"&gt;transparency in information retrieval&lt;/a&gt;. I agree that "explanations of [search] algorithms are of questionable value" for convincing a court of the relevance and accuracy of the results. 
But that's because those algorithms aren't intuitive enough for their explanations to be meaningful, except in a theoretical sense to an information retrieval researcher.&lt;br /&gt;&lt;br /&gt;I realize that user-entered Boolean queries (the traditional approach to e-Discovery) aren't effective because users aren't great at composing queries for set retrieval. But that's why machines need to help users with query elaboration--a topic for an upcoming post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7737492433471169850?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7737492433471169850/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7737492433471169850' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7737492433471169850'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7737492433471169850'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/quick-bites-e-discovery-and.html' title='Quick Bites: E-Discovery and Transparency'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7070632457865532968</id><published>2008-09-01T16:25:00.002-04:00</published><updated>2008-09-01T16:29:46.334-04:00</updated><title type='text'>POLL: Blogging Platform</title><content 
type='html'>I've gotten a fair amount of feedback suggesting that I switch blogging platforms. Since I plan to make such changes infrequently, I'd like to get input from readers before doing so, especially since migration may have hiccups.&lt;br /&gt;&lt;br /&gt;I've just posted a poll on the &lt;a href="http://thenoisychannel.blogspot.com/"&gt;home page&lt;/a&gt; to ask if folks here have a preference as to which blogging platform I use. Please vote this week, and feel free to post comments here.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7070632457865532968?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7070632457865532968/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7070632457865532968' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7070632457865532968'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7070632457865532968'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/09/poll-blogging-platform.html' title='POLL: Blogging Platform'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3118372005871268564</id><published>2008-08-29T00:17:00.003-04:00</published><updated>2008-08-29T00:25:45.565-04:00</updated><title type='text'>Improving The Noisy Channel: A Call 
for Ideas</title><content type='html'>Over the past five months, this blog has grown from a suggestion &lt;a href="http://www.searchenginecaffe.com/"&gt;Jeff Dalton&lt;/a&gt; put in my ear to a community to which I'm proud to belong.&lt;br /&gt;&lt;br /&gt;Some milestones:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Over 70 posts to date.&lt;/li&gt;&lt;li&gt;94 subscribers, as reported by Google Reader.&lt;/li&gt;&lt;li&gt;100 unique visitors on a typical day.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;To be honest, I thought I'd struggle to keep up with posting weekly, and that I'd need to convince my mom to read this blog so that I wouldn't be speaking to an empty room. The results so far have wildly exceeded the expectations I came in with.&lt;br /&gt;&lt;br /&gt;But now that I've seen the potential of this blog, I'd like to "take it to the next level," as the MBA types say.&lt;br /&gt;&lt;br /&gt;My goals:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Increase the readership. My motive isn't (only) to inflate my own ego. I've seen that this blog succeeds most when it stimulates conversation, and a conversation needs participants.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Increase participation. Given the quantity and quality of comments on recent posts, it's clear that readers here contribute the most valuable content. I'd like to step that up a notch by having readers guest-blog and perhaps even turning The Noisy Channel into a group blog about information seeking that transcends my personal take on the subject. I'm very open to suggestions here.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Add some style. Various folks have offered suggestions for improving the blog, such as changing platforms to WordPress, modifying the layout to better use screen real estate, adding more images, etc. 
I'm the first to admit that I am not a designer, and I'd really appreciate ideas from you all on how to make this site more attractive and usable.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;In short, I'm asking you to help me help you make The Noisy Channel a better and noisier place. Please post your comments here or &lt;a href="mailto:dt@endeca.com"&gt;email me&lt;/a&gt; if you'd prefer to make suggestions privately.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3118372005871268564?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3118372005871268564/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3118372005871268564' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3118372005871268564'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3118372005871268564'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/improving-noisy-channel-call-for-ideas.html' title='Improving The Noisy Channel: A Call for Ideas'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8945585109734477146</id><published>2008-08-27T21:39:00.007-04:00</published><updated>2008-09-01T22:48:20.677-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='transparency'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Transparency in Information Retrieval</title><content type='html'>It's been hard to find time to write another post while keeping up with the comment stream on my &lt;a href="http://thenoisychannel.blogspot.com/2008/08/set-retrieval-vs-ranked-retrieval.html"&gt;previous post about set retrieval&lt;/a&gt;! I'm very happy to see this level of interest, and I hope to continue catalyzing such discussions.&lt;br /&gt;&lt;br /&gt;Today, I'd like to discuss transparency in the context of information retrieval. Transparency is an &lt;a href="http://googleblog.blogspot.com/2008/07/more-transparency-in-customized-search.html"&gt;increasingly popular term&lt;/a&gt; these days in the context of search--perhaps not surprising, since users are finally starting to question the idea of search as a black box.&lt;br /&gt;&lt;br /&gt;The idea of transparency is simple: users should know why a search engine returns a particular response to their query. Note the emphasis on "why" rather than "how". Most users don't care what algorithms a search engine uses to compute a response. What they do care about is how the engine ultimately "understood" their query--in other words, what question the engine thinks it's answering.&lt;br /&gt;&lt;br /&gt;Some of you might find this description too anthropomorphic. But a recent study reported that &lt;a href="http://www.slideshare.net/dtunkelang/is-search-broken/14"&gt;most users expect search engines to read their minds&lt;/a&gt;--never mind that the general case goes beyond &lt;a href="http://en.wikipedia.org/wiki/AI-complete"&gt;AI-complete&lt;/a&gt; (should we create a new class of ESP-complete problems?). 
But what frustrates users most is when a search engine not only fails to read their minds, but also gives no indication of where the communication broke down, let alone how to fix it. In short, a failure to provide transparency.&lt;br /&gt;&lt;br /&gt;What does this have to do with set retrieval vs. ranked retrieval? Plenty!&lt;br /&gt;&lt;br /&gt;Set retrieval predates the Internet by a few decades, and was the first approach used to implement search engines. These search engines allowed users to enter queries by stringing together search terms with Boolean operators (AND, OR, etc.). Today, Boolean retrieval seems arcane, and most people see set retrieval as suitable for querying databases, rather than for querying search engines.&lt;br /&gt;&lt;br /&gt;The biggest problem with set retrieval is that users find it extremely difficult to compose effective Boolean queries. Nonetheless, there is no question that set retrieval offers transparency: what you ask is what you get. And, if you prefer a particular sort order for your results, you can specify it.&lt;br /&gt;&lt;br /&gt;In contrast, ranked retrieval makes it much easier for users to compose queries: users simply enter a few top-of-mind keywords. And for many use cases (in particular, &lt;a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm"&gt;known-item search&lt;/a&gt;), a state-of-the-art implementation of ranked retrieval yields results that are &lt;a href="http://thenoisychannel.blogspot.com/2008/08/is-google-good-enough_05.html"&gt;good enough&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But ranked retrieval approaches generally shed transparency. At best, they employ standard information retrieval models that, although published in all of their gory detail, are opaque to their users--who are unlikely to be &lt;a href="http://sigir.org/"&gt;SIGIR&lt;/a&gt; regulars. 
At worst, they employ &lt;a href="http://thenoisychannel.blogspot.com/2008/04/q-with-amit-singhal.html"&gt;secret, proprietary models&lt;/a&gt;, either to protect their competitive differentiation or to thwart spammers.&lt;br /&gt;&lt;br /&gt;Either way, the only clues that most ranked retrieval engines provide to users are text snippets from the returned documents. Those snippets may validate the relevance of the results that are shown, but the user does not learn what distinguishes the top-ranked results from other documents that contain some or all of the query terms.&lt;br /&gt;&lt;br /&gt;If the user is satisfied with one of the top results, then transparency is unlikely to even come up. Even if the selected result isn't optimal, users may do well to &lt;a href="http://en.wikipedia.org/wiki/Satisficing"&gt;satisfice&lt;/a&gt;. But when the search engine fails to read the user's mind, transparency offers the best hope of recovery.&lt;br /&gt;&lt;br /&gt;But, as I mentioned earlier, users aren't great at composing queries for set retrieval, which is why ranked retrieval became so popular in the first place, despite its lack of transparency. 
How do we resolve this dilemma?&lt;br /&gt;&lt;br /&gt;To be continued...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8945585109734477146?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8945585109734477146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8945585109734477146' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8945585109734477146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8945585109734477146'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/transparency-in-information-retrieval.html' title='Transparency in Information Retrieval'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6571559226888022735</id><published>2008-08-24T12:17:00.003-04:00</published><updated>2008-08-24T12:23:31.441-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Set Retrieval vs. 
Ranked Retrieval</title><content type='html'>After last week's post about a &lt;a href="http://thenoisychannel.blogspot.com/2008/08/thinking-outside-black-box.html"&gt;racially targeted web search engine&lt;/a&gt;, you'd think I'd avoid controversy for a while. To the contrary, I now feel bold enough to bring up what I have found to be my most controversial position within the information retrieval community: my preference for set retrieval over ranked retrieval.&lt;br /&gt;&lt;br /&gt;This will be the first of several posts on this theme, so I'll start by introducing the terms.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;In a ranked retrieval approach, the system responds to a search query by ranking all documents in the corpus based on its estimate of their relevance to the query.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;In a set retrieval approach, the system partitions the corpus into two subsets of documents: those it considers relevant to the search query, and those it does not.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;An information retrieval system can combine set retrieval and ranked retrieval by first determining a set of matching documents and then ranking the matching documents. Most industrial search engines, such as Google, take this approach, at least in principle. But, because the set of matching documents is typically much larger than the set of documents displayed to a user, these approaches are, in practice, ranked retrieval.&lt;br /&gt;&lt;br /&gt;What is set retrieval in practice? In my view, a set retrieval approach satisfies two expectations:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The number of documents reported to match my search should be meaningful--or at least should be a meaningful estimate. 
More generally, any summary information reported about this set should be useful.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Displaying a random subset of the set of matching documents to the user should be a plausible behavior, even if it is not as good as displaying the top-ranked matches. In other words, relevance ranking should help distinguish more relevant results from less relevant results, rather than distinguishing relevant results from irrelevant results.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Despite its popularity, the ranked retrieval model suffers because it does not provide a clear split between relevant and irrelevant documents. This weakness makes it impossible to obtain even basic analysis of the query results, such as the number of relevant documents, let alone a more complicated one, such as the result quality. In contrast, a set retrieval model partitions the corpus into two subsets of documents: those that are considered relevant, and those that are not. A set retrieval model does not rank the retrieved documents; instead, it establishes a clear split between documents that are in and out of the retrieved set. 
As a result, set retrieval models enable rich analysis of query results, which can then be applied to improve user experience.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6571559226888022735?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6571559226888022735/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6571559226888022735' title='27 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6571559226888022735'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6571559226888022735'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/set-retrieval-vs-ranked-retrieval.html' title='Set Retrieval vs. Ranked Retrieval'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>27</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8561171675903156167</id><published>2008-08-23T09:01:00.003-04:00</published><updated>2008-08-23T09:07:50.056-04:00</updated><title type='text'>Back from the Cone of Silence</title><content type='html'>Regular readers may have noticed the lack of posts this week. My apologies to anyone who was waiting by the RSS feed. Yesterday was the submission deadline for &lt;a href="http://research.microsoft.com/%7Eryenw/hcir2008/"&gt;HCIR '08&lt;/a&gt;, which means that today is a new day! 
So please stay tuned for your regularly scheduled programming.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8561171675903156167?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8561171675903156167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8561171675903156167' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8561171675903156167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8561171675903156167'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/back-from-cone-of-silence.html' title='Back from the Cone of Silence'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2814332276210141325</id><published>2008-08-16T17:54:00.004-04:00</published><updated>2008-08-16T18:05:03.868-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Thinking Outside the Black Box</title><content type='html'>I was reading &lt;a href="http://techmeme.com/"&gt;Techmeme&lt;/a&gt; today, and I noticed an &lt;a 
href="http://latimesblogs.latimes.com/technology/2008/08/rushmore-black.html"&gt;LA Times article&lt;/a&gt; about &lt;a href="http://www.rushmoredrive.com/"&gt;RushmoreDrive&lt;/a&gt;, described on its &lt;a href="http://www.rushmoredrive.com/ContentPages/AboutUs.aspx"&gt;About Us&lt;/a&gt; page as "a first-of-its-kind search engine for the Black community." My first reaction, blogged by others already, was that this idea was &lt;a href="http://www.mathewingram.com/work/2008/08/16/rushmoredrive-dumb-idea/"&gt;dumb&lt;/a&gt; and &lt;a href="http://tech.blorge.com/Structure:%20/2008/08/16/rushmoredrive-the-black-google-offensive-possibly-even-racist/"&gt;racist&lt;/a&gt;. In fact, it took some work to find &lt;a href="http://www.blackweb20.com/category/rushmoredrive/"&gt;positive commentary&lt;/a&gt; about RushmoreDrive.&lt;br /&gt;&lt;br /&gt;But I've learned from the way the blogosphere handled the &lt;a href="http://thenoisychannel.blogspot.com/2008/07/not-as-cuil-as-i-expected.html"&gt;Cuil launch&lt;/a&gt; not to trust anyone who evaluates a search engine without having tried it, myself included. My wife and I have been the only white people at &lt;a href="http://www.amyruthsharlem.com/"&gt;Amy Ruth's&lt;/a&gt;, and the service was as gracious as the &lt;a href="http://en.wikipedia.org/wiki/Chicken_and_waffles"&gt;chicken and waffles&lt;/a&gt; were delicious, so I decided to try my luck with a search engine not targeted at my racial profile.&lt;br /&gt;&lt;br /&gt;The search quality is solid, comparable to that of Google, Yahoo, and Microsoft. In fact, the site looks a lot like a re-skinning (no pun intended) of &lt;a href="http://www.ask.com/"&gt;Ask.com&lt;/a&gt;, a corporate sibling of &lt;a href="http://www.iac.com/"&gt;IAC&lt;/a&gt;-owned RushmoreDrive. 
Like Ask.com, RushmoreDrive emphasizes search refinement through narrowing and broadening suggestions.&lt;br /&gt;&lt;br /&gt;What I find ironic is that the whole controversy about racial bias in relevance ranking reveals a much bigger problem--that relevance ranking should not be a black box (ok, maybe this time I'll take responsibility for the pun). I've been beating this drum at The Noisy Channel ever since I &lt;a href="http://thenoisychannel.blogspot.com/2008/04/q-with-amit-singhal.html"&gt;criticized Amit Singhal&lt;/a&gt; for Google's lack of transparency. I think that sites like RushmoreDrive are inevitable if search engines refuse to cede more control of search results to users.&lt;br /&gt;&lt;br /&gt;I don't know how much information race provides as a &lt;a href="http://en.wikipedia.org/wiki/Prior_probability"&gt;prior&lt;/a&gt; to influence statistical ranking approaches, but I'm skeptical that the effects are useful or even noticeable beyond a few well-chosen examples. I'm more inclined to see RushmoreDrive as a marketing ploy by the folks at IAC--and perhaps a successful one. 
I doubt that Google is running scared, but I think this should be a wake-up call to folks who are convinced that personalized relevance ranking is the end goal of user experience for search engines.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2814332276210141325?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2814332276210141325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2814332276210141325' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2814332276210141325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2814332276210141325'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/thinking-outside-black-box.html' title='Thinking Outside the Black Box'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7002769306472908224</id><published>2008-08-15T10:55:00.004-04:00</published><updated>2008-08-15T10:59:56.075-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>New Information Retrieval Book Available Online</title><content type='html'>Props to &lt;a href="http://www.searchenginecaffe.com/2008/08/yahoo-research-at-sigir-2008.html"&gt;Jeff Dalton&lt;/a&gt; for alerting me about the new book on information 
retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. You can buy a hard copy, but you can also access it online for free at the book &lt;a href="http://www-csli.stanford.edu/%7Ehinrich/information-retrieval-book.html"&gt;website&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7002769306472908224?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7002769306472908224/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7002769306472908224' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7002769306472908224'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7002769306472908224'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/new-information-retrieval-book.html' title='New Information Retrieval Book Available Online'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1875750467060722151</id><published>2008-08-13T23:10:00.006-04:00</published><updated>2008-08-13T23:56:23.534-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>David Huynh's Freebase 
Parallax</title><content type='html'>One of the perks of working in &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt; is that you get to meet some of the coolest people in academic and industrial research. I met &lt;a href="http://davidhuynh.net/"&gt;David Huynh&lt;/a&gt; a few years ago, while he was a graduate student at MIT, working in the &lt;a href="http://groups.csail.mit.edu/haystack/"&gt;Haystack&lt;/a&gt; group and on the &lt;a href="http://simile.mit.edu/"&gt;Simile&lt;/a&gt; project. You've probably seen some of his work: his &lt;a href="http://simile.mit.edu/timeline/"&gt;Timeline&lt;/a&gt; project has been deployed all over the web.&lt;br /&gt;&lt;br /&gt;Despite efforts by me and others to persuade David to stay in the Northeast, he went out west a few months ago to join &lt;a href="http://www.metaweb.com/"&gt;Metaweb&lt;/a&gt;, a company with ambitions "to build a better infrastructure for the Web." While I (and &lt;a href="http://www.readwriteweb.com/archives/freebase_parallax_taunts_us_wi.php"&gt;others&lt;/a&gt;) am not persuaded by &lt;a href="http://www.freebase.com/"&gt;Freebase&lt;/a&gt;, Metaweb's "open database of the world’s information," I am happy to see that David is still doing great work.&lt;br /&gt;&lt;br /&gt;I encourage you to check out David's latest project: &lt;a href="http://mqlx.com/%7Edavid/parallax/"&gt;Freebase Parallax&lt;/a&gt;. In it, he does something I've never seen outside &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt; (excepting David's earlier work on a &lt;a href="http://people.csail.mit.edu/dfhuynh/projects/nfb/"&gt;Nested Faceted Browser&lt;/a&gt;): he allows you to navigate using the facets of multiple entity types, joining between sets of entities through their relationships. 
At Endeca, we call this "record relationship navigation"--we presented it at &lt;a href="http://projects.csail.mit.edu/hcir/"&gt;HCIR '07&lt;/a&gt;, showing how it can enable &lt;a href="http://thenoisychannel.blogspot.com/2008/04/social-navigation.html"&gt;social navigation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;David includes a video where he eloquently demonstrates how Parallax works, and the interface is quite compelling. I'm not sure how well it scales to large data sets, but David's focus has been on interfaces rather than systems. My biggest complaint--which isn't David's fault--is that the Freebase content is a bit sparse. But his interface strikes me as a great fit for &lt;a href="http://thenoisychannel.blogspot.com/search/label/exploratory%20search"&gt;exploratory search&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1875750467060722151?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1875750467060722151/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1875750467060722151' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1875750467060722151'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1875750467060722151'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/david-huynhs-freebase-parallax.html' title='David Huynh&apos;s Freebase Parallax'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8931954760619102686</id><published>2008-08-13T19:27:00.005-04:00</published><updated>2008-08-13T19:40:04.496-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>Conversation with Seth Grimes</title><content type='html'>I had a great conversation with &lt;a href="http://www.intelligententerprise.com/"&gt;Intelligent Enterprise&lt;/a&gt; columnist &lt;a href="http://www.intelligententerprise.com/movabletype/blog/sgrimes.html"&gt;Seth Grimes&lt;/a&gt; today. Apparently there's an upside to writing &lt;a href="http://thenoisychannel.blogspot.com/2008/08/why-enterprise-search-will-never-be.html"&gt;critical commentary&lt;/a&gt; on Google's aspirations in the enterprise!&lt;br /&gt;&lt;br /&gt;One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I've been &lt;a href="http://people.ischool.berkeley.edu/%7Eryanshaw/wordpress/2008/08/11/structured-yet-permeable/#comments"&gt;discussing&lt;/a&gt; with &lt;a href="http://people.ischool.berkeley.edu/%7Eryanshaw/wordpress/"&gt;Ryan Shaw&lt;/a&gt;, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). 
But I realize that many folks define enterprise search more narrowly as a search box hooked up to the intranet.&lt;br /&gt;&lt;br /&gt;Perhaps a better way to think about enterprise search is as a problem rather than a solution. Many people expect a search box because they're familiar with searching the web using Google. I don't blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, &lt;a href="http://weblog.infoworld.com/ny-cto/archives/2008/03/search_as_a_uti.html"&gt;wishful thinking&lt;/a&gt; and &lt;a href="http://www.youtube.com/watch?v=U1JjMUjPCvc"&gt;clever advertising&lt;/a&gt; notwithstanding, it doesn't.&lt;br /&gt;&lt;br /&gt;I've blogged about this subject from several different perspectives over the past few weeks, so I'll refer recent readers to &lt;a href="http://thenoisychannel.blogspot.com/search/label/Enterprise%20Search"&gt;earlier posts on the subject&lt;/a&gt; rather than bore the regulars.&lt;br /&gt;&lt;br /&gt;But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search--not because of their enterprise offerings, but rather because of the web search they offer for free.&lt;br /&gt;&lt;br /&gt;I'm not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today's APIs to that information woefully inadequate; for example, I can't even choose a sort order through any of the web search APIs. 
But I am optimistic that those APIs will evolve, and that we will see "federated" information seeking that goes beyond merging ranked lists from different sources.&lt;br /&gt;&lt;br /&gt;Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on &lt;a href="http://thenoisychannel.blogspot.com/2008/04/q-with-amit-singhal.html"&gt;black box relevance ranking&lt;/a&gt; in favor of an approach that offers users control and interaction.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8931954760619102686?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8931954760619102686/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8931954760619102686' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8931954760619102686'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8931954760619102686'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/conversation-with-seth-grimes.html' title='Conversation with Seth Grimes'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2626790519560190896</id><published>2008-08-11T20:52:00.003-04:00</published><updated>2008-08-11T21:01:59.115-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Position papers for NSF IS3 Workshop</title><content type='html'>I just wanted to let folks know that the position papers for the &lt;a href="http://www.ils.unc.edu/ISSS/"&gt;NSF Information Seeking Support Systems Workshop&lt;/a&gt; are now available at &lt;a href="http://www.ils.unc.edu/ISSS/papers/"&gt;this link&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here is a listing to whet your curiosity:&lt;ul&gt;&lt;li&gt;Supporting Interaction and Familiarity&lt;br /&gt;James Allan, University of Massachusetts Amherst, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;From Web Search to Exploratory Search: Can we get there from here?&lt;br /&gt;Peter Anick, Yahoo! Inc., USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Complex and Exploratory Web Search (with Daniel Russell)&lt;br /&gt;Anne Aula, Google, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Really Supporting Information Seeking: A Position Paper&lt;br /&gt;Nicholas J. 
Belkin, Rutgers University, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Transparent and User-Controllable Personalization For Information Exploration&lt;br /&gt;Peter Brusilovsky, University of Pittsburgh, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Faceted Exploratory Search Using the Relation Browser&lt;br /&gt;Robert Capra, UNC, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Towards a Model of Understanding Social Search&lt;br /&gt;Ed Chi, Palo Alto Research Center, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Building Blocks For Rapid Development of Information Seeking Support Systems&lt;br /&gt;Gary Geisler, University of Texas at Austin, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Collaborative Information Seeking in Electronic Environments&lt;br /&gt;Gene Golovchinsky, FX Palo Alto Laboratory, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;NeoNote: User Centered Design Suggestions for a Global Shared Scholarly Annotation System&lt;br /&gt;Brad Hemminger, UNC, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Speaking the Same Language About Exploratory Information Seeking&lt;br /&gt;Bill Kules, The Catholic University of America, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Musings on Information Seeking Support Systems&lt;br /&gt;Michael Levi, U.S. 
Bureau of Labor Statistics, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Social Bookmarking and Information Seeking&lt;br /&gt;David Millen, IBM Research, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Making Sense of Search Result Pages&lt;br /&gt;Jan Pedersen, Yahoo, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A Multilevel Science of Social Information Foraging and Sensemaking&lt;br /&gt;Peter Pirolli, Xerox PARC, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Characterizing, Supporting and Evaluating Exploratory Search&lt;br /&gt;Edie Rasmussen, University of British Columbia, Canada&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The Information-Seeking Funnel&lt;br /&gt;Daniel Rose, A9.com, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Complex and Exploratory Web Search (with Anne Aula)&lt;br /&gt;Daniel Russell, Google, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Research Agenda: Visual Overviews for Exploratory Search&lt;br /&gt;Ben Shneiderman, University of Maryland, USA&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Five Challenges for Research to Support IS3&lt;br /&gt;Elaine Toms, Dalhousie University, Canada&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Resolving the Battle Royale between Information Retrieval and Information Science&lt;br /&gt;Daniel Tunkelang, Endeca, USA&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2626790519560190896?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2626790519560190896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2626790519560190896' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2626790519560190896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2626790519560190896'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/position-papers-for-nsf-is3-workshop.html' title='Position papers for NSF IS3 Workshop'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6540353774035565170</id><published>2008-08-10T23:46:00.005-04:00</published><updated>2008-08-10T23:58:59.736-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>Why Enterprise Search Will Never Be Google-y</title><content type='html'>As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. 
They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.&lt;br /&gt;&lt;br /&gt;The first is Google's &lt;a href="http://www.google.com/intl/en/press/pressrel/20080806_new_gsa.html"&gt;announcement&lt;/a&gt; of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.&lt;br /&gt;&lt;br /&gt;The second is an article by Chris Sherman in the &lt;a href="http://www.nxtbook.com/nxtbooks/infotoday/enterprisesearchsourcebook08/"&gt;Enterprise Search Sourcebook 2008&lt;/a&gt; entitled &lt;a href="http://www.nxtbook.com/nxtbooks/infotoday/enterprisesearchsourcebook08/index.php?startpage=16"&gt;Why Enterprise Search Will Never Be Google-y&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google's scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn't been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google's strongest selling point for the GSA, the claim that it works "out of the box", is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.&lt;br /&gt;&lt;br /&gt;Second, the Chris Sherman piece. 
Here is an excerpt:&lt;br /&gt;&lt;blockquote&gt;Enterprise search and web search are fundamentally different animals, and I'd argue that enterprise search won't--and shouldn't--be Google-y any time soon....Like web search, Google's enterprise search is easy to use--if you're willing to go along with how Google's algorithms view and present your business information....Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.&lt;/blockquote&gt;I highly recommend you read &lt;a href="http://www.nxtbook.com/nxtbooks/infotoday/enterprisesearchsourcebook08/index.php?startpage=16"&gt;the whole article&lt;/a&gt; (it's only 2 pages), not only because it is informative and well written, but also because the author isn't working for one of the Big Three.&lt;br /&gt;&lt;br /&gt;The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn't recommend that anyone try to compete with the GSA on its turf.&lt;br /&gt;&lt;br /&gt;But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for "enterprise search" is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.&lt;br /&gt;&lt;br /&gt;If you're interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6540353774035565170?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6540353774035565170/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6540353774035565170' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6540353774035565170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6540353774035565170'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/why-enterprise-search-will-never-be.html' title='Why Enterprise Search Will Never Be Google-y'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1001891273795192124</id><published>2008-08-07T17:25:00.002-04:00</published><updated>2008-08-07T17:34:55.516-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Where Google Isn't Good Enough</title><content type='html'>My last post, &lt;a href="http://thenoisychannel.blogspot.com/2008/08/is-google-good-enough.html"&gt;Is Google Good Enough?&lt;/a&gt;, challenged would-be Google killers to identify and address clear consumer needs for which Google isn't good enough as a solution. I like helping my readers, so here are some ideas.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Shopping.&lt;/span&gt; &lt;a href="http://www.google.com/products"&gt;Google Product Search&lt;/a&gt; (fka Froogle) is not one of Google's crown jewels. 
At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison to any modern ecommerce site, such as &lt;a href="http://www.amazon.com/"&gt;Amazon&lt;/a&gt; or &lt;a href="http://www.homedepot.com/"&gt;Home Depot&lt;/a&gt;. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn't know exactly what they wanted when they started.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Finding a job.&lt;/span&gt; Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, &lt;a href="http://www.monster.com/"&gt;Monster&lt;/a&gt; and &lt;a href="http://www.careerbuilder.com/"&gt;CareerBuilder&lt;/a&gt;, succeed in terms of the number of jobs posted, but aren't exactly optimized for user experience. &lt;a href="http://dice.com/"&gt;Dice&lt;/a&gt; does better, but only for technology jobs. Interestingly, the best job finding site may be &lt;a href="http://linkedin.com/"&gt;LinkedIn&lt;/a&gt;--not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Finding employees.&lt;/span&gt; Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I've used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and the nuisance of recruiters posing as job seekers. 
Here again, Google has not tried to compete.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Planning a trip.&lt;/span&gt; Sure, you can use &lt;a href="http://www.expedia.com/"&gt;Expedia&lt;/a&gt;, &lt;a href="http://www.travelocity.com/"&gt;Travelocity&lt;/a&gt;, or &lt;a href="http://www.kayak.com/"&gt;Kayak&lt;/a&gt; to find a flight, hotel, and car rental. But there's a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don't integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Note two general themes here. The first is thinking beyond the mechanics of search and focusing on the ability to meet user needs at the task level. The second is the need for &lt;a href="http://thenoisychannel.blogspot.com/search/label/exploratory%20search"&gt;exploratory search&lt;/a&gt;. These only scratch the surface of opportunities in consumer-facing "search" applications. 
The opportunities within the enterprise are even greater, but I'll save that for my next post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1001891273795192124?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1001891273795192124/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1001891273795192124' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1001891273795192124'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1001891273795192124'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/where-google-isnt-good-enough.html' title='Where Google Isn&apos;t Good Enough'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3614865966214953506</id><published>2008-08-05T11:39:00.001-04:00</published><updated>2008-08-08T11:41:10.623-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>Is Google Good Enough?</title><content type='html'>As Chief Scientist of &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input 
mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize &lt;a href="http://thenoisychannel.blogspot.com/search/label/exploratory%20search"&gt;exploratory search&lt;/a&gt; and &lt;a href="http://thenoisychannel.blogspot.com/search/label/HCIR"&gt;human computer information retrieval&lt;/a&gt; as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven't heard me expound on the subject, feel free to check out this slide show on &lt;a href="http://thenoisychannel.blogspot.com/2008/05/is-search-broken.html"&gt;Is Search Broken?&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good-faith attempt to make the case for the status quo. I'll focus on web search, since, as I've discussed before on this blog, &lt;a href="http://thenoisychannel.blogspot.com/search/label/Enterprise%20Search"&gt;enterprise search&lt;/a&gt; is different.&lt;br /&gt;&lt;br /&gt;1) Google does well enough on result quality, enough of the time.&lt;br /&gt;&lt;br /&gt;While Google doesn't publish statistics about user satisfaction, it's widely accepted that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at &lt;a href="http://www.langreiter.com/exec/yahoo-vs-google.html"&gt;this site&lt;/a&gt;. But the question is not whether other search engines are also good enough--or even whether they are better. The point is that Google is good enough.&lt;br /&gt;&lt;br /&gt;2) Google doesn't support exploratory search. But it often leads you to a tool that does.&lt;br /&gt;&lt;br /&gt;The classic instance of this synergy is when Google leads you to a Wikipedia entry. 
For example, I look up &lt;a href="http://www.google.com/search?q=daniel+kahneman"&gt;Daniel Kahneman&lt;/a&gt; on Google. The top results is &lt;a href="http://en.wikipedia.org/wiki/Daniel_Kahneman"&gt;his Wikipedia entry&lt;/a&gt;. From there, I can traverse links to learn about his research areas, his colleagues, etc.&lt;br /&gt;&lt;br /&gt;3) Google is a benign monopoly that mitigates choice overload.&lt;br /&gt;&lt;br /&gt;Many people, myself includes, have concerns about Google's increasing role in mediating our access to information. But it's hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc, And it's all "free"--at least in so far as ad-supported services can be said to be free.&lt;br /&gt;&lt;br /&gt;In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn't even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn't good enough as a solution. 
Given Google's near-monopoly in web search, parity or even incremental advantage isn't enough.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3614865966214953506?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3614865966214953506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3614865966214953506' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3614865966214953506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3614865966214953506'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/08/is-google-good-enough_05.html' title='Is Google Good Enough?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6487120296408737382</id><published>2008-07-28T11:21:00.003-04:00</published><updated>2008-07-28T12:22:59.484-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>Not as Cuil as I Expected</title><content type='html'>Today's big tech news is the launch of &lt;a href="http://www.cuil.com/"&gt;Cuil&lt;/a&gt;, the latest challenger to Google's hegemony in Web search. 
Given the &lt;a href="http://www.cuil.com/info/management/"&gt;impressive team of Xooglers&lt;/a&gt; that put it together, I had high expectations for the launch.&lt;br /&gt;&lt;br /&gt;My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be "The World's Biggest Search Engine" based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including &lt;a href="http://www.cuil.com/search?q=noisy+channel+blog"&gt;noisy channel blog&lt;/a&gt; (compare to &lt;a href="http://www.google.com/search?q=noisy+channel+blog"&gt;Google&lt;/a&gt;). But I'm not taking it personally--after all, their own site doesn't show up when you &lt;a href="http://www.cuil.com/search?q=cuil"&gt;search for their name&lt;/a&gt; (again, compare to &lt;a href="http://www.google.com/search?q=cuil"&gt;Google&lt;/a&gt;). As for their interface features (column display, explore by category, query suggestions), they're fine, but neither the concepts nor the quality of their implementation strike me as revolutionary.&lt;br /&gt;&lt;br /&gt;Perhaps I'm expecting too much on day 1. But they're not just trying to beat &lt;a href="http://gigablast.com/"&gt;Gigablast&lt;/a&gt;; they're trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they've made in indexing, they clearly need to do more work on their crawler. It's hard to judge the quality of results when it's clear that at least some of the problem is that the most relevant documents simply aren't in their index. I'm also surprised to not see Wikipedia documents showing up much for my searches--particularly for searches when I'm quite sure the most relevant document is in Wikipedia. 
Again, it's hard to tell if this is an indexing or results quality issue.&lt;br /&gt;&lt;br /&gt;I wish them luck--I speak for many in my desire to see Google face worthy competition in web search.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6487120296408737382?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6487120296408737382/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6487120296408737382' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6487120296408737382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6487120296408737382'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/not-as-cuil-as-i-expected.html' title='Not as Cuil as I Expected'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8813700403187011099</id><published>2008-07-27T12:43:00.003-04:00</published><updated>2008-07-27T15:49:50.489-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Catching up on SIGIR '08</title><content type='html'>Now that &lt;a href="http://www.sigir2008.org/"&gt;SIGIR '08&lt;/a&gt; is over, I hope to see more folks blogging about it. 
I'm jealous of everyone who had the opportunity to attend, not only because of the &lt;a href="http://www.sigir2008.org/sg_food.html"&gt;culinary delights of Singapore&lt;/a&gt;, but also because the program seems to reflect the academic community's increasing interest in real-world IR problems.&lt;br /&gt;&lt;br /&gt;Some notes from looking over the proceedings:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Of the 27 &lt;a href="http://www.sigir2008.org/program_details.html"&gt;paper sessions&lt;/a&gt;, 2 include the word "user" in their titles, 2 include the word "social", 2 focus on Query Analysis &amp;amp; Models, and 1 is about exploratory search. Compared to the last few SIGIR conferences, this is a significant increase in focus on users and interaction.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A paper on &lt;a href="http://dis.shef.ac.uk/mark/publications/my_papers/fp440-almaskari.pdf"&gt;whether test collections predict users' effectiveness&lt;/a&gt; offers an admirable defense of the &lt;a href="http://thenoisychannel.blogspot.com/search/label/Cranfield"&gt;Cranfield&lt;/a&gt; paradigm, along the lines I've been &lt;a href="http://thenoisychannel.blogspot.com/2008/07/resolving-battle-royale-between.html"&gt;advocating&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A nice paper from Microsoft Research looks at the problem of &lt;a href="http://people.csail.mit.edu/teevan/work/publications/papers/sigir08.pdf"&gt;whether to personalize&lt;/a&gt; results for a query, recognizing that not all queries benefit from personalization. 
This approach may well be able to reap the benefits of personalization while avoiding much of its harm.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Two papers on tag prediction: &lt;a href="http://portal.acm.org/citation.cfm?id=1390334.1390423"&gt;Real-time Automatic Tag Recommendation&lt;/a&gt; (ACM Digital Library subscription required) and &lt;a href="http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&amp;amp;doc=2008-18&amp;amp;format=pdf&amp;amp;compression=&amp;amp;name=2008-18.pdf"&gt;Social Tag Prediction&lt;/a&gt;. Semi-automated tagging tools are one of the most effective ways to combine the strengths of human and machine capabilities.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;And I haven't even gotten to the &lt;a href="http://www.sigir2008.org/posters.html"&gt;posters&lt;/a&gt;! I'm sad to see that they dropped the industry day, but perhaps they'll bring it back &lt;a href="http://sigir2009.org/"&gt;next year in Boston&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8813700403187011099?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8813700403187011099/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8813700403187011099' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8813700403187011099'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8813700403187011099'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/catching-up-on-sigir-08.html' title='Catching up on SIGIR &apos;08'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2954215414593104343</id><published>2008-07-23T14:42:00.003-04:00</published><updated>2008-07-23T15:41:37.200-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>Knol: Google takes on Wikipedia</title><content type='html'>Just a few days ago, I was commenting on a New York Times article about &lt;a href="http://bits.blogs.nytimes.com/2008/07/17/wikipedia-tries-approval-system-to-reduce-vandalism-on-pages/"&gt;Wikipedia's new approval system&lt;/a&gt; that the biggest problem with Wikipedia is anonymous authorship. By synchronous coincidence, Google unveiled &lt;a href="http://knol.google.com/k/knol/"&gt;Knol&lt;/a&gt; today, which is something of a cross between Wikipedia and &lt;a href="http://www.squidoo.com/"&gt;Squidoo&lt;/a&gt;. Its most salient feature is that each entry will have a clearly identified author. They even allow authors to verify their identities using credit cards or phone directories.&lt;br /&gt;&lt;br /&gt;It's a nice idea, since anonymous authorship is a major factor in the adversarial nature of information retrieval on the web. Not only does the accountability of authorship inhibit vandalism and edit wars, but it also allows readers to decide for themselves whom to trust--at least to the extent that readers are able and willing to obtain reliable information about the authors. Without question, they are addressing Wikipedia's biggest weakness.&lt;br /&gt;&lt;br /&gt;But it's too little, too late. Wikipedia is already there. 
And, despite complaints about its inaccuracy and bias, Wikipedia is a fantastic, &lt;a href="http://www.alexa.com/data/details/traffic_details/wikipedia.org"&gt;highly utilized&lt;/a&gt; resource. The only way I see for Knol to supplant Wikipedia in a reasonable time frame is through a massive cut-and-paste to make up for the huge difference in content.&lt;br /&gt;&lt;br /&gt;Interestingly, Wikipedia does not seem to place any onerous restrictions on &lt;a href="http://en.wikipedia.org/wiki/Wikipedia:Verbatim_copying"&gt;verbatim copying&lt;/a&gt;. However, unless a single author is 100% responsible for authoring a Wikipedia entry, it isn't clear that anyone can simply copy the entry into Knol.&lt;br /&gt;&lt;br /&gt;I know that it's dangerous to bet against Google. But I'm really skeptical about this latest effort. It's a pity, because I think their emphasis is the right one. But for once I wish they'd been a bit more humble and accepted that they aren't going to build a better Wikipedia from scratch.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2954215414593104343?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2954215414593104343/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2954215414593104343' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2954215414593104343'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2954215414593104343'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/knol-google-takes-on-wikipedia.html' title='Knol: Google takes on 
Wikipedia'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1947424449060322158</id><published>2008-07-19T18:36:00.003-04:00</published><updated>2008-07-19T20:57:50.435-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><title type='text'>Predictably Irrational</title><content type='html'>As regular readers have surely noticed by now, I've been on a bit of a &lt;a href="http://thenoisychannel.blogspot.com/search/label/psychology"&gt;behavioral psychology&lt;/a&gt; kick lately. Some of this reflects long-standing personal interest and my latest reading. But I also feel increasingly concerned that researchers in information seeking--especially those working on tools--have neglected the impact of cognitive bias.&lt;br /&gt;&lt;br /&gt;For those who are unfamiliar with the last few decades of research in this field, I highly recommend a &lt;a href="http://youtube.com/watch?v=VZv--sm9XXU"&gt;recent lecture by behavioral economist Dan Ariely&lt;/a&gt; on predictable irrationality. Not only is he a very informative and entertaining speaker, but he chooses very concrete and credible examples, starting with his reflections on how we experience pain, drawn from his own experience of suffering third-degree burns over 70 percent of his body. I promise you, the lecture is an hour well spent, and the time will fly by.&lt;br /&gt;&lt;br /&gt;A running theme through this and my other posts on cognitive bias is that the way information is presented to us has dramatic effects on how we interpret that information.&lt;br /&gt;&lt;br /&gt;This is great news for anyone who wants to manipulate people. 
In fact, I once asked Dan about the relative importance of people's inherent preferences vs. those induced by presentation on retail web sites, and he all but dismissed the former (i.e., you can sell ice cubes to Eskimos, if you can manipulate their cognitive biases appropriately). But it's sobering news for those of us who want to empower users to evaluate information objectively to support decision making.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1947424449060322158?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1947424449060322158/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1947424449060322158' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1947424449060322158'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1947424449060322158'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/predictably-irrational.html' title='Predictably Irrational'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6897820631787075542</id><published>2008-07-18T10:34:00.004-04:00</published><updated>2008-07-18T12:38:59.028-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Information 
technology'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>Call to Action - A Follow-Up</title><content type='html'>The &lt;a href="http://thenoisychannel.blogspot.com/2008/07/call-to-action.html"&gt;call to action&lt;/a&gt; I sent out a couple of weeks ago has generated healthy interest.&lt;br /&gt;&lt;br /&gt;One of the several people who responded is the CTO of one of Endeca's competitors, whom I laud for understanding that the need to better articulate and communicate the technology of information access transcends competition among vendors. While we have differences on how to achieve this goal, I at least see hope from his responsiveness.&lt;br /&gt;&lt;br /&gt;The rest were analysts representing some of the leading firms in the space. They not only expressed interest, but also contributed their own ideas on how to make this effort successful. Indeed, I met with two analysts this week to discuss next steps.&lt;br /&gt;&lt;br /&gt;Here is where I see this going.&lt;br /&gt;&lt;br /&gt;In order for any efforts to communicate the technology of information access to be effective, the forum has to establish credibility as a vendor-neutral and analyst-neutral forum. Ideally, that means having at least two major vendors and two major analysts on board. What we want to avoid is having only one major vendor or analyst, since that will create a reasonable perception of bias.&lt;br /&gt;&lt;br /&gt;I'd also like to involve academics in information retrieval and library and information science. 
As one of the analysts suggested, we could reach out to the leading &lt;a href="http://www.ischools.org/"&gt;iSchools&lt;/a&gt;, who have expressed an open interest in engaging the broader community.&lt;br /&gt;&lt;br /&gt;What I'd like to see come together is a forum, probably a one-day workshop, that brings together credible representatives from the vendor, analyst, and academic communities. With a critical mass of participants and enough diversity to assuage concerns of bias, we can start making good on this call to action.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6897820631787075542?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6897820631787075542/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6897820631787075542' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6897820631787075542'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6897820631787075542'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/call-to-action-follow-up.html' title='Call to Action - A Follow-Up'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6202771048223829525</id><published>2008-07-15T22:27:00.002-04:00</published><updated>2008-07-15T22:44:49.912-04:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='intelligence analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><category scheme='http://www.blogger.com/atom/ns#' term='uncertainty'/><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'>Beyond a Reasonable Doubt</title><content type='html'>In &lt;a href="http://thenoisychannel.blogspot.com/2008/07/psychology-of-intelligence-analysis.html"&gt;Psychology of Intelligence Analysis&lt;/a&gt;, Richards Heuer advocates that we quantify expressions of uncertainty: "To avoid ambiguity, insert an odds ratio or probability range in parentheses after expressions of uncertainty in key judgments."&lt;br /&gt;&lt;br /&gt;His suggestion reminds me of my pet peeve about the unquantified notion of &lt;a href="http://en.wikipedia.org/wiki/Reasonable_doubt"&gt;reasonable doubt&lt;/a&gt; in the American justice system. I've always wanted (but never had the opportunity) to ask a judge what probability of innocence constitutes a reasonable doubt.&lt;br /&gt;&lt;br /&gt;Unfortunately, as Heuer himself notes elsewhere in his book, we human beings are really bad at estimating probabilities. I suspect (with a confidence of 90 to 95%) that quantifying our uncertainties as probability ranges will only suggest a false sense of precision.&lt;br /&gt;&lt;br /&gt;So, what can we do to better communicate uncertainty? Here are a couple of thoughts:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We can calibrate estimates based on past performance. It's unclear what will happen if people realize that their estimates are being translated, but, at worst, it feels like good fodder for research in judgment and decision making.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;We can ask people to express relative probability judgments. While these are also susceptible to bias, at least they don't demand as much precision. 
And we can always vary the framing of questions to try to factor out the cognitive biases they induce.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Also, when we talk about uncertainty, it is important that we distinguish between &lt;a href="http://en.wikipedia.org/wiki/Uncertainty_quantification#Types_of_uncertainties"&gt;aleatory and epistemic uncertainty&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;When I flip a coin, I am certain it has a 50% chance of landing heads, because I know the probability distribution of the event space. This is aleatory uncertainty, and it forms the basis of probability and statistics.&lt;br /&gt;&lt;br /&gt;But when I reason about less contrived uncertain events, such as estimating the likelihood that my bank will collapse this year, the challenge is my ignorance of the probability distribution. This is epistemic uncertainty, and it's a lot messier.&lt;br /&gt;&lt;br /&gt;If you'd like to learn more about aleatory and epistemic uncertainty, I recommend Nassim Nicholas Taleb's &lt;a href="http://en.wikipedia.org/wiki/Fooled_by_Randomness"&gt;Fooled by Randomness&lt;/a&gt; (which is a better read than his better-known &lt;a href="http://en.wikipedia.org/wiki/The_Black_Swan_%28book%29"&gt;Black Swan&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;In summary, we have to accept the bad news that the real world is messy. As a mathematician and computer scientist, I've learned to pursue theoretical rigor as an ideal. Like me, you may find it very disconcerting to not be able to treat all real-world uncertainty in terms of probability spaces. 
Tell it to the judge!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6202771048223829525?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6202771048223829525/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6202771048223829525' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6202771048223829525'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6202771048223829525'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/beyond-reasonable-doubt.html' title='Beyond a Reasonable Doubt'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3467777239988010342</id><published>2008-07-13T11:55:00.003-04:00</published><updated>2008-07-13T12:20:36.386-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Usability'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Information technology'/><title type='text'>Small is Beautiful</title><content type='html'>Today's New York Times has an article by John Markoff called &lt;a href="http://www.nytimes.com/2008/07/13/technology/13stream.html"&gt;On a Small Screen, Just the Salient Stuff&lt;/a&gt;. 
It argues that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant.&lt;br /&gt;&lt;br /&gt;Of course, on a blog entitled The Noisy Channel, I can't help praising approaches that strive to improve the signal-to-noise ratio in information seeking applications. And I'm glad to see them quoting &lt;a href="http://www.cs.umd.edu/%7Eben/"&gt;Ben Shneiderman&lt;/a&gt;, a colleague of mine at the University of Maryland who has spent much of his career focusing on &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt; issues.&lt;br /&gt;&lt;br /&gt;Still, I think they could have taken the idea much further. Their discussion of more efficient or ergonomic use of real estate boils down to stripping extraneous content (a good idea, but hardly novel), and making sites vertically oriented (i.e., no horizontal scrolling). They don't consider the question of what information is best to present in the limited space--which, in my mind, is the most important question to consider as we optimize interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice.&lt;br /&gt;&lt;br /&gt;Perhaps I am asking too much to expect them to call out the extreme inefficiency of ranked lists, compared to &lt;a href="http://thenoisychannel.blogspot.com/2008/05/guided-summarization.html"&gt;summarization-oriented approaches&lt;/a&gt;. 
Certainly the mobile space opens great opportunities for someone to get this right on the web.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3467777239988010342?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3467777239988010342/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3467777239988010342' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3467777239988010342'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3467777239988010342'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/small-is-beautiful.html' title='Small is Beautiful'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-672160150793502296</id><published>2008-07-11T10:47:00.004-04:00</published><updated>2008-07-15T22:46:02.084-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='intelligence analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Usability'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><title type='text'>Psychology of Intelligence Analysis</title><content type='html'>In the course of working with some of &lt;a 
href="http://endeca.com/byIndustry/government/intelligence.html#"&gt;Endeca's more interesting clients&lt;/a&gt;, I started reading up on how the intelligence agencies address the challenges of making decisions, especially in the face of incomplete and contradictory evidence. I ran into a book called &lt;a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/PsychofIntelNew.pdf"&gt;Psychology of Intelligence Analysis&lt;/a&gt; by former CIA analyst Richards Heuer. The entire book is available &lt;a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/PsychofIntelNew.pdf"&gt;online&lt;/a&gt;, or you can hunt down a hard copy of the out-of-print book from your favorite used book seller.&lt;br /&gt;&lt;br /&gt;Given the mixed record of the intelligence agencies over the past few decades, you might be wondering if the CIA is the best source for learning how to analyze intelligence. But this book is a gem. Even if the agencies don't always practice what they preach (and the book makes a good case as to why), the book is an excellent tour through the literature on judgment and decision making.&lt;br /&gt;&lt;br /&gt;If you're already familiar with work by &lt;a href="http://en.wikipedia.org/wiki/Herbert_Simon"&gt;Herb Simon&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Daniel_Kahneman"&gt;Danny Kahneman&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Amos_Tversky"&gt;Amos Tversky&lt;/a&gt;, then a lot of the ground he covers will be familiar--especially the third of the book that enumerates cognitive biases. I'm a big fan of the judgment and decision making literature myself. But I still found some great nuggets, particularly Chapter 8 on Analysis of Competing Hypotheses. 
Unlike much of the literature, which focuses exclusively on demonstrating our systematic departures from rationality, Heuer tries to offer at least some constructive advice.&lt;br /&gt;&lt;br /&gt;As someone who builds tools to help people make decisions using information that may be not only incomplete and contradictory, but also challenging to find in the first place, I'm very sensitive to how people's cognitive biases affect their ability to use these tools effectively. One of the &lt;a href="http://projects.csail.mit.edu/hcir/"&gt;HCIR '07&lt;/a&gt; presentations by Jolie Martin and Michael Norton (who have worked with &lt;a href="http://www.people.hbs.edu/mbazerman/"&gt;Max Bazerman&lt;/a&gt;) showed how the manner in which information was partitioned on retail web sites drove decisions, i.e., re-organizing the same information affected consumers' decision processes.&lt;br /&gt;&lt;br /&gt;It may be tempting for us on the software side to wash our hands of our users' cognitive biases. But such an approach would be short-sighted. As Heuer shows in his well-researched book, people not only have cognitive biases, but are unable to counter those biases simply by being made aware of them. 
Hence, if software tools are to help people make effective decisions, it is the job of us tool builders to build with those biases in mind, and to support processes like Analysis of Competing Hypotheses that try to compensate for human bias.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-672160150793502296?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/672160150793502296/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=672160150793502296' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/672160150793502296'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/672160150793502296'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/psychology-of-intelligence-analysis.html' title='Psychology of Intelligence Analysis'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4093860569089042744</id><published>2008-07-10T10:00:00.002-04:00</published><updated>2008-07-10T10:19:18.917-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Nice Selection of Machine Learning Papers</title><content type='html'>John Langford just posted a list of &lt;a 
href="http://hunch.net/?p=340"&gt;seven ICML '08 papers that he found interesting&lt;/a&gt;. I appreciate his taste in papers, and I particularly liked a paper on &lt;a href="http://www.conflate.net/icml/paper/2008/264"&gt;Learning Diverse Rankings with Multi-Armed Bandits&lt;/a&gt; that addresses learning a diverse ranking of documents based on users' clicking behavior. If you liked the &lt;a href="http://people.csail.mit.edu/harr/papers/sigir2006.ppt"&gt;Less is More&lt;/a&gt; work that Harr Chen and David Karger presented at SIGIR '06, then I recommend you check this one out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4093860569089042744?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4093860569089042744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4093860569089042744' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4093860569089042744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4093860569089042744'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/nice-selection-of-machine-learning.html' title='Nice Selection of Machine Learning Papers'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5025973297536935090</id><published>2008-07-08T01:05:00.002-04:00</published><updated>2008-07-08T01:09:42.782-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Librarian 2.0</title><content type='html'>Many of the words that mark milestones in the history of technology, such as calculator and word processor, originally corresponded to people. Calculating had at least two lives as a technology breakthrough--first as a process, and then as an automatic means for executing that process. Thanks to inventions like calculators and computers, human beings have moved up the value chain to become scientists and engineers who take low-level details for granted.&lt;br /&gt;&lt;br /&gt;Similarly, the advances in information science and retrieval have dramatically changed the role of a reference librarian.&lt;br /&gt;&lt;br /&gt;Hopefully some of you are old enough to remember card catalogs. They were certainly functional if you knew the exact title or author you were looking for, assuming the title wasn't too generic or the author too prolific. Where card catalogs fell short was in supporting exploratory search. In many cases, your best bet was to quite literally explore the stacks and hope that locality within the Dewey Decimal system sufficed to support your information seeking needs. 
Alternatively, you could follow citation paths--the dead-tree precursor of surfing a hypertext collection.&lt;br /&gt;&lt;br /&gt;For exploratory tasks, library patrons would turn to reference librarians, who would clarify the patrons' needs through a process called the &lt;a href="http://en.wikipedia.org/wiki/Reference_interview"&gt;reference interview&lt;/a&gt;. According to Wikipedia:&lt;br /&gt;&lt;blockquote&gt;A reference interview is composed of two segments:&lt;br /&gt;&lt;br /&gt;   1. An initial segment in which the librarian encourages the user to fully discuss the request.&lt;br /&gt;   2. A final segment in which the librarian asks questions to relate the request to the materials available in the library&lt;br /&gt;&lt;br /&gt;A reference interview is structured (ideally) according to the following series of steps. First the library user states a question or describes a problem. The librarian then clarifies the user's information need, sometimes leading him or her back from a request for a specific resource (which may not be the best one for the problem at hand) to the actual information need as it manifests in the library user's life. Following that, the librarian suggests information resources that address the user's information need, explaining the nature and scope of information they contain and soliciting feedback. The reference interview closes when the librarian has provided the appropriate information or a referral to an outside resource where it can be found, and the user confirms that he or she has received the information needed.&lt;/blockquote&gt;Fast forward to the present day. Thanks to modern search engines, title and author search are no longer tedious processes. Moreover, search engines are somewhat forgiving of users, offering spelling correction and inexact query matching. 
Libraries are still catching up with advances in technology, but the evolution is clearly under way.&lt;br /&gt;&lt;br /&gt;However, search engines have not obviated the need for a reference interview. Except in the simple case of known-item search, the typical information seeker needs help translating an information need into one or more search queries. Moreover, that information need may change as the seeker learns from the process.&lt;br /&gt;&lt;br /&gt;So it should come as no surprise that information seeking support systems need to be more than search engines. The ideal information seeking support system emulates a reference librarian, stepping users through a structured process of clarification. Indeed, this is exactly what my colleagues and I at Endeca are trying to do, both in our work with libraries and, more broadly, in pursuing a vision of human-computer information retrieval.&lt;br /&gt;&lt;br /&gt;What then becomes of librarians? Much as calculators and computers did not obviate the need for mathematicians, I don't see technology obviating the need for information scientists.
Library schools have already evolved into information schools, and I have no doubt that their graduates will help establish the next generation of information seeking technology that makes today's search engines seem as quaint as card catalogs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5025973297536935090?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5025973297536935090/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5025973297536935090' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5025973297536935090'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5025973297536935090'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/librarian-20.html' title='Librarian 2.0'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3580511157496895025</id><published>2008-07-06T19:28:00.005-04:00</published><updated>2008-07-28T16:27:07.189-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cranfield'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Resolving the Battle Royale between Information Retrieval and Information Science</title><content type='html'>&lt;div class="Section1"&gt;&lt;div style="text-align: left;"&gt;The following is the position paper I submitted to the &lt;a href="http://www.ils.unc.edu/ISSS/"&gt;NSF Information Seeking Support Systems Workshop&lt;/a&gt; last month.
The workshop report is still being assembled, but I wanted to share my own contribution to the discussion, since it is particularly relevant to the themes of The Noisy Channel.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;p class="Paper-Title" style="margin-bottom: 3pt;"&gt;Resolving the Battle Royale between Information Retrieval and Information Science&lt;/p&gt;  &lt;/div&gt;  &lt;br /&gt;&lt;div class="Section2"&gt;  &lt;p class="Author" style="margin-bottom: 0.0001pt;"&gt;Daniel Tunkelang&lt;/p&gt;  &lt;p class="Author" style="margin-bottom: 0.0001pt;"&gt;Endeca&lt;/p&gt;    &lt;/div&gt;  &lt;div class="Section3"&gt;  &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt;"&gt;&lt;b style=""&gt;&lt;span style="font-size:12;"&gt;ABSTRACT&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="Abstract"&gt;We propose an approach to help resolve the “battle royale” between the information retrieval and information science communities. The information retrieval side favors the Cranfield paradigm of batch evaluation, which the information science side criticizes for its neglect of the user. The information science side favors user studies, which the information retrieval side criticizes for their limited scale and repeatability.
Our approach aims to satisfy the primary concerns of both sides.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 6pt 0in 0.0001pt;"&gt;&lt;b style=""&gt;&lt;span style="font-size:12;"&gt;Categories and Subject Descriptors&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;H.1.2 [Models and Principles]: User/Machine Systems - Human information processing.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;H.3.3 [Information Systems]: Information Search and Retrieval - Information Filtering, Retrieval Models&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;H.5.2 [Information Systems]: Information Interfaces and Presentation - User Interfaces&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 6pt 0in 0.0001pt;"&gt;&lt;b style=""&gt;&lt;span style="font-size:12;"&gt;General Terms&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;Design, Experimentation, Human Factors&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 6pt 0in 0.0001pt;"&gt;&lt;b style=""&gt;&lt;span style="font-size:12;"&gt;Keywords&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;Information science, information retrieval, information seeking, evaluation, user studies&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;1.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;INTRODUCTION&lt;/h1&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;Over the past few decades, a growing community of researchers has called for the information retrieval community to think outside the Cranfield box.
Perhaps the most vocal advocate is Nick Belkin, whose "grand challenges" in his keynote at the 2008 European Conference on Information Retrieval [1] all pertained to the interactive nature of information seeking, which he claims the Cranfield approach neglects. Belkin cited similar calls to action going back as far as Karen Spärck Jones's 1988 acceptance speech for the Gerald Salton Award [2] and Tefko Saracevic's acceptance speech for the same award in 1997 [3]. More recently, Peter Ingwersen and Kalervo Järvelin proposed an Information Seeking and Retrieval research program in &lt;i style=""&gt;The Turn&lt;/i&gt;, published in 2005 [4].&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;2.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;IMPASSE BETWEEN IR AND IS&lt;/h1&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;Given the advocacy of Belkin and others, why hasn't there been more progress? As Ellen Voorhees noted in defense of Cranfield at the 2006 Workshop on Adaptive Information Retrieval, "changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in cost of retrieval experiments" [5].
Some user studies, such as those of Andrew Turpin and Falk Scholer [6], have challenged the Cranfield emphasis on batch information retrieval measures like mean average precision. Nonetheless, the information retrieval community, on the whole, remains unconvinced by these experiments because they are smaller in scale and less repeatable than the TREC evaluations.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;As Tefko Saracevic has said, there is a "battle royale" between the information retrieval community, which favors the Cranfield paradigm of batch evaluation despite its neglect of the user, and the information science community, which favors user studies despite their limitations in scale and repeatability [7]. How do we move forward?&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;3.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;PRIMARY CONCERNS OF IR AND IS&lt;/h1&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;Both sides have compelling arguments. If an evaluation procedure is not repeatable and cost-effective, it has little practical value.
Nonetheless, it is essential that an evaluation procedure capture the interactive nature of information seeking.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;If we are to find common ground to resolve this dispute, we need to satisfy the primary concerns of both sides:&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin: 0in 0in 6pt 0.5in; text-indent: -0.25in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family:Symbol;"&gt;&lt;span style=""&gt;·&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Real information seeking tasks are interactive, so the results of the evaluation procedure must be meaningful in an interactive context.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin: 0in 0in 6pt 0.5in; text-indent: -0.25in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family:Symbol;"&gt;&lt;span style=""&gt;·&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;The evaluation procedure must be repeatable and cost-effective.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;In order to move beyond the battle royale and resolve the impasse between the IR and IS communities, we need to address both of these concerns.&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;4.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;PROPOSED APPROACH&lt;/h1&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;A key point of contention in the battle royale is whether we should evaluate systems by studying individual users or by measuring system performance against test collections.&lt;/p&gt;  &lt;p
class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;The short answer is that we need to do both. In order to ground the results of evaluation in realistic contexts, we need to conduct user studies that relate proposed measures to success in interactive information seeking tasks. Otherwise, we optimize under the artificial constraint that a task involves only a single user query.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;Such an approach presumes that we have a characterization of information seeking tasks. This characterization is an open problem that is beyond the scope of this position paper but has been addressed by other information seeking researchers, including Ingwersen and Järvelin [4]. We presume access to a set of tasks that, if not exhaustive, at least applies to a valuable subset of real information seeking problems.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;Consider, as a concrete example, the task of a researcher who, given a comprehensive digital library of technical publications, wants to determine with confidence whether his or her idea is novel. In other words, the researcher want to either discover prior art that anticipates the idea, or to state with confidence that there is no such art. Patent inventors and lawyers performing e-discovery perform analogous tasks. We can measure task performance objectively as a combination of accuracy and efficiency, and we can also consider subject measures like user confidence and satisfaction. Let us assume that we are able to quantify a task success measure that incorporates these factors.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;Given this task and success measure, we would like to know how well an information retrieval system supports the user performing it. As the information scientists correctly argue, user studies are indispensable. 
But, as we employ user studies to determine which systems are most helpful to users, we need to go a step further and correlate user success with one or more system measures. We can then evaluate these system measures in a repeatable, cost-effective process that does not require user involvement.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;For example, let us hypothesize that mean average precision (MAP) on a given TREC collection is such a measure: that is, users pursuing the prior art search task are more successful using a system with higher MAP than using a system with lower MAP. To test this hypothesis, we can present users with a family of systems that, insofar as possible, vary only in MAP, and see how well user success correlates with each system’s MAP. If the correlation is strong, we have validated MAP as a useful system measure, and we can invest in evaluating systems using MAP against the specified collection in order to predict their utility for the prior art task.&lt;/p&gt;  &lt;p class="MsoBodyTextIndent" style="margin-bottom: 6pt; text-indent: 0in;"&gt;The principle here is a general one: it can be used not only to compare different algorithms but also to evaluate more sophisticated interfaces, such as document clustering [8] or faceted search [9].
The only requirement is that we hypothesize and validate system measures that correlate with user success.&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;5.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;WEAKNESSES OF APPROACH&lt;/h1&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;Our proposed approach has two major weaknesses.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;The first weakness is that, in a realistic interactive information retrieval context, distinct queries are not independent. Rather, a typical user executes a sequence of queries in pursuit of an information need, each query informed by the results of the previous ones.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;In a batch test, we must decide the query sequence in advance, and we cannot model how the user’s queries depend on the system’s responses. Hence, we are limited to computing measures that can be evaluated for each query independently. Nonetheless, we can choose measures that correlate with effectiveness in realistic settings. We hope that these measures remain meaningful even when we remove the test queries from their realistic context.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;The second weakness is that we do not envision a way to compare different interfaces in a batch setting. It seems that testing the relative merits of different interfaces requires real (or at least simulated) users.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;If, however, we hold the interface constant, then we can define performance measures that apply to that interface. For example, we can develop standardized versions of well-studied interfaces, such as faceted search and clustering.
We can then compare the performance of different systems that use these interfaces, e.g., different clustering algorithms.&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;6.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;AN ALTERNATIVE APPROACH&lt;/h1&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;An alternative way to tackle the evaluation problem leverages the “human computation” approach championed by Luis von Ahn [10]. This approach uses “games with a purpose” to motivate people to perform information-related tasks, such as image tagging and optical character recognition (OCR).&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;A particularly interesting "game" in our present context is Phetch, in which one or more "Seekers" compete to find an image based on a text description provided by a "Describer" [11]. The Describer’s goal is to help the Seekers succeed, while the Seekers compete with one another to find the target image within a fixed time limit, using a search engine that has indexed the images based on tagging results from the ESP Game. To discourage a shotgun approach, the game penalizes Seekers for wrong guesses.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;This game goes quite far in capturing the essence of interactive information retrieval. If we put aside the competition among the Seekers, we see that an individual Seeker, aided by the human Describer and by an algorithmic (but human-indexed) search engine, is pursuing an information retrieval task. Moreover, the Seeker has an incentive to be both effective and efficient.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;How can we leverage this framework for information retrieval evaluation?
Even though the game assumes that both Describers and Seekers are human beings, there is no reason we cannot allow computers to play too, in either or both roles. Granted, the game, as currently designed, focuses on image retrieval without giving the human players direct access to the image tags, but we could imagine a framework that is more amenable to machine participation, e.g., one that provides a machine player with a set of tags derived from those in the index when that player is presented with an image. Alternatively, there may be a domain better suited than image retrieval to incorporating computer players.&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;The main appeal of the game framework is that it allows all participants to be judged by an objective criterion that reflects the effectiveness and efficiency of the interactive information retrieval process. A good Describer should, on average, outscore a bad Describer over the long term; likewise, a good Seeker should outscore a bad one. We can even vary the search engine available to Seekers in order to compare competing search engine algorithms or interfaces.&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;7.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;CONCLUSION&lt;/h1&gt;  &lt;p class="MsoNormal" style="margin-bottom: 6pt;"&gt;Our goal is ambitious: we aspire to an evaluation framework that information scientists accept as relevant to real-world information seeking, but that nonetheless offers the practicality of the Cranfield paradigm that dominates information retrieval.
The near absence of collaboration between the information science and information retrieval communities has been a major missed opportunity, not only for both research communities but also for everyone else who could benefit from practical advances in our understanding of information seeking. We hope that the approach we propose takes at least a small step towards resolving this battle royale.&lt;/p&gt;  &lt;h1 style="margin: 6pt 0in 0.0001pt; text-indent: 0in;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;8.&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;REFERENCES&lt;/h1&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[1]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Belkin, N. J. 2008. Some(What) Grand Challenges for Information Retrieval. &lt;i style=""&gt;ACM SIGIR Forum 42, 1&lt;/i&gt; (June 2008), 47-54.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[2]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Spärck Jones, K. 1988. A look back and a look forward. In &lt;i style=""&gt;Proceedings of the 11th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval&lt;/i&gt; (SIGIR ’88), 13-29.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[3]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Saracevic, T. 1997. Users lost: reflections of the past, future and limits of information science. 
&lt;i style=""&gt;ACM SIGIR Forum 31, 2&lt;/i&gt; (July 1997), 16-27.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[4]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Ingwersen, P. and Järvelin, K. 2005. &lt;i style=""&gt;The turn. Integration of information seeking and retrieval in context&lt;/i&gt;. Springer.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[5]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Voorhees, E. 2006. Building Test Collections for Adaptive Information Retrieval: What to Abstract for What Cost? In &lt;i style=""&gt;First International Workshop on Adaptive Information Retrieval (AIR)&lt;/i&gt;.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[6]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Turpin, A. and Scholer, F. 2006. User performance versus precision measures for simple search tasks. In &lt;i style=""&gt;Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval&lt;/i&gt;, 11-18.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[7]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Saracevic, T. 2007. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. 
&lt;i style=""&gt;Journal of the American Society for Information Science and Technology 58, 13&lt;/i&gt;, 1915-1933.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[8]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Cutting, D., Karger, D., Pedersen, J., and Tukey, J. 1992. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. In &lt;i style=""&gt;Proceedings of the 15th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval&lt;/i&gt;, 318-329.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[9]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;     &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Workshop on Faceted Search. 2006. In &lt;i style=""&gt;Proceedings of the 29th Annual ACM SIGIR International Conference on Research and Development in Information Retrieval&lt;/i&gt;.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[10]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;  &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Von Ahn, L. 2006. Games with a Purpose. &lt;i style=""&gt;IEEE Computer 39, 6&lt;/i&gt; (June 2006), 92-94.&lt;/p&gt;  &lt;p class="References"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=""&gt;[11]&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;  &lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Von Ahn, L., Ginosar, S., Kedia, M., Liu, R., and Blum, M. 2006. Improving accessibility of the web with a computer game. 
In &lt;i style=""&gt;Proceedings of the SIGCHI Conference on Human Factors in Computing Systems&lt;/i&gt;, 79-82.&lt;/p&gt;  &lt;/div&gt;  &lt;span style=";font-family:&amp;quot;;font-size:9;"  &gt;&lt;br /&gt;&lt;/span&gt;  &lt;p class="Paper-Title" style="text-align: justify;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3580511157496895025?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3580511157496895025/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3580511157496895025' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3580511157496895025'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3580511157496895025'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/resolving-battle-royale-between.html' title='Resolving the Battle Royale between Information Retrieval and Information Science'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1920554294947453583</id><published>2008-07-02T11:40:00.007-04:00</published><updated>2008-07-18T12:39:10.781-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Information technology'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>A Call to Action</title><content type='html'>I sent the following open letter to the leading enterprise providers and industry analysts in the information access community. I am inspired by the recent efforts of researchers to bring industry events to major academic conferences. I'd like to see industry--particularly enterprise providers and industry analysts--return the favor, embracing these events to help bridge the gap between research and practice.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Dear friends in the information access community,&lt;br /&gt;&lt;br /&gt;I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.&lt;br /&gt;&lt;br /&gt;Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a &lt;a href="http://www.aiim.org/ResourceCenter/AIIMNews/PressReleases/Article.aspx?ID=34834"&gt;recent AIIM report&lt;/a&gt; confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.&lt;br /&gt;&lt;br /&gt;In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. 
Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.&lt;br /&gt;&lt;br /&gt;In addition, this community has been increasingly interested in connecting with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as &lt;a href="http://www.sigir2008.org/"&gt;SIGIR&lt;/a&gt;, &lt;a href="http://www.cikm2008.org/"&gt;CIKM&lt;/a&gt;, and &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR&lt;/a&gt;. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent from these events, as have industry analysts.&lt;br /&gt;&lt;br /&gt;I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Collaborate with the organizers of academic conferences such as &lt;a href="http://www.sigir2008.org/"&gt;SIGIR&lt;/a&gt;, &lt;a href="http://www.cikm2008.org/"&gt;CIKM&lt;/a&gt;, and &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR&lt;/a&gt; to promote participation of enterprise information access providers and analysts in conference industry days.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual &lt;a href="http://research.microsoft.com/%7Eryenw/hcir2008/"&gt;HCIR&lt;/a&gt; and exploratory search workshops.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The rigor and independence of the conferences and workshops make them ideal as vendor-neutral forums. 
I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.&lt;br /&gt;&lt;br /&gt;Please contact me at &lt;a href="mailto:dt@endeca.com"&gt;dt@endeca.com&lt;/a&gt; or join in an open discussion at &lt;a href="http://thenoisychannel.blogspot.com/2008/07/call-to-action.html"&gt;http://thenoisychannel.blogspot.com/2008/07/call-to-action.html&lt;/a&gt; if you are interested in participating in this effort.&lt;br /&gt;&lt;br /&gt;Sincerely,&lt;br /&gt;Daniel Tunkelang&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1920554294947453583?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1920554294947453583/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1920554294947453583' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1920554294947453583'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1920554294947453583'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/call-to-action.html' title='A Call to Action'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1602511238617372965</id><published>2008-07-01T09:55:00.004-04:00</published><updated>2008-07-01T10:51:49.352-04:00</updated><title type='text'>Clarification before Refinement on Amazon</title><content type='html'>I just noticed today that a search on Amazon (e.g., this search for &lt;a href="http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&amp;amp;field-keywords=algorithms"&gt;algorithms&lt;/a&gt;) does not offer options to sort the results or to refine by anything other than category. Once you do select a category (e.g., &lt;a href="http://www.amazon.com/s/ref=sr_nr_i_0?ie=UTF8&amp;amp;rs=&amp;amp;keywords=algorithms&amp;amp;rh=i%3Aaps%2Ck%3Aalgorithms%2Ci%3Astripbooks"&gt;books&lt;/a&gt;), you are given additional refinement options, as well as the ability to sort.&lt;br /&gt;&lt;br /&gt;While I find this interface less than ideal (e.g., even if all of your search results are in a single category, it still makes you select that category explicitly), I do commend them for recognizing the need to have users &lt;a href="http://thenoisychannel.blogspot.com/2008/06/clarification-vs-refinement.html"&gt;clarify before they refine&lt;/a&gt;. 
The implication--one we've been pursuing at Endeca--is that it is incumbent on the system to detect when its understanding of the user's intent is ambiguous enough to require a clarification dialogue.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1602511238617372965?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1602511238617372965/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1602511238617372965' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1602511238617372965'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1602511238617372965'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/07/clarification-before-refinement-on.html' title='Clarification before Refinement on Amazon'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8592032743057060610</id><published>2008-06-29T12:23:00.003-04:00</published><updated>2008-06-29T12:38:54.061-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><category 
scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Back from ISSS Workshop</title><content type='html'>My apologies for the sparsity of posts lately; it's been a busy week!&lt;br /&gt;&lt;br /&gt;I just came back from the &lt;a href="http://www.ils.unc.edu/ISSS/"&gt;Information Seeking Support Systems Workshop&lt;/a&gt;, which was sponsored by the National Science Foundation and hosted at the University of North Carolina - Chapel Hill. An excerpt from the workshop home page nicely summarizes its purpose:&lt;br /&gt;&lt;blockquote&gt;The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation. &lt;/blockquote&gt;We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that &lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt; has rallied the information retrieval community.&lt;br /&gt;&lt;br /&gt;One of the assignments we received at the workshop was to pick a problem we would "go to the mat" for. I'd like to share mine here to get some early feedback:&lt;br /&gt;&lt;blockquote&gt;We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. 
In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.&lt;/blockquote&gt;I'll let folks know as more information is released from the workshop.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8592032743057060610?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8592032743057060610/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8592032743057060610' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8592032743057060610'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8592032743057060610'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/back-from-isss-workshop.html' title='Back from ISSS Workshop'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1135035861596288130</id><published>2008-06-24T14:58:00.003-04:00</published><updated>2008-06-24T15:10:47.429-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><category scheme='http://www.blogger.com/atom/ns#' 
term='exploratory search'/><title type='text'>What is (not) Exploratory Search?</title><content type='html'>One of the recurring topics at The Noisy Channel is &lt;a href="http://thenoisychannel.blogspot.com/search/label/exploratory%20search"&gt;exploratory search&lt;/a&gt;. Indeed, one of our readers recently took the initiative to &lt;a href="http://thenoisychannel.blogspot.com/2008/06/exploratory-search-is-relevant-too.html"&gt;upgrade the Wikipedia entry on exploratory search&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In the information retrieval literature, exploratory search comes across as a niche topic consigned to specialty workshops. A cursory reading of papers from the major information retrieval conferences would lead one to believe that most search problems boil down to improving relevance ranking, albeit with different techniques for different problems (e.g., expert search vs. document search) or domains (e.g., blogs vs. news).&lt;br /&gt;&lt;br /&gt;But it's not just the research community that has neglected exploratory search. When most non-academics think of search, they think of Google with its search box and ranked list of results. The interaction design of web search is anything but exploratory. To the extent that people engage in exploratory search on the web, they tend to do so in spite of, rather than because of, the tools at their disposal.&lt;br /&gt;&lt;br /&gt;Should we then conclude that exploratory search is, in fact, a fringe use case?&lt;br /&gt;&lt;br /&gt;According to &lt;a href="http://www.scils.rutgers.edu/%7Emuresan/Publications/wshsigirWhite2006.pdf"&gt;Ryen White, Gary Marchionini, and Gheorghe Muresan&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. 
In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).&lt;br /&gt;&lt;/blockquote&gt;If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is &lt;span style="font-style: italic;"&gt;not &lt;/span&gt;exploratory search.&lt;br /&gt;&lt;br /&gt;Let me offer the following characterization of non-exploratory search:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;You know exactly what you want.&lt;/li&gt;&lt;li&gt;You know exactly how to ask for it.&lt;/li&gt;&lt;li&gt;You expect a search query to yield one of two responses:&lt;br /&gt;- Success: you are presented with the object of your search.&lt;br /&gt;- Failure: you learn that the object of your search is unavailable.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;If any of these assumptions fails to hold, then the search problem is, to some extent, exploratory.&lt;br /&gt;&lt;br /&gt;There are real non-exploratory search needs, such as navigational queries on the web and title searches in digital libraries. But these are, for most purposes, solved problems. Most of the open problems in information retrieval, at least in my view, apply to exploratory search scenarios. 
It would be nice to see more solutions that explicitly support the process of exploration.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1135035861596288130?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1135035861596288130/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1135035861596288130' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1135035861596288130'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1135035861596288130'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/what-is-not-exploratory-search.html' title='What is (not) Exploratory Search?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8754987329735809781</id><published>2008-06-20T11:01:00.005-04:00</published><updated>2008-06-22T14:03:46.187-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='Information technology'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>Enterprise Search Done Right</title><content type='html'>A recent study from AIIM (the Association for Information and Image Management, also 
known as the Enterprise Content Management Association) reports that &lt;a href="http://www.aiim.org/ResourceCenter/AIIMNews/PressReleases/Article.aspx?ID=34834"&gt;enterprise search frustrates and disappoints users&lt;/a&gt;. Specifically, 49% of survey respondents “agreed” or “strongly agreed” that it is a difficult and time-consuming process to find the information they need to do their job.&lt;br /&gt;&lt;br /&gt;Given that I work for &lt;a href="http://endeca.com/byProject/enterprise_search.html"&gt;a leading enterprise search provider&lt;/a&gt;, you might think I'd find these results disconcerting, even if the report places the blame on clients rather than vendors:&lt;br /&gt;&lt;blockquote&gt;But fault does not lie with technology solution providers.  Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.&lt;/blockquote&gt;&lt;a href="http://thenoisychannel.blogspot.com/2008/04/can-search-be-utility.html"&gt;As I've blogged here before&lt;/a&gt;, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set the expectation that enterprise workers must help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.&lt;br /&gt;&lt;br /&gt;Enterprise search, done right, is a serious investment. 
But it is also an investment that can offer extraordinary returns in productivity and general happiness. Enterprises need to better appreciate the value, but enterprise search providers need to better communicate the process of creating it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8754987329735809781?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8754987329735809781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8754987329735809781' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8754987329735809781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8754987329735809781'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/enterprise-search-done-right.html' title='Enterprise Search Done Right'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6589562405087233603</id><published>2008-06-17T18:46:00.007-04:00</published><updated>2008-06-17T19:12:45.957-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='faceted navigation'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Information Retrieval Systems, 1896 - 
1966</title><content type='html'>My colleague and Endeca co-founder &lt;a href="http://www.arnoldit.com/search-wizards-speak/endeca.html"&gt;Pete Bell&lt;/a&gt; just pointed me to &lt;a href="http://www.kk.org/thetechnium/archives/2008/06/one_dead_media.php"&gt;a great post by Kevin Kelly&lt;/a&gt; about what may be the earliest implementation of a faceted navigation system. Like every good Endecan, I'm familiar with &lt;a href="http://en.wikipedia.org/wiki/S._R._Ranganathan"&gt;Ranganathan&lt;/a&gt;'s struggle to sell the library world on &lt;a href="http://en.wikipedia.org/wiki/Colon_classification"&gt;colon classification&lt;/a&gt;. But it is still striking to see this struggle played out through technology artifacts from a pre-Internet world.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6589562405087233603?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6589562405087233603/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6589562405087233603' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6589562405087233603'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6589562405087233603'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/information-retrieval-systems-1896-1966.html' title='Information Retrieval Systems, 1896 - 1966'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7966246082918970851</id><published>2008-06-16T11:17:00.004-04:00</published><updated>2008-06-16T11:40:16.596-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='faceted navigation'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><title type='text'>A Game to Evaluate Browsing Interfaces?</title><content type='html'>I've mused a fair amount about to apply the concept of the &lt;a href="http://thenoisychannel.blogspot.com/2008/05/games-with-hcir-purpose.html"&gt;Phetch&lt;/a&gt; human computation game to evaluate browsing-based information retrieval interfaces. I'd love to be able to better evaluate faceted navigation and clustering approaches, relative to conventional search as well as relative to one another.&lt;br /&gt;&lt;br /&gt;Here is the sort of co-operative game I have in mind. It uses shopping as a scenario, and has two roles: the Shopper and the Shopping Assistant.&lt;br /&gt;&lt;br /&gt;As a Shopper, you are presented with an shopping list and a browsing interface (i.e., you can click on links but you cannot type free text into a search box). Your goal is to find as many of the items on your shopping list as possible within a fixed time limit. In a variation of this game, not all of the items on the list are findable.&lt;br /&gt;&lt;br /&gt;As a Shopping Assistant, you know the complete inventory, but not what the Shopper is looking for. Your goal is to help the Shopper find as many of the items on his or her  shopping list as possible within a fixed time limit. On each round of interaction, you present the Shopper with information and links within the constraints of a fixed-size page. 
The links may include options to select items (the Shopper's ultimate goal), as well as options that show more items or modify the query.&lt;br /&gt;&lt;br /&gt;Either role could be played by a human or a machine, and, like Phetch, the game could be made competitive by having multiple players in the same role. I suspect the most interesting way to implement such a game would be with human Shoppers and algorithmic Shopping Assistants.&lt;br /&gt;&lt;br /&gt;Is anyone aware of research along these lines? I'm hardly wed to the shopping list metaphor--it could be any other task that suits browsing-oriented interfaces.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7966246082918970851?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7966246082918970851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7966246082918970851' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7966246082918970851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7966246082918970851'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/game-to-evaluate-browsing-interfaces.html' title='A Game to Evaluate Browsing Interfaces?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6106956512871419318</id><published>2008-06-12T10:47:00.003-04:00</published><updated>2008-06-12T11:02:11.813-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Max Wilson's Blog</title><content type='html'>&lt;a href="http://users.ecs.soton.ac.uk/mlw05r/"&gt;Max Wilson&lt;/a&gt;, a colleague of mine at the University of Southampton who has contributed frequently to the conversation here at the Noisy Channel, just started a blog of his own. Check out Max's blog &lt;a href="http://maxlwilson.blogspot.com/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;His post on &lt;a href="http://maxlwilson.blogspot.com/2008/06/exhibiting-exploratory-behaviour.html"&gt;exhibiting exploratory behaviour&lt;/a&gt; (that's the Queen's English to you!) raises an issue at the heart of many of our discussions on this blog: what is exploratory behavior? Is it &lt;a href="http://thenoisychannel.blogspot.com/2008/06/clarification-vs-refinement.html"&gt;clarification or refinement&lt;/a&gt;? Are users exploring in order to resolve imperfect communication with the information retrieval system, or are they exploring in order to learn?&lt;br /&gt;&lt;br /&gt;These are burning questions, and I look forward to learning more about how Max, &lt;a href="http://users.ecs.soton.ac.uk/mc/"&gt;m.c. 
schraefel&lt;/a&gt;, and others are addressing them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6106956512871419318?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6106956512871419318/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6106956512871419318' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6106956512871419318'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6106956512871419318'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/max-wilsons-blog.html' title='Max Wilson&apos;s Blog'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5647587548635888195</id><published>2008-06-11T22:27:00.002-04:00</published><updated>2008-06-11T22:43:23.863-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Cranfield'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Amit Singhal'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>How 
Google Measures Search Quality</title><content type='html'>Thanks to &lt;a href="http://windowoffice.tumblr.com/"&gt;Jon Elsas&lt;/a&gt; for calling my attention to a great post at Datawocky today on &lt;a href="http://anand.typepad.com/datawocky/2008/06/how-google-measures-search-quality.html"&gt;how Google measures search quality&lt;/a&gt;, written by Anand Rajaraman based on his conversation with Google Director of Research Peter Norvig.&lt;br /&gt;&lt;br /&gt;The executive summary: rather than relying on click-through data to judge quality, Google employs armies of raters who manually rate search results for randomly selected queries using different ranking algorithms. These manual ratings drive the evaluation and evolution of Google's ranking algorithms.&lt;br /&gt;&lt;br /&gt;I'm intrigued that Google seems to embrace the Cranfield paradigm so wholeheartedly. Of course, they don't publicize their evaluation measures, so perhaps they're optimizing something more interesting than &lt;a href="http://en.wikipedia.org/wiki/Information_retrieval#Average_precision_of_precision_and_recall"&gt;mean average precision&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;More &lt;a href="http://thenoisychannel.blogspot.com/2008/04/q-with-amit-singhal.html"&gt;questions for Amit&lt;/a&gt;. 
:)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5647587548635888195?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5647587548635888195/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5647587548635888195' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5647587548635888195'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5647587548635888195'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/how-google-measures-search-quality.html' title='How Google Measures Search Quality'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2015833506109327418</id><published>2008-06-10T11:37:00.003-04:00</published><updated>2008-06-10T11:45:52.525-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Seeking Opinions about Information Seeking</title><content type='html'>In a couple of weeks, I'll be participating in &lt;a href="http://www.ils.unc.edu/ISSS/"&gt;an invitational workshop sponsored by the National Science Foundation on Information Seeking Support Systems&lt;/a&gt; at the University of North Carolina - Chapel Hill. 
The participants are an impressive bunch--I feel like I'm the only person attending whom I've never heard of!&lt;br /&gt;&lt;br /&gt;So, what I'd love to know is what concerns readers here would like me to raise. If you've been reading this blog at all, then you know I have no lack of opinions on research directions for information seeking support systems. But I'd appreciate the chance to aggregate ideas from the readership here, and I'll try my best to make sure they surface at the workshop.&lt;br /&gt;&lt;br /&gt;I encourage you to use the comment section to foster discussion, but of course feel free to email me privately (dt at endeca dot com) if you prefer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2015833506109327418?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2015833506109327418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2015833506109327418' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2015833506109327418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2015833506109327418'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/seeking-opinions-about-information.html' title='Seeking Opinions about Information Seeking'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8343573241906244298</id><published>2008-06-08T14:10:00.004-04:00</published><updated>2008-06-12T11:06:47.002-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Exploratory search is relevant too!</title><content type='html'>After seeing what the Noisy Channel readership has done to improve the &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Relevance_%28information_retrieval%29"&gt;Relevance&lt;/a&gt; Wikipedia entries, I was thinking we might take on one or two more. Specifically, the &lt;a href="http://en.wikipedia.org/wiki/Exploratory_search"&gt;Exploratory Search&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Exploratory_Search_Systems"&gt;Exploratory Search Systems&lt;/a&gt; entries are, quite frankly, in sad shape.&lt;br /&gt;&lt;br /&gt;Between the readership here, the folks involved in &lt;a href="http://thenoisychannel.blogspot.com/2008/06/hcir-08.html"&gt;HCIR '08&lt;/a&gt;, and the participants in the &lt;a href="http://www.ils.unc.edu/ISSS/"&gt;IS3 workshop&lt;/a&gt;, I think we have more than enough expertise in exploratory search to fix these up.&lt;br /&gt;&lt;br /&gt;Any volunteers? 
For those of you who are doing research in exploratory search, consider that those two Wikipedia pages are the top hits returned when people &lt;a href="http://www.google.com/search?q=exploratory+search"&gt;search for exploratory search on Google&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8343573241906244298?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8343573241906244298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8343573241906244298' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8343573241906244298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8343573241906244298'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/exploratory-search-is-relevant-too.html' title='Exploratory search is relevant too!'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-957979520688672445</id><published>2008-06-05T14:20:00.007-04:00</published><updated>2008-06-12T11:06:29.497-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><category 
scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>HCIR '08</title><content type='html'>It's my pleasure to announce...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;HCIR '08: Second Workshop on Human-Computer Interaction and Information Retrieval&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;October 23, 2008&lt;br /&gt;Redmond, Washington, USA&lt;br /&gt;&lt;a href="http://research.microsoft.com/%7Eryenw/hcir2008"&gt;http://research.microsoft.com/~ryenw/hcir2008&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;About this Workshop&lt;/span&gt;&lt;br /&gt;As our lives become ever more digital, we face the difficult task of navigating the complex information spaces we create. The fields of Human-Computer Interaction (HCI) and Information Retrieval (IR) have both developed innovative techniques to address this challenge, but their insights have to date often failed to cross disciplinary borders.&lt;br /&gt;&lt;br /&gt;In this one-day workshop we will explore the advances each domain can bring to the other. Following the success of the &lt;a href="http://projects.csail.mit.edu/hcir/"&gt;HCIR 2007&lt;/a&gt; workshop, co-hosted by &lt;a href="http://www.mit.edu/"&gt;MIT&lt;/a&gt; and &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;, we are once again bringing together academics, industrial researchers, and practitioners for a discussion of this important topic.&lt;br /&gt;&lt;br /&gt;This year the workshop is focused on the design, implementation, and evaluation of search interfaces. 
We are particularly interested in interfaces that support complex and exploratory search tasks.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Keynote speaker&lt;/span&gt;&lt;span style="font-size:100%;"&gt;: &lt;/span&gt;&lt;a href="http://research.microsoft.com/%7Esdumais/"&gt;Susan Dumais&lt;/a&gt;, Microsoft Research&lt;br /&gt;&lt;br /&gt;Researchers and practitioners are invited to present interfaces (including mockups, prototypes, and other early-stage designs), research results from user studies of interfaces, and system demonstrations related to the intersection of Human Computer Interaction (HCI) and Information Retrieval (IR). The intent of the workshop is not archival publication, but rather to provide a forum to build community and to stimulate discussion, new insight, and experimentation on search interface design. Demonstrations of systems and prototypes are particularly welcome.&lt;br /&gt;&lt;br /&gt;Possible topics include, but are not limited to:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Novel interaction techniques for information retrieval.&lt;/li&gt;&lt;li&gt;Modeling and evaluation of interactive information retrieval.&lt;/li&gt;&lt;li&gt;Exploratory search and information discovery.&lt;/li&gt;&lt;li&gt;Information visualization and visual analytics.&lt;/li&gt;&lt;li&gt;Applications of HCI techniques to information retrieval needs in specific domains.&lt;/li&gt;&lt;li&gt;Ethnography and user studies relevant to information retrieval and access.&lt;/li&gt;&lt;li&gt;Scale and efficiency considerations for interactive information retrieval systems.&lt;/li&gt;&lt;li&gt;Relevance feedback and active learning approaches for information retrieval.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Important Dates&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Aug 22 - Papers/abstracts due&lt;/li&gt;&lt;li&gt;Sep 12 - Decisions to authors&lt;/li&gt;&lt;li&gt;Oct 3 -  Final copy due for 
printing&lt;/li&gt;&lt;li&gt;Oct 23 - Workshop date&lt;/li&gt;&lt;/ul&gt;Contributions will be peer-reviewed by two members of the program committee. For information on paper submission, see &lt;a href="http://research.microsoft.com/%7Eryenw/hcir2008/submit.html"&gt;http://research.microsoft.com/~ryenw/hcir2008/submit.html&lt;/a&gt; or contact &lt;a href="mailto:cua-hcir2008@cua.edu"&gt;cua-hcir2008@cua.edu&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Workshop Organization&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Workshop chairs:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.cs.cmu.edu/%7Equixote/"&gt;Daniel Tunkelang&lt;/a&gt;, Endeca&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://research.microsoft.com/%7Eryenw/"&gt;Ryen White&lt;/a&gt;, Microsoft Research&lt;/li&gt;&lt;/ul&gt;Program chair:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://faculty.cua.edu/kules/"&gt;Bill Kules&lt;/a&gt;, Catholic University of America&lt;/li&gt;&lt;/ul&gt;Program Committee:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://ciir.cs.umass.edu/%7Eallan/"&gt;James Allan&lt;/a&gt;, University of Massachusetts, USA&lt;/li&gt;&lt;li&gt;Peter Anick, Yahoo!, USA&lt;/li&gt;&lt;li&gt;Peter Bailey, Live Search, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.sis.pitt.edu/%7Epeterb/"&gt;Peter Brusilovsky&lt;/a&gt;, University of Pittsburgh, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.db.dk/ombiblioteksskolen/medarbejdere/default.asp?cid=677&amp;amp;tid=4"&gt;Pia Borlund&lt;/a&gt;, Royal School of Library and Information Science, Denmark&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.ils.unc.edu/%7Ercapra/"&gt;Robert Capra&lt;/a&gt;, University of North Carolina at Chapel Hill, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://www-users.cs.umn.edu/%7Eechi/"&gt;Ed Chi&lt;/a&gt;, Palo Alto Research Center (PARC), USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://research.microsoft.com/%7Ecutrell/"&gt;Ed Cutrell&lt;/a&gt;, Microsoft Research, 
USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://fox.cs.vt.edu/"&gt;Ed Fox&lt;/a&gt;, Virginia Tech, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.fxpal.com/?p=gene"&gt;Gene Golovchinsky&lt;/a&gt;, FX Palo Alto Laboratory, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://people.ischool.berkeley.edu/%7Ehearst/"&gt;Marti Hearst&lt;/a&gt;, University of California at Berkeley, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://ist.psu.edu/faculty_pages/jjansen/"&gt;Jim Jansen&lt;/a&gt;, Pennsylvania State University, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://ils.unc.edu/%7Edianek/"&gt;Diane Kelly&lt;/a&gt;, University of North Carolina at Chapel Hill, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://ils.unc.edu/%7Emarch/"&gt;Gary Marchionini&lt;/a&gt;, University of North Carolina at Chapel Hill, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://research.microsoft.com/%7Emerrie/"&gt;Merrie Morris&lt;/a&gt;, Microsoft Research, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.fxpal.com/?p=jeremy"&gt;Jeremy Pickens&lt;/a&gt;, FX Palo Alto Laboratory, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.clis.umd.edu/people/qu/index.shtml"&gt;Yan Qu&lt;/a&gt;, University of Maryland at College Park, USA&lt;/li&gt;&lt;li&gt;&lt;a href="http://sky.fit.qut.edu.au/%7Espinkah/"&gt;Amanda Spink&lt;/a&gt;, Queensland University of Technology, Australia&lt;/li&gt;&lt;li&gt;&lt;a href="http://management.dal.ca/People%20and%20Groups/Faculty/Profile.php?id=44"&gt;Elaine Toms&lt;/a&gt;, Dalhousie University, Canada&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.bewitched.com/"&gt;Martin Wattenberg&lt;/a&gt;, IBM Research, USA&lt;/li&gt;&lt;li&gt;Ross Wilkinson, CSIRO, Australia&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Supporters&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://research.microsoft.com/"&gt;Microsoft Research&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/8016696494330504473-957979520688672445?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/957979520688672445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=957979520688672445' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/957979520688672445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/957979520688672445'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/hcir-08.html' title='HCIR &apos;08'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5348469055194889354</id><published>2008-06-04T17:09:00.003-04:00</published><updated>2008-06-12T11:06:02.793-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Powerset'/><category scheme='http://www.blogger.com/atom/ns#' term='faceted navigation'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='Natural language processing'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Idea Navigation</title><content type='html'>Last summer, my colleague Vladimir Zelevinsky worked with two interns, &lt;a href="http://www.robinstewart.com/"&gt;Robin Stewart&lt;/a&gt; (MIT) and Greg Scott (Tufts), on a novel approach to information exploration. 
They call it "idea navigation": the approach is to extract subject-verb-object triples from unstructured text, group them into hierarchies, and then expose them in a faceted search and browsing interface. I like to think of it as an exploratory search take on question answering.&lt;br /&gt;&lt;br /&gt;We found out later that &lt;a href="http://www.powerset.com/"&gt;Powerset&lt;/a&gt; had developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from &lt;a href="http://wordnet.princeton.edu/"&gt;WordNet&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Click on the image below to see the presentation they delivered at &lt;a href="http://www.chi2008.org/"&gt;CHI '08&lt;/a&gt;.&lt;br /&gt;&lt;a href="http://videolectures.net/chi08_zelevinsky_ins/"&gt;&lt;br /&gt;&lt;img src="http://videolectures.net/chi08_zelevinsky_ins/thumb.jpg" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;Idea Navigation: Structured Browsing for Unstructured Text&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5348469055194889354?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5348469055194889354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5348469055194889354' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5348469055194889354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5348469055194889354'/><link 
rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/idea-navigation.html' title='Idea Navigation'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-879629678074801373</id><published>2008-06-02T13:59:00.002-04:00</published><updated>2008-06-02T14:00:47.706-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='faceted navigation'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='Discover &apos;08'/><title type='text'>Clarification vs. Refinement</title><content type='html'>The other day, in between braving the &lt;a href="http://www.universalorlando.com/ioa_attr_hulk.html"&gt;Hulk&lt;/a&gt; and &lt;a href="http://www.universalorlando.com/ioa_attr_spiderman.html"&gt;Spiderman&lt;/a&gt; rides at &lt;a href="http://discover.endeca.com/"&gt;Endeca Discover '08&lt;/a&gt;, I was chatting with &lt;a href="http://semanticstudios.com/about/"&gt;Peter Morville&lt;/a&gt; about one of my favorite pet peeves in faceted search implementations: the confounding of clarification and refinement. To my delight, he posted about it at &lt;a href="http://findability.org/"&gt;findability.org&lt;/a&gt; today.  &lt;p&gt;What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. 
Before you start trying to find nearby towns and attractions, your first task is to find a road.&lt;/p&gt;  &lt;p&gt;How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.&lt;/p&gt;  &lt;p&gt;"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.&lt;/p&gt;  &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Y0SVT3VxV1E/SEQ1Lfri4zI/AAAAAAAAABc/gYWhe6f8Mo0/s1600-h/map.JPG"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_Y0SVT3VxV1E/SEQ1Lfri4zI/AAAAAAAAABc/gYWhe6f8Mo0/s400/map.JPG" alt="" id="BLOGGER_PHOTO_ID_5207345540746109746" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-879629678074801373?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/879629678074801373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=879629678074801373' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/879629678074801373'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8016696494330504473/posts/default/879629678074801373'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/clarification-vs-refinement.html' title='Clarification vs. Refinement'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_Y0SVT3VxV1E/SEQ1Lfri4zI/AAAAAAAAABc/gYWhe6f8Mo0/s72-c/map.JPG' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2914539255905272831</id><published>2008-06-01T00:50:00.004-04:00</published><updated>2008-06-01T19:27:54.929-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Your Input Really is Relevant!</title><content type='html'>For those who haven't been following the &lt;a href="http://thenoisychannel.blogspot.com/2008/05/your-input-is-relevant.html"&gt;progress on the Wikipedia entry for "Relevance (Information Retrieval)&lt;/a&gt;", I'd like to thank &lt;a href="http://www.cs.cmu.edu/%7Ejelsas/"&gt;Jon Elsas&lt;/a&gt;, &lt;a href="http://www.colloquial.com/carp/"&gt;Bob Carpenter&lt;/a&gt;, and &lt;a href="http://ciir.cs.umass.edu/%7Efdiaz/"&gt;Fernando Diaz&lt;/a&gt; for helping turn lead into gold.&lt;br /&gt;&lt;br /&gt;Check out:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/w/index.php?title=Relevance_%28information_retrieval%29&amp;amp;oldid=213147335"&gt;The entry before I edited 
it.&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/w/index.php?title=Relevance_%28information_retrieval%29&amp;amp;oldid=215096248"&gt;The entry after I edited it.&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Relevance_%28information_retrieval%29"&gt;The current entry, revised by Jon, Bob, and Fernando.&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;I'm proud of The Noisy Channel community for fixing one of the top two &lt;a href="http://www.google.com/search?q=relevance"&gt;hits on Google for "relevance"&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2914539255905272831?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2914539255905272831/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2914539255905272831' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2914539255905272831'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2914539255905272831'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/06/your-input-really-is-relevant.html' title='Your Input Really is Relevant!'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3319855479927112762</id><published>2008-05-30T01:56:00.002-04:00</published><updated>2008-06-03T10:27:21.724-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><title type='text'>Is Search Broken?</title><content type='html'>&lt;p&gt;Last night, I had the privilege of speaking to fellow &lt;a href="http://www.cs.cmu.edu/"&gt;CMU School of Computer Science&lt;/a&gt; alumni at Fidelity's Center for Advanced Technology in Boston. Dean &lt;a href="http://www.cs.cmu.edu/%7Ebryant/"&gt;Randy Bryant&lt;/a&gt;, Associate Director of Corporate Relations &lt;a href="http://www.cmu.edu/corporate/contact-us.shtml"&gt;Dan Jenkins&lt;/a&gt;, and Director of Alumni Relations &lt;a href="http://www.cs.cmu.edu/alumni/"&gt;Tina Carr&lt;/a&gt; organized the event and encouraged me to pick a provocative subject.&lt;/p&gt;  &lt;p&gt;Thus encouraged, I decided to ask the question: Is Search Broken?&lt;/p&gt;  &lt;p&gt;Slides are &lt;a href="http://www.cs.cmu.edu/%7Equixote/IsSearchBroken.pps"&gt;here&lt;/a&gt; as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;div style="width:425px;text-align:left" id="__ss_442142"&gt;&lt;object style="margin:0px" width="425" height="355"&gt;&lt;param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=is-search-broken-1212417896205639-9"/&gt;&lt;param name="allowFullScreen" value="true"/&gt;&lt;param name="allowScriptAccess" value="always"/&gt;&lt;embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=is-search-broken-1212417896205639-9" type="application/x-shockwave-flash" 
allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;"&gt;&lt;a href="http://www.slideshare.net/?src=embed"&gt;&lt;img src="http://static.slideshare.net/swf/logo_embd.png" style="border:0px none;margin-bottom:-5px" alt="SlideShare"/&gt;&lt;/a&gt; | &lt;a href="http://www.slideshare.net/dtunkelang/is-search-broken?src=embed" title="View Is Search Broken?! on SlideShare"&gt;View&lt;/a&gt; | &lt;a href="http://www.slideshare.net/upload?src=embed"&gt;Upload your own&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3319855479927112762?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3319855479927112762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3319855479927112762' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3319855479927112762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3319855479927112762'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/is-search-broken.html' title='Is Search Broken?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7287261634778736146</id><published>2008-05-28T22:24:00.005-04:00</published><updated>2008-05-28T22:43:41.606-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Another HCIR Game</title><content type='html'>I just received an announcement from the &lt;a href="http://www.sigir.org/sigirlist/"&gt;SIG-IRList&lt;/a&gt; about the &lt;a href="http://soporte1.lsi.uned.es/flickling/"&gt;flickling challenge&lt;/a&gt;, a "game" designed around known-item image retrieval from &lt;a href="http://flickr.com/"&gt;Flickr&lt;/a&gt;. The user is given an image (not annotated) and the goal is to find the image again from Flickr using the system.&lt;br /&gt;&lt;br /&gt;I'm not sure how well it will catch on with casual gamers--but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval--in a cross-language setting, no less. 
Details are available at the &lt;a href="http://nlp.uned.es/iCLEF/2008/guidelines.htm"&gt;iCLEF 2008&lt;/a&gt; site or in this &lt;a href="http://nlp.uned.es/iCLEF/ECIR-evaluation-workshop.pdf"&gt;paper&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'm thrilled to see efforts like these emerging to evaluate interactive retrieval--indeed, this feels like a solitaire version of &lt;a href="http://thenoisychannel.blogspot.com/2008/05/games-with-hcir-purpose.html"&gt;Phetch&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7287261634778736146?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7287261634778736146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7287261634778736146' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7287261634778736146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7287261634778736146'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/another-hcir-game.html' title='Another HCIR Game'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-9074044437192097047</id><published>2008-05-27T15:29:00.003-04:00</published><updated>2008-05-27T15:33:51.875-04:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='Discover &apos;08'/><title type='text'>The Magic Shelf</title><content type='html'>I generally shy away from pimping &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;'s customers here at The Noisy Channel, but occasionally I have to make an exception. As some of you may remember, &lt;a href="http://borders.com/"&gt;Borders&lt;/a&gt; made a &lt;a href="http://www.internetretailer.com/internet/marketing-conference/07271-amazoncom-will-operate-borderscom-web-site.html"&gt;deal&lt;/a&gt; several years ago to have Amazon operate their web site. Last year, they decided to &lt;a href="http://www.networkworld.com/news/2007/051107-borders-new-site.html"&gt;reclaim their site&lt;/a&gt;. And today they are live, powered by Endeca! For more details, visit &lt;a href="http://blog.endeca.com"&gt;http://blog.endeca.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Now back to our commercial-free programming...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-9074044437192097047?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/9074044437192097047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=9074044437192097047' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9074044437192097047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9074044437192097047'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/magic-shelf.html' title='The Magic Shelf'/><author><name>Daniel 
Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2422972854211873022</id><published>2008-05-26T10:49:00.002-04:00</published><updated>2008-05-26T11:10:28.646-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Your Input is Relevant!</title><content type='html'>The following is a public service announcement.&lt;br /&gt;&lt;br /&gt;As some of you may know, I am the primary author of the &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;Human Computer Information Retrieval&lt;/a&gt; entry on Wikipedia. I created this entry last November, shortly after the &lt;a href="http://projects.csail.mit.edu/hcir/"&gt;HCIR '07&lt;/a&gt; workshop. One of the ideas we've tossed around for &lt;a href="http://research.microsoft.com/%7Eryenw/hcir2008/"&gt;HCIR '08&lt;/a&gt; is to collaboratively edit the page. But why wait? With apologies to &lt;a href="http://en.wikipedia.org/wiki/Isaac_Asimov"&gt;Isaac Asimov&lt;/a&gt;, I/you/we are Wikipedia, so let's improve the entry now!&lt;br /&gt;&lt;br /&gt;And, while you've got Wikipedia on the brain, please take a look at the &lt;a href="http://en.wikipedia.org/wiki/Relevance_%28information_retrieval%29"&gt;Relevance (Information Retrieval)&lt;/a&gt; entry. 
After an unsuccessful attempt to have this entry folded into the main &lt;a href="http://en.wikipedia.org/wiki/Information_retrieval"&gt;Information Retrieval &lt;/a&gt;entry, I've tried to &lt;a href="http://en.wikipedia.org/w/index.php?title=Relevance_%28information_retrieval%29&amp;amp;diff=215053724&amp;amp;oldid=213147335"&gt;rewrite&lt;/a&gt; it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!&lt;br /&gt;&lt;br /&gt;As &lt;a href="http://en.wikipedia.org/wiki/Lawrence_Lessig"&gt;Lawrence Lessig&lt;/a&gt; says, it's a read-write society. So readers, please help out a bit with the writing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2422972854211873022?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2422972854211873022/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2422972854211873022' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2422972854211873022'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2422972854211873022'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/your-input-is-relevant.html' title='Your Input is Relevant!'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4251163038178192846</id><published>2008-05-24T15:28:00.003-04:00</published><updated>2008-05-25T15:08:46.072-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Collaborative tagging'/><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Nick Belkin'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Games With an HCIR Purpose?</title><content type='html'>&lt;p&gt;A couple of weeks ago, my colleague &lt;a href="http://www.cs.cmu.edu/%7Ebiglou/"&gt;Luis Von Ahn&lt;/a&gt; at CMU launched &lt;a href="http://gwap.com/"&gt;Games With a Purpose&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Here is a brief explanation from the site: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world. &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Von Ahn has made a career (and earned a &lt;a href="http://www.cmu.edu/cmnews/extra/060918_ahn.html"&gt;MacArthur Fellowship&lt;/a&gt;) from his work on such games, most notably the &lt;a href="http://gwap.com/gwap/gamesPreview/espgame/"&gt;ESP Game&lt;/a&gt; and &lt;a href="http://recaptcha.net/"&gt;reCAPTCHA&lt;/a&gt;. 
His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.&lt;/p&gt;  &lt;p&gt;I've been interested in Von Ahn's work for several years, and most particularly in a game called &lt;a href="http://www.peekaboom.org/phetch/"&gt;Phetch&lt;/a&gt;, a game which never quite made it out of beta but strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Quick! Find an image of Michael Jackson wearing a sailor hat.   &lt;br /&gt;Phetch is like a treasure hunt -- you must find or help find an image from the Web. &lt;/p&gt;    &lt;p&gt;One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions. &lt;/p&gt;    &lt;p&gt;If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer. &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;A few important details that this description leaves out:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the &lt;a href="http://gwap.com/gwap/gamesPreview/espgame/"&gt;ESP Game&lt;/a&gt;.&lt;/li&gt;    &lt;li&gt;A Seeker loses points (I can't recall how many) for wrong guesses.&lt;/li&gt;    &lt;li&gt;The game has a time limit (hence the "Quick!").&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Now, let's unpack the game description and analyze it in terms of the &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;Human-Computer Information Retrieval&lt;/a&gt; (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. 
In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her &lt;a href="http://en.wikipedia.org/wiki/Wetware"&gt;wetware&lt;/a&gt; to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.&lt;/p&gt;  &lt;p&gt;A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.&lt;/p&gt;  &lt;p&gt;Assuming these simplifications, here is how a Seeker plays Phetch:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Read the description provided by the Describer and use it to compose a search.&lt;/li&gt;    &lt;li&gt;Scan the results sequentially, interrupting either to make a guess or to reformulate the search.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.&lt;/p&gt;  &lt;p&gt;Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;, to elaborate such an approach at &lt;a href="http://projects.csail.mit.edu/hcir/web/"&gt;HCIR '07&lt;/a&gt;. 
There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing &lt;a href="http://thenoisychannel.blogspot.com/2008/04/nick-belkin-at-ecir-08.html"&gt;Nick Belkin's grand challenge&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4251163038178192846?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4251163038178192846/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4251163038178192846' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4251163038178192846'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4251163038178192846'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/games-with-hcir-purpose.html' title='Games With an HCIR Purpose?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5859876300053249869</id><published>2008-05-22T14:46:00.005-04:00</published><updated>2008-05-22T14:59:16.402-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='Discover &apos;08'/><title type='text'>Back from Orlando</title><content type='html'>I'm back from &lt;a href="http://discover.endeca.com/"&gt;Endeca Discover '08&lt;/a&gt;: two and 
a half days of &lt;a href="http://discover.endeca.com/?page_id=8"&gt;presentations&lt;/a&gt;, &lt;a href="http://www.universalorlando.com/ioa_attr_hulk.html"&gt;superheroic&lt;/a&gt; &lt;a href="http://www.universalorlando.com/ioa_attr_spiderman.html"&gt;attractions&lt;/a&gt;, and, in the best tradition of The Noisy Channel, &lt;a href="http://www.oshuckspub.com/"&gt;karaoke&lt;/a&gt;. A bunch of us tried our best to blog the presentations at &lt;a href="http://blog.endeca.com/"&gt;http://blog.endeca.com/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;All in all, a fun, exhausting time, but it's good to be back home. So, for those of you who have noticed the lack of posts in your RSS feeds, I promise I'll start making it up to you in the next few days.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5859876300053249869?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5859876300053249869/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5859876300053249869' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5859876300053249869'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5859876300053249869'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/back-from-orlando.html' title='Back from Orlando'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1789900435253893654</id><published>2008-05-16T18:32:00.002-04:00</published><updated>2008-05-16T18:39:21.487-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='Discover &apos;08'/><title type='text'>Attending Endeca Discover '08</title><content type='html'>I'll be attending &lt;a href="http://discover.endeca.com/"&gt;Endeca Discover '08&lt;/a&gt;, Endeca's annual user conference, from Sunday, May 18th to Wednesday, May 21st, so you might see a bit of a lull in my verbiage here while I live blog at &lt;a href="http://blog.endeca.com/"&gt;http://blog.endeca.com&lt;/a&gt; and hang out in sunny Orlando with Endeca customers and partners.&lt;br /&gt;&lt;br /&gt;If you're attending Discover, please give me a shout and come to my sessions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Monday, May 19th, 3:30 pm: &lt;a href="http://discover.endeca.com/?page_id=62#2"&gt;Better Applications through Theory&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Wednesday, May 21st, 9:45 am: &lt;a href="http://discover.endeca.com/?page_id=62#21"&gt;Founding Technologists Forum&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;Otherwise, I'll do my best to sneak in a post or comment, and I'll be back in full force later next week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1789900435253893654?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1789900435253893654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1789900435253893654' title='1 
Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1789900435253893654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1789900435253893654'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/attending-endeca-discover-08.html' title='Attending Endeca Discover &apos;08'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8164632627138963316</id><published>2008-05-16T01:00:00.004-04:00</published><updated>2008-05-16T01:15:07.253-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cranfield'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>A Utilitarian View of IR Evaluation</title><content type='html'>In many information retrieval papers that propose new techniques, the authors validate those techniques by demonstrating improved mean &lt;a href="http://en.wikipedia.org/wiki/Information_retrieval#Average_precision"&gt;average precision&lt;/a&gt; over a standard test collection. The value of such results--at least to a practitioner--hinges on whether mean average precision correlates to utility for users. Not only do &lt;a href="http://portal.acm.org/citation.cfm?id=1148176"&gt;user studies&lt;/a&gt; place this correlation in doubt, but I have yet to see an empirical argument defending the utility of average precision as an evaluation measure. 
Please send me any references if you are aware of them!&lt;br /&gt;&lt;br /&gt;Of course, user studies are fraught with complications, the most practical one being their expense. I'm not suggesting that we need to replace &lt;a href="http://www.asis.org/Bulletin/Oct-05/voorhees.html"&gt;Cranfield&lt;/a&gt; studies with user studies wholesale. Rather, I see the purpose of user studies as establishing the utility of measures that can then be evaluated by Cranfield studies. As with any other science, we need to work with simplified, abstract models to achieve progress, but we also need to ground those models by validating them in the real world.&lt;br /&gt;&lt;br /&gt;For example, consider the scenario where a collection contains no documents that match a user's need. In this case, it is ideal for the user to reach this conclusion as accurately, quickly, and confidently as possible. Holding the interface constant, are there evaluation measures that correlate with how well users perform on these three criteria? Alternatively, can we demonstrate that some interfaces lead to better user performance than others? If so, can we establish measures suitable for those interfaces?&lt;br /&gt;&lt;br /&gt;The "no documents" case is just one of many real-world scenarios, and I don't mean to suggest we should study it at the expense of all others. That said, I think it's a particularly valuable scenario that, as far as I can tell, has been neglected by the information retrieval community. 
I use it to drive home the argument that practical use cases should drive our process of defining evaluation measures.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8164632627138963316?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8164632627138963316/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8164632627138963316' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8164632627138963316'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8164632627138963316'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/utilitarian-view-of-ir-evaluation.html' title='A Utilitarian View of IR Evaluation'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6947152143741044369</id><published>2008-05-13T22:22:00.006-04:00</published><updated>2008-05-13T23:21:39.145-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cranfield'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Thinking about IR Evaluation</title><content type='html'>I just read the recent &lt;a 
href="http://www.sciencedirect.com/science/journal/03064573"&gt;Information Processing &amp;amp; Management&lt;/a&gt; special issue on Evaluation of Interactive Information Retrieval Systems. The articles were a worthwhile read, and yet they weren't exactly what I was looking for. Let me explain.  &lt;br /&gt;  &lt;br /&gt;In fact, let's start by going back to &lt;a href="http://thenoisychannel.blogspot.com/search/label/Cranfield"&gt;Cranfield&lt;/a&gt;. The Cranfield paradigm offers us a quantitative, repeatable means to evaluate information retrieval systems. Its proponents make a strong case that it is effective and cost-effective. Its critics object that it measures the wrong thing because it neglects the user.  &lt;br /&gt;  &lt;br /&gt;But let's look a bit harder at the proponents' case. The primary measure in use today is &lt;a href="http://en.wikipedia.org/wiki/Information_retrieval#Average_precision"&gt;average precision&lt;/a&gt;--indeed, most authors of &lt;a href="http://www.sigir.org/"&gt;SIGIR&lt;/a&gt; papers validate their proposed approaches by demonstrating increased mean average precision (MAP) over a standard test collection of queries. The dominance of average precision as a measure is no accident: it has been shown to be the &lt;a href="http://portal.acm.org/citation.cfm?doid=1076034.1076042"&gt;best single predictor of the precision-recall graph&lt;/a&gt;.  &lt;br /&gt;  &lt;br /&gt;So why are folks like me complaining? There are the various &lt;a href="http://portal.acm.org/citation.cfm?id=1148176"&gt;user studies&lt;/a&gt; asserting that MAP does not predict user performance on search tasks. Those have me at hello, but the studies are controversial in the information retrieval community, and in any case not constructive.  
&lt;br /&gt;  &lt;br /&gt;Instead, consider a paper by &lt;a href="http://people.csail.mit.edu/harr/"&gt;Harr Chen&lt;/a&gt; and &lt;a href="http://people.csail.mit.edu/karger/"&gt;David Karger&lt;/a&gt; (both at MIT) entitled &lt;a href="http://portal.acm.org/citation.cfm?id=1148245"&gt;&amp;quot;Less is more.&amp;quot;&lt;/a&gt; Here is a snippet from the abstract:  &lt;br /&gt;  &lt;blockquote&gt;Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user's information need.&lt;/blockquote&gt; Let me rephrase that: the precision-recall graph, which indicates how well a ranked retrieval algorithms does at ranking relevant documents ahead of irrelevant ones, does not necessarily characterize how well a system meets a user's information need.  &lt;br /&gt;  &lt;br /&gt;One of Chen and Karger's examples is the case where the user is only interested in retrieving one relevant document. In this case, a system does well to return a diverse set of results that hedges against different possible query interpretations or query processing strategies. The authors also discuss more general scenarios, along with heuristics to address them.  &lt;br /&gt;  &lt;br /&gt;But the main contribution of this paper, at least in my eyes, is a philosophical one. The authors consider the diversity of user needs and offer quantitative, repeatable way to evaluate information retrieval systems with respect to different needs. Granted, they do not even consider the challenge of evaluating interactive information retrieval. But they do set a good example.  &lt;br /&gt;  &lt;br /&gt;Stay tuned for more musings on this theme...  
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6947152143741044369?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6947152143741044369/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6947152143741044369' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6947152143741044369'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6947152143741044369'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/thinking-about-ir-evaluation.html' title='Thinking about IR Evaluation'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2923224255691486890</id><published>2008-05-12T10:38:00.003-04:00</published><updated>2008-05-12T10:49:08.252-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Powerset'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Natural language processing'/><title type='text'>A Lofty Goal</title><content type='html'>The blogosphere is all atwitter with Powerset's public launch last night. 
Over at Techcrunch, &lt;a href="http://www.techcrunch.com/2008/05/11/powerset-launches-showcase-for-user-search-experience/"&gt;Michael Arrington refers to their approach as a lofty goal&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But I'd like us to dream bigger. In the science fiction stories that inspired me to study computer and information science, the human-computer interface is not just natural language input. It's dialogue. The authors do not treat machine understanding of unambiguous requests as a wonder, but instead take it for granted as an artifact of technical progress. Indeed, the human-computer interface only becomes relevant to the plot when communication breaks down (aka "that does not compute").&lt;br /&gt;&lt;br /&gt;Ever since I hacked a BASIC version of &lt;a href="http://en.wikipedia.org/wiki/ELIZA"&gt;ELIZA&lt;/a&gt; on a Commodore 64, I've felt the visceral appeal of natural language input as an interface. Conversely, the progress of speech synthesis attests to our desire to humanize the machine's output. It is as if we want to reduce the Turing Test to a look-and-feel.&lt;br /&gt;&lt;br /&gt;But the essence of dialogue lies beneath the surface. The conversations we have with machines are driven by our information needs, and should be optimized to that end. Even we humans drop natural language among ourselves when circumstances call for more efficient communication. Consider an example as mundane as Starbucks baristas eliciting and delegating a latte order.&lt;br /&gt;&lt;br /&gt;In short, let's remember that we want to talk with our computers, not just at them. 
Today's natural language input may be a step towards that end, or it may be just a detour.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2923224255691486890?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2923224255691486890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2923224255691486890' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2923224255691486890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2923224255691486890'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/lofty-goal.html' title='A Lofty Goal'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8749060495512182938</id><published>2008-05-11T15:29:00.003-04:00</published><updated>2008-05-11T15:34:57.175-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Powerset'/><category scheme='http://www.blogger.com/atom/ns#' term='Natural language processing'/><title type='text'>Powerset: Public Launch Later Today</title><content type='html'>As a member of the Powerset private beta, I just received this announcement:&lt;br /&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 
10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;&lt;/span&gt;&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;Greetings Powerlabbers,&lt;/span&gt;&lt;/p&gt;   &lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt; &lt;/span&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;Later today, Powerset is going to launch the first publicly available version of our &lt;a href="http://rs6.net/tn.jsp?e=001pcfB12LcSbeYQt9CQcU68pfq8K7kKJvMuzHCXfvyqrEBFevIe7nnFj5-pUGoxgXpWkAQEMK0qBrlExxxyF-Q0YUS4O2yhn6EhvsJwVWRms8fMJn9ohWE7Q==" target="_blank"&gt;product&lt;/a&gt;. Since you've been active in the Powerlabs community, we wanted to give you a special heads-up to look for our release. Your suggestions, help, feedback, bug reports, and conversation have helped us immensely in creating an innovative and useful product. We hope that you'll continue to be active in Powerlabs and make more great suggestions.&lt;/span&gt;&lt;/p&gt;   &lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt; &lt;/span&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;More information will be posted on Powerset's blog later today, so keep your eye out for updates. 
Also, consider following us on &lt;a href="http://rs6.net/tn.jsp?e=001pcfB12LcSbe-mvHLzuZo5-GEmjpExdXP99fKbFFHbKFRo4DVQ1Qf79W-b_79oHyaZocd77AEJr42Cp8R3OyaiW_5wrOJ3e-E9YTRMsfWmStR7r4qIqv0lA==" target="_blank"&gt;Twitter&lt;/a&gt; or becoming a fan of Powerset on &lt;a href="http://rs6.net/tn.jsp?e=001pcfB12LcSbceBX39LF9t2fwYd6pil6CshA9Q3LLZr_tcaLZSfYbAV52wcCOP_H7Gv-TVzKqYx6DsK--nK0v2xQHl9ccyivw-xuHEEeTJ26YiaAD3OlOwvjvZprTog2OqGSD-ph3A65prxP-1_olvG1jV7Uq90FU_1CQD3CgJJrc=" target="_blank"&gt;Facebook&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;If you have a blog, we'd especially appreciate it if you'd write a blog post about your experience with this first Powerset product. Since you've been on the journey with us, your insight will be helpful in showing other people all of the amazing features in this release.&lt;/span&gt;&lt;/p&gt;   &lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt; &lt;/span&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;Again, we want to extend special thanks to you for sticking with us. 
We hope you feel almost as invested in this release as we are.&lt;/span&gt;&lt;/p&gt;   &lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt; &lt;/span&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;Thanks!&lt;/span&gt;&lt;/p&gt;   &lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt; &lt;/span&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;The Powerset Team&lt;/span&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;&lt;span style="color: rgb(0, 0, 0); font-family: Verdana,Geneva,Arial,Helvetica,sans-serif; font-size: 10pt;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:85%;color:#000000;"   &gt;&lt;/span&gt;&lt;/p&gt;&lt;br /&gt;As loyal readers know, I've posted &lt;a href="http://thenoisychannel.blogspot.com/2008/04/search-for-meaning.html"&gt;my impressions&lt;/a&gt; in the past. 
Now that the beta will be publicly available, I'm curious to hear impressions from you all.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8749060495512182938?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8749060495512182938/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8749060495512182938' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8749060495512182938'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8749060495512182938'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/powerset-public-launch-later-today.html' title='Powerset: Public Launch Later Today'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3955612864619169109</id><published>2008-05-10T15:14:00.005-04:00</published><updated>2008-06-12T11:05:20.600-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='exploratory search'/><title type='text'>Special Issues of Information Processing &amp; Management</title><content type='html'>My colleague &lt;a href="http://www.ecs.soton.ac.uk/people/mlw05r"&gt;Max Wilson&lt;/a&gt; at the &lt;a 
href="http://www.soton.ac.uk/"&gt;University of Southampton&lt;/a&gt; recently called my attention to a pair of special issues of Information Processing &amp;amp; Management. The first is on Evaluation of Interactive Information Retrieval Systems; the second is on Evaluating Exploratory Search Systems. Both are available online at &lt;a href="http://www.sciencedirect.com/science/journal/03064573"&gt;ScienceDirect&lt;/a&gt;. The interactive IR papers can be downloaded for free; the exploratory search papers are available for purchase to folks who don't have access through their institutions.&lt;br /&gt;&lt;br /&gt;I'm behind on my reading, but the titles look promising. Stay tuned!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3955612864619169109?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3955612864619169109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3955612864619169109' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3955612864619169109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3955612864619169109'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/special-issues-of-information.html' title='Special Issues of Information Processing &amp; Management'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4892912880515042527</id><published>2008-05-09T00:18:00.007-04:00</published><updated>2008-05-10T11:32:23.017-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search'/><category scheme='http://www.blogger.com/atom/ns#' term='Business intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Natural language processing'/><category scheme='http://www.blogger.com/atom/ns#' term='knowledge management'/><category scheme='http://www.blogger.com/atom/ns#' term='Information technology'/><title type='text'>A Harmonic Convergence</title><content type='html'>This week, &lt;a href="http://www.forrester.com/"&gt;Forrester&lt;/a&gt; released a report entitled &lt;a href="http://www.forrester.com/Research/Document/Excerpt/0,7211,45715,00.html"&gt;"Search + BI = Unified Information Access"&lt;/a&gt;. The authors assert the convergence of search and business intelligence, a case that Forrester has been developing &lt;a href="http://www.forrester.com/rb/search/results.jsp?SortType=Date&amp;amp;nb=1&amp;amp;Ntt=convergence&amp;amp;more=59504&amp;amp;Ntk=MainSearch&amp;amp;Ntx=mode+MatchAllPartial&amp;amp;dAg=10000&amp;amp;N=133001+51052+50166+50132"&gt;for quite some time&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The executive summary:&lt;br /&gt;&lt;blockquote&gt;Search and business intelligence (BI) really are two sides of the same coin. Enterprise search enables people to access unstructured content like documents, blog and wiki entries, and emails stored in repositories across their organizations. BI surfaces structured data in reports and dashboards. 
As both technologies mature, the boundary between them is beginning to blur. Search platforms are beginning to perform BI functions like data visualization and reporting, and BI vendors have begun to incorporate simple to use search experiences into their products. Information and knowledge management professionals should take advantage of this convergence, which will have the same effect from both sides: to give businesspeople better context and information for the decisions they make every day.&lt;/blockquote&gt;It's hard to find any fault here. In fact, the convergence of search and BI is a corollary to the fact that people (yes, businesspeople are people too) use these systems, and that the same people have no desire to distinguish between "structured" and "unstructured" content as they pursue their information needs.&lt;br /&gt;&lt;br /&gt;That said, I do have some quibbles with how the authors expect the convergence to play out. The authors make two assertions that I have a hard time accepting at face value:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;People will be able to execute data queries via a search box using natural language.&lt;/li&gt;&lt;/ul&gt;Sure, but will they want to? Natural language is fraught with communication challenges, and I'm no more persuaded by natural language queries for BI than I am by &lt;a href="http://thenoisychannel.blogspot.com/2008/04/search-for-meaning.html"&gt;natural language queries for search&lt;/a&gt;.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Visual data representations will increase understanding of linkages among concepts.&lt;/li&gt;&lt;/ul&gt;We've all heard the cliché that a picture is worth a thousand words. I know this better than most, as &lt;a href="http://reports-archive.adm.cs.cmu.edu/anon/1998/abstracts/98-189.html"&gt;I earned my PhD by producing visual representations of networks&lt;/a&gt;. But I worry that people overestimate the value of these visualizations. 
Data visualization is simply a way to represent data analytics. I see more value in making analytics interactive (e.g., supporting and guiding incremental refinement) than in emphasizing visual representations.&lt;br /&gt;&lt;br /&gt;But I quibble. I strongly agree with most of their points, including:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;BI interfaces will encourage discovery of additional data dimensions.&lt;/li&gt;&lt;li&gt;BI and search tools will provide proactive suggestions.&lt;/li&gt;&lt;li&gt;BI and search will continue to borrow techniques from each other.&lt;/li&gt;&lt;/ul&gt;And it doesn't hurt that the authors express a very favorable view of &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;. I can only hope they won't change their minds after reading this post!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4892912880515042527?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4892912880515042527/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4892912880515042527' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4892912880515042527'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4892912880515042527'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/harmonic-convergence.html' title='A Harmonic Convergence'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-4618349140430204776</id><published>2008-05-08T11:44:00.003-04:00</published><updated>2008-05-10T11:32:43.585-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><category scheme='http://www.blogger.com/atom/ns#' term='Privacy'/><title type='text'>This Conversation is Public</title><content type='html'>An interesting implication of blogging and other social media is that conversations once conducted privately have become public. The most common examples are conversations that take place through the comment areas for posts, rather than through private email.&lt;br /&gt;&lt;br /&gt;My initial reaction to this phenomenon was to bemoan the loss of boundaries. But, in keeping with &lt;a href="http://thenoisychannel.blogspot.com/search/label/Privacy"&gt;my recent musings about privacy&lt;/a&gt;, I increasingly see the virtues of public conversations. After all, a synonym for privacy, albeit with a somewhat different connotation, is secrecy. Near-antonyms include transparency and openness.&lt;br /&gt;&lt;br /&gt;I can't promise to always serve personally as an open, transparent information access provider. But I'll do so where possible. 
Here at The Noisy Channel, the conversation is public.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-4618349140430204776?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/4618349140430204776/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=4618349140430204776' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4618349140430204776'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/4618349140430204776'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/this-conversation-is-public.html' title='This Conversation is Public'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3104340140179242780</id><published>2008-05-07T17:03:00.005-04:00</published><updated>2008-05-10T11:33:02.548-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Information technology'/><title type='text'>Business, Technology, and Information</title><content type='html'>I was fortunate to attend the &lt;a href="http://www.infoworld.com/event/tristatecioforum/"&gt;Tri-State CIO Forum&lt;/a&gt; these last couple of days, and I thought I'd change the pace a bit by posting some reflections about it.&lt;br /&gt;&lt;br /&gt;In his keynote speech last night, George Colony, Chairman and CEO of &lt;a 
href="http://www.forrester.com/"&gt;Forrester Research&lt;/a&gt;, called on the business community to drop the name "information technology" (IT) in favor of "business technology" (BT). His reasoning, in a nutshell, was that such nomenclature would reflect the centrality of technology's role for businesses.&lt;br /&gt;&lt;br /&gt;Following similar reasoning but reaching a different conclusion, Julia King, an Executive Editor for &lt;a href="http://www.computerworld.com/"&gt;Computerworld&lt;/a&gt; and one of today's speakers, noted that IT titles are being "techno-scrubbed", and that there is a shift from managing technology to managing information.&lt;br /&gt;&lt;br /&gt;While I can't get excited about a naming debate, I do feel there's an important point overlooked in this discussion. Even though we've achieved consensus on the importance of technology, we need a sharper focus on information. It is a cliché that we live in an information age, but expertise about information is scarce. Information scientists struggle to influence technology development, and information theory is mostly confined to areas like cryptography and compression.&lt;br /&gt;&lt;br /&gt;We have no lack of information technology. Search engines, databases, and applications built on top of them are ubiquitous. 
But we are still just learning how to work with information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3104340140179242780?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3104340140179242780/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3104340140179242780' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3104340140179242780'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3104340140179242780'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/business-technology-and-information.html' title='Business, Technology, and Information'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-5810562282718133691</id><published>2008-05-05T20:42:00.007-04:00</published><updated>2008-05-10T11:33:23.083-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Tefko Saracevic'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Saracevic on Relevance and Interaction</title><content type='html'>There 
is no Nobel Prize in computer science, despite computer science having done more than any other discipline in the past fifty years to change the world. Instead, there is the &lt;a href="http://en.wikipedia.org/wiki/Turing_Award"&gt;Turing Award&lt;/a&gt;, which serves as a Nobel Prize of computing.&lt;br /&gt;&lt;br /&gt;But the Turing Award has never been given to anyone in information retrieval. Instead, there is the &lt;a href="http://en.wikipedia.org/wiki/Gerard_Salton_Award"&gt;Gerard Salton Award&lt;/a&gt;, which serves as a Turing Award of information retrieval. Its recipients represent an A-list of information retrieval researchers.&lt;br /&gt;&lt;br /&gt;Last week, I had the opportunity to talk with Salton Award recipient &lt;a href="http://www.scils.rutgers.edu/%7Etefko/"&gt;Tefko Saracevic&lt;/a&gt;. If you are not familiar with Saracevic, I suggest you take an hour to watch his &lt;a href="http://www.sis.utk.edu/lazerow2007"&gt;2007 lecture on "Relevance in information science"&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I won't try to capture an hour of conversation in a blog post, but here are a few highlights:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We learn from philosophers, particularly &lt;a href="http://en.wikipedia.org/wiki/Alfred_Sch%C3%BCtz"&gt;Alfred Schütz&lt;/a&gt;, that we cannot reduce relevance to a single concept, but rather have to consider a system of interdependent relevancies, such as topical relevance, interpretational relevance, and motivational relevance.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;When we talk about relevance measures, such as precision and recall, we evaluate results from the perspective of a user. 
But information retrieval approaches necessarily take a systems perspective, making assumptions about what people will want and encoding those assumptions in models and algorithms.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A major challenge in information retrieval is that users--particularly web search users--often formulate ineffective queries, in large part because they are too short. Studies have shown that &lt;a href="http://en.wikipedia.org/wiki/Reference_interview"&gt;reference interviews&lt;/a&gt; can lead to improved retrieval effectiveness (typically through longer, more informative queries). He said that automated systems could help too, but he wasn't aware of any that had achieved traction.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A variety of factors affect interactive information retrieval, including task context, intent, and expertise. Moreover, people react to certain relevance clues more than others, and more within some populations than others.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;As I expected, I walked away with more questions than answers. But I did walk away reassured that my colleagues and I at &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;, along with others in the &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt; community, are attacking the right problem: helping users formulate better queries.&lt;br /&gt;&lt;br /&gt;I'd like to close with an anecdote that Saracevic recounts in his 2007 lecture. &lt;a href="http://ciir.cs.umass.edu/personnel/croft.html"&gt;Bruce Croft&lt;/a&gt; had just delivered an information retrieval talk, and &lt;a href="http://www.scils.rutgers.edu/%7Ebelkin/belkin.html"&gt;Nick Belkin&lt;/a&gt; raised the objection that users need to be incorporated into the study. Croft's conversation-ending response: "Tell us what to do, and we will do it."&lt;br /&gt;&lt;br /&gt;We're halfway there. 
We've built interactive information retrieval systems, and we see from deployment after deployment that they work. Not that there isn't plenty of room for improvement, but the unmet challenge, &lt;a href="http://thenoisychannel.blogspot.com/2008/04/ellen-voorhees-defends-cranfield.html"&gt;as Ellen Voorhees makes clear&lt;/a&gt;, is evaluation. We need to address &lt;a href="http://thenoisychannel.blogspot.com/2008/04/nick-belkin-at-ecir-08.html"&gt;Nick Belkin's grand challenge&lt;/a&gt; and establish a paradigm suitable for evaluation of interactive IR systems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-5810562282718133691?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/5810562282718133691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=5810562282718133691' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5810562282718133691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/5810562282718133691'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/saracevic-on-relevance-and-interaction.html' title='Saracevic on Relevance and Interaction'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1996269419600412644</id><published>2008-05-02T14:37:00.005-04:00</published><updated>2008-06-11T22:56:45.353-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Endeca'/><category scheme='http://www.blogger.com/atom/ns#' term='ECIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Guided Summarization</title><content type='html'>I'm still waiting for the &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR&lt;/a&gt; organizers to post the slides from the &lt;a href="http://ecir2008.dcs.gla.ac.uk/industry.html"&gt;Industry Day&lt;/a&gt;. I particularly liked Nick Craswell's presentation on A Brief Tour of "Query Space". Until his slides are up, I recommend &lt;a href="http://research.microsoft.com/users/nickcr/pubs/craswell_sigir07.pdf"&gt;this SIGIR '07 paper&lt;/a&gt; to give you an idea of his approach.&lt;br /&gt;&lt;br /&gt;Slides are &lt;a href="http://www.cs.cmu.edu/%7Equixote/GuidedSummarization.pps"&gt;here&lt;/a&gt; as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.&lt;br /&gt;&lt;br /&gt;&lt;div style="width:425px;text-align:left" id="__ss_463030"&gt;&lt;object style="margin:0px" width="425" height="355"&gt;&lt;param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=guidedsummarization-1213239205653056-9"/&gt;&lt;param name="allowFullScreen" value="true"/&gt;&lt;param name="allowScriptAccess" value="always"/&gt;&lt;embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=guidedsummarization-1213239205653056-9" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div 
style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;"&gt;&lt;a href="http://www.slideshare.net/?src=embed"&gt;&lt;img src="http://static.slideshare.net/swf/logo_embd.png" style="border:0px none;margin-bottom:-5px" alt="SlideShare"/&gt;&lt;/a&gt; | &lt;a href="http://www.slideshare.net/dtunkelang/guided-summarization?src=embed" title="View Guided Summarization on SlideShare"&gt;View&lt;/a&gt; | &lt;a href="http://www.slideshare.net/upload?src=embed"&gt;Upload your own&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1996269419600412644?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1996269419600412644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1996269419600412644' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1996269419600412644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1996269419600412644'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/guided-summarization.html' title='Guided Summarization'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-7660464317837493535</id><published>2008-05-02T01:45:00.003-04:00</published><updated>2008-05-02T01:53:49.880-04:00</updated><title type='text'>List of Findability 
Solutions</title><content type='html'>&lt;a href="http://www.delphigroup.com/about/people/dan_keldsen/"&gt;Dan Keldsen&lt;/a&gt; has posted a &lt;a href="http://www.biztechtalk.com/2008/04/final-list-of-f.html"&gt;list of findability-related solutions&lt;/a&gt; at &lt;a href="http://www.biztechtalk.com/"&gt;BizTechTalk&lt;/a&gt;. The 80 or so solutions he lists deliberately err on the side of recall, including search, taxonomies, interfaces, and visualization as aspects of findability. The list is a useful resource for anyone interested in enterprise information access.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-7660464317837493535?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/7660464317837493535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=7660464317837493535' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7660464317837493535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/7660464317837493535'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/list-of-findability-solutions.html' title='List of Findability Solutions'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2784262847886202940</id><published>2008-05-01T20:39:00.002-04:00</published><updated>2008-05-10T11:34:00.894-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='LinkedIn'/><category scheme='http://www.blogger.com/atom/ns#' term='Privacy'/><title type='text'>Privacy through Difficulty</title><content type='html'>I had lunch today with &lt;a href="http://people.csail.mit.edu/harr/"&gt;Harr Chen&lt;/a&gt;, a graduate student at MIT, and we were talking about the consequences of information efficiency for privacy.&lt;br /&gt;&lt;br /&gt;A nice example is the &lt;a href="http://blog.linkedin.com/blog/2008/03/company-profile.html"&gt;company pages on LinkedIn&lt;/a&gt;. No company, to my knowledge, publishes statistics on:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the schools their employees attended.&lt;/li&gt;&lt;li&gt;the companies where their employees previously worked.&lt;/li&gt;&lt;li&gt;the companies where their ex-employees work next.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;If a company maintains these statistics, it surely considers them to be sensitive and confidential. Nonetheless, by aggregating information from member profiles, LinkedIn computes best guesses at these statistics and makes them public.&lt;br /&gt;&lt;br /&gt;Arguably, information like this was never truly private, but was simply so difficult to aggregate that nobody bothered. As Harr aptly put it, companies practiced "privacy through difficulty"--a privacy analog to &lt;a href="http://en.wikipedia.org/wiki/Security_through_obscurity"&gt;security through obscurity&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Some people are terrified by the increasing efficiency of the information market and look for legal remedies as a last-ditch attempt to protect their privacy. 
I am inclined towards the other extreme (see &lt;a href="http://thenoisychannel.blogspot.com/2008/04/privacy-and-information-theory.html"&gt;my previous post on privacy and information theory&lt;/a&gt;): let's assume that information flow is efficient and confront the consequences honestly. Then we can have an informed conversation about information privacy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2784262847886202940?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2784262847886202940/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2784262847886202940' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2784262847886202940'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2784262847886202940'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/05/privacy-through-difficulty.html' title='Privacy through Difficulty'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1467936972770705865</id><published>2008-04-28T01:47:00.007-04:00</published><updated>2008-05-10T11:34:36.470-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='social navigation'/><category scheme='http://www.blogger.com/atom/ns#' term='faceted navigation'/><category 
scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='collaborative filtering'/><title type='text'>Social Navigation</title><content type='html'>There has been a lot of &lt;a href="http://blogsearch.google.com/blogsearch?q=social-navigation"&gt;recent buzz about social navigation&lt;/a&gt;, including some debate about what the phrase means. I dug into the archives and found a paper from the CHI '94 Conference on Human Factors in Computing Systems entitled &lt;a href="http://www.dcs.gla.ac.uk/%7Ematthew/papers/hci94.pdf"&gt;"Running Out of Space: Models of Information Navigation"&lt;/a&gt;. In it, Paul Dourish and Matthew Chalmers distinguish between semantic navigation and social navigation:&lt;br /&gt;&lt;blockquote&gt;[semantic navigation offers] the ability to explore and choose perspectives of view based on knowledge of the semantically-structured information.&lt;br /&gt;...&lt;br /&gt;In social navigation, movement from one item to another is provoked as an artifact of the activity of another or a group of others.&lt;/blockquote&gt;Back in 1994, the Web was only starting to reach a broad audience. The authors cite two examples of social navigation: personal home pages, where people listed sites they found interesting, and &lt;a href="http://en.wikipedia.org/wiki/Collaborative_filtering"&gt;collaborative filtering&lt;/a&gt; (specifically, the &lt;a href="http://www.ischool.utexas.edu/%7Ei385d/readings/Goldberg_UsingCollaborative_92.pdf"&gt;Information Tapestry&lt;/a&gt; project at Xerox PARC).&lt;br /&gt;&lt;br /&gt;Today, a decade and a half later, the web has scaled by several orders of magnitude, search engines have largely obviated the listing of interesting sites on personal home pages, and collaborative filtering, while still going strong as a social influence on user experience, hardly feels like navigation. 
It does seem that the term "social navigation" deserves an update.&lt;br /&gt;&lt;br /&gt;Following Dourish and Chalmers, let us define social navigation as the ability to explore and choose perspectives of view based on social information. Importantly, social navigation is user-controlled navigation, just like semantic navigation, except that the user navigates by changing the social lens on the information rather than by specifying semantic constraints.&lt;br /&gt;&lt;br /&gt;One example of social navigation is the ratings information at the Internet Movie Database (&lt;a href="http://www.imdb.com/" title="Internet Movie Database" rel="homepage" target="_blank" class="zem_slink"&gt;IMDB&lt;/a&gt;). For instance, we can see from the &lt;a href="http://www.imdb.com/title/tt0337978/ratings"&gt;ratings for &lt;span style="font-style: italic;"&gt;Live Free or Die Hard&lt;/span&gt;&lt;/a&gt; that the movie appealed most to males under 18.&lt;br /&gt;&lt;br /&gt;Fandango (an &lt;a href="http://www.endeca.com/" title="Endeca Technologies Inc." rel="homepage" target="_blank" class="zem_slink"&gt;Endeca&lt;/a&gt; customer) takes this concept a step further, offering users &lt;a href="http://www.fandango.com/livefreeordiehard_2681/readuserreviews"&gt;faceted navigation of the space of movie reviews&lt;/a&gt;, where facets include age, gender, whether or not the reviewer has children, and whether the reviewer lives near the user.&lt;br /&gt;&lt;br /&gt;More sophisticated interfaces will intermingle semantic and social navigation. 
Here is a screen shot from a prototype some of my colleagues put together and demonstrated at &lt;a href="http://projects.csail.mit.edu/hcir/web/"&gt;HCIR '07&lt;/a&gt;:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Y0SVT3VxV1E/SBVnUsqB-8I/AAAAAAAAAAo/Jt2tAsYZkCs/s1600-h/socialNavigation.jpg"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_Y0SVT3VxV1E/SBVnUsqB-8I/AAAAAAAAAAo/Jt2tAsYZkCs/s400/socialNavigation.jpg" alt="" id="BLOGGER_PHOTO_ID_5194171350524230594" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;Social navigation, defined as above, offers users more than just the ability to be influenced by other people. It offers users transparency and control over the social lens. It allows us to think outside the black box.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1467936972770705865?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1467936972770705865/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1467936972770705865' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1467936972770705865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1467936972770705865'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/social-navigation.html' title='Social Navigation'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_Y0SVT3VxV1E/SBVnUsqB-8I/AAAAAAAAAAo/Jt2tAsYZkCs/s72-c/socialNavigation.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-374657349144582995</id><published>2008-04-27T00:00:00.004-04:00</published><updated>2008-05-09T17:47:25.905-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Gian-Carlo Rota'/><title type='text'>Happy Rota Day!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.monadas.net/rota/imagen/rota.10.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 200px;" src="http://www.monadas.net/rota/imagen/rota.10.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Since this is a personal blog, I'd like to go a bit off-topic and take a moment to recognize my late mentor &lt;a href="http://en.wikipedia.org/wiki/Gian-Carlo_Rota"&gt;Gian-Carlo Rota&lt;/a&gt;, whose birthday is today. While I and countless others recall Gian-Carlo most fondly as a mentor and teacher, his crowning achievement was to make &lt;a href="http://en.wikipedia.org/wiki/Combinatorics"&gt;combinatorics&lt;/a&gt; a respectable branch of modern mathematics. Combinatorics and probability theory have, in turn, been instrumental to the progress of information retrieval and information science.&lt;br /&gt;&lt;br /&gt;And this nugget of his &lt;a href="http://www.math.tamu.edu/%7Ecyan/Rota/tenlesses.pdf"&gt;advice about lecturing&lt;/a&gt; seems remarkably appropriate in the context of how information retrieval engines should work:&lt;br /&gt;&lt;blockquote&gt;Every lecture should state one main point and repeat it over and over, like a theme with variations. 
An audience is like a herd of cows, moving slowly in the direction they are being driven towards. If we make one point, we have a good chance that the audience will take the right direction; if we make several points, then the cows will scatter all over the field. The audience will lose interest and everyone will go back to the thoughts they interrupted in order to come to our lecture.&lt;/blockquote&gt;Happy Birthday, Gian-Carlo.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-374657349144582995?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/374657349144582995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=374657349144582995' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/374657349144582995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/374657349144582995'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/happy-rota-day.html' title='Happy Rota Day!'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2101664098337264095</id><published>2008-04-25T00:05:00.005-04:00</published><updated>2008-05-10T11:35:06.921-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Database'/><category scheme='http://www.blogger.com/atom/ns#' term='Dagstuhl'/><category 
scheme='http://www.blogger.com/atom/ns#' term='XML'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Workshop on Ranked XML Querying</title><content type='html'>Thanks to &lt;a href="http://behind-the-enemy-lines.blogspot.com/"&gt;an excellent blog&lt;/a&gt; written by &lt;a href="http://pages.stern.nyu.edu/%7Epanos/"&gt;Panos Ipeirotis&lt;/a&gt; at the NYU Stern School, I learned about a workshop held last month in &lt;a href="http://en.wikipedia.org/wiki/Dagstuhl"&gt;Dagstuhl&lt;/a&gt; on &lt;a href="http://kathrin.dagstuhl.de/08111/"&gt;ranked XML querying&lt;/a&gt;. Most of the presentations are available online, including one entitled &lt;a href="http://kathrin.dagstuhl.de/files/Materials/08/08111/08111.WeikumGerhard.Slides.ppt"&gt;DB &amp;amp; IR from a DB Viewpoint&lt;/a&gt; by &lt;a href="http://www.mpi-inf.mpg.de/%7Eweikum/"&gt;Gerhard Weikum&lt;/a&gt; at the Max Planck Institut für Informatik. I'm excited to see these efforts to unify the DB and IR perspectives. 
So much more productive than the &lt;a href="http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html"&gt;infamous MapReduce debate&lt;/a&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2101664098337264095?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2101664098337264095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2101664098337264095' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2101664098337264095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2101664098337264095'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/thanks-to-excellent-blog-written-by.html' title='Workshop on Ranked XML Querying'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-415340069570697719</id><published>2008-04-24T16:48:00.007-04:00</published><updated>2008-05-10T11:35:22.143-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Usability'/><category scheme='http://www.blogger.com/atom/ns#' term='Database Usability'/><category scheme='http://www.blogger.com/atom/ns#' term='Jeff Naughton'/><category scheme='http://www.blogger.com/atom/ns#' term='Database'/><category scheme='http://www.blogger.com/atom/ns#' term='H. V. 
Jagadish'/><title type='text'>Database Usability</title><content type='html'>Just as I was digesting &lt;a href="http://thenoisychannel.blogspot.com/2008/04/north-east-db-ir-day.html"&gt;Jeff Naughton's presentation at DB/IR Day&lt;/a&gt;, a colleague at &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt; emailed me the keynote that &lt;a href="http://www.eecs.umich.edu/%7Ejag"&gt;H. V. Jagadish&lt;/a&gt; (University of Michigan) presented at &lt;a href="http://sigmod07.riit.tsinghua.edu.cn/"&gt;SIGMOD '07&lt;/a&gt; on &lt;a href="http://www.eecs.umich.edu/db/usable/usability.pdf"&gt;making database systems usable&lt;/a&gt;. He enumerates the familiar pain points of today's database systems: confusing schemas, too many choices to make, unexpected--and unexplained--system behavior, and too high a cost for initial creation. He proposes "systems that reflect the user's model of the data, rather than forcing the data to fit a particular model."&lt;br /&gt;&lt;br /&gt;As with Jeff's presentation, the main takeaway here is a framework for thinking about the problems rather than a set of solutions (though both he and Jeff have taken initial steps to address the problems they describe). 
As a practitioner, I'm most encouraged by the fact that database researchers, like information retrieval researchers, are increasingly recognizing the importance of users.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-415340069570697719?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/415340069570697719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=415340069570697719' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/415340069570697719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/415340069570697719'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/database-usability.html' title='Database Usability'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2025457982005934065</id><published>2008-04-23T14:20:00.006-04:00</published><updated>2008-05-10T11:35:41.486-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Collaborative tagging'/><category scheme='http://www.blogger.com/atom/ns#' term='Yahoo Research'/><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><category scheme='http://www.blogger.com/atom/ns#' term='Knowledge representation'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category 
scheme='http://www.blogger.com/atom/ns#' term='PARC'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Theory'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>The Efficiency of Social Tagging</title><content type='html'>Credit to &lt;a href="http://ssli.ee.washington.edu/people/duh/"&gt;Kevin Duh&lt;/a&gt; by way of the &lt;a href="http://nlpers.blogspot.com/"&gt;natural language processing blog&lt;/a&gt; for highlighting recent work from &lt;a href="http://www.parc.com/"&gt;PARC&lt;/a&gt; on &lt;a href="http://www-users.cs.umn.edu/%7Eechi/papers/2008-ICWSM/2008-03-tagging-encoding-ICWSM.pdf"&gt;understanding the efficiency of social tagging systems using information theory&lt;/a&gt;. The authors apply information theory to establish a framework for measuring the efficiency of social tagging systems, and then empirically observe that the efficiency of tagging on &lt;a href="http://del.icio.us/"&gt;del.icio.us&lt;/a&gt; has been decreasing over time. They conclude by suggesting that current tagging interfaces may be at fault, creating a positive feedback loop that encourages popular tags.&lt;br /&gt;&lt;br /&gt;After seeing this and the &lt;a href="http://tagmaps.research.yahoo.com/"&gt;TagMaps&lt;/a&gt; work at &lt;a href="http://www.yahooresearchberkeley.com/"&gt;Yahoo Research Berkeley&lt;/a&gt;, I feel that the &lt;a href="http://en.wikipedia.org/wiki/Information_retrieval"&gt;IR&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Human-computer_interaction"&gt;HCI&lt;/a&gt; communities should join forces to understand social tagging in general terms that relate information, knowledge representation, and human beings. These concerns are hardly specific to the web or to what is now called &lt;a href="http://en.wikipedia.org/wiki/Social_media"&gt;"social media"&lt;/a&gt;--after all, media is social by definition. 
Indeed, there is no reason to confine this approach to human-tagged collections--why not consider automated tagging systems on the same playing field?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2025457982005934065?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2025457982005934065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2025457982005934065' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2025457982005934065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2025457982005934065'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/efficiency-of-social-tagging.html' title='The Efficiency of Social Tagging'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-9064650036450250747</id><published>2008-04-22T12:59:00.005-04:00</published><updated>2008-05-10T11:35:59.011-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='University of Glasgow'/><category scheme='http://www.blogger.com/atom/ns#' term='Leif Azzopardi'/><category scheme='http://www.blogger.com/atom/ns#' term='ECIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Accessibility'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Accessibility in Information Retrieval</title><content type='html'>The other day, I was talking with &lt;a href="http://ir.dcs.gla.ac.uk/%7Eleif/"&gt;Leif Azzopardi&lt;/a&gt; at the &lt;a href="http://www.gla.ac.uk/"&gt;University of Glasgow&lt;/a&gt; about &lt;a href="http://www.dcs.gla.ac.uk/publications/paperdetails.cfm?id=8790"&gt;accessibility in information retrieval&lt;/a&gt;. Accessibility is a &lt;a href="http://en.wikipedia.org/wiki/Accessibility#Transportation"&gt;concept borrowed from land-use and transportation planning&lt;/a&gt;: it measures the cost that people are willing to incur to reach opportunities (e.g., shopping, restaurants), weighted by the desirability of those opportunities.&lt;br /&gt;&lt;br /&gt;What does accessibility mean in the context of information retrieval?&lt;br /&gt;&lt;blockquote&gt;Instead of an actual physical space, in IR, we are predominately concerned with accessing information within a collection of documents (i.e., information space), and instead of a transportation system, we have an Information Access System (i.e., a means by which we can access the information in the collection, like a query mechanism, a browsing mechanism, etc). The accessibility of a document is indicative of the likelihood or opportunity of it being retrieved by the user in this information space given such a mechanism.&lt;/blockquote&gt;It's a very appealing way to measure the effectiveness with which an information retrieval system exposes a document collection--as well as the bias the system imposes. 
While the paper offers more questions than answers, I recommend it to anyone who is interested in thinking outside the box of the &lt;a href="http://en.wikipedia.org/wiki/Information_retrieval#Performance_measures"&gt;traditional IR performance measures&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-9064650036450250747?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/9064650036450250747/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=9064650036450250747' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9064650036450250747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9064650036450250747'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/accessibility-in-information-retrieval.html' title='Accessibility in Information Retrieval'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-6914586377699059403</id><published>2008-04-20T12:08:00.004-04:00</published><updated>2008-05-10T11:36:50.052-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Database Usability'/><category scheme='http://www.blogger.com/atom/ns#' term='Jeff Naughton'/><category scheme='http://www.blogger.com/atom/ns#' term='Columbia University'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Database'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>North East DB / IR Day</title><content type='html'>Last Friday, I had the privilege of attending the &lt;a href="http://dbirday.cs.columbia.edu/spring08/"&gt;Spring 2008 North East DB/IR Day&lt;/a&gt;, hosted by &lt;a href="http://www.columbia.edu/"&gt;Columbia University&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;The North East DB/IR Day brings together database and information retrieval researchers and students from both academic and research institutions in the Northeastern United States. The DB/IR Day is a semi-annual workshop that features an exciting technical program as well as informal discussion. The DB/IR Day provides a regular forum for presenting diverse viewpoints on database systems and information retrieval, addressing current topics as well as promoting information exchange among researchers.&lt;/blockquote&gt;The event lived up to its promise, and I was impressed with the quality of the student posters. But my favorite part of the event was the keynote by &lt;a href="http://pages.cs.wisc.edu/%7Enaughton/"&gt;Jeff Naughton&lt;/a&gt; entitled "&lt;a href="http://dbirday.cs.columbia.edu/spring08/keynotes.php#keynote3"&gt;Extracting Problems for Database and IR Researchers&lt;/a&gt;."&lt;br /&gt;&lt;br /&gt;Jeff characterized the traditional philosophy of the database community as guaranteeing perfect outputs if the inputs are perfect. 
He argued that what we need more of today are databases that expect imperfection and try to help.&lt;br /&gt;&lt;br /&gt;To summarize his talk:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Provide support for "learn schema as you go."&lt;/li&gt;&lt;li&gt;Develop techniques to explain inconsistency and let users reason about it.&lt;/li&gt;&lt;li&gt;Expect errors, and provide tools for users to understand and debug them.&lt;/li&gt;&lt;li&gt;View the task as helping users discover what they want in a large space of potential queries.&lt;/li&gt;&lt;/ul&gt;It is encouraging to see such a prominent database researcher advocating this vision, especially since it aligns so well with the &lt;a href="http://endeca.com/technology/"&gt;technology we are developing at Endeca&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-6914586377699059403?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/6914586377699059403/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=6914586377699059403' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6914586377699059403'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/6914586377699059403'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/north-east-db-ir-day.html' title='North East DB / IR Day'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8604430208197686066</id><published>2008-04-18T01:22:00.006-04:00</published><updated>2008-05-10T11:37:10.143-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Powerset'/><category scheme='http://www.blogger.com/atom/ns#' term='HCIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Natural language processing'/><category scheme='http://www.blogger.com/atom/ns#' term='hakia'/><title type='text'>The Search for Meaning</title><content type='html'>By a fortuitous coincidence, I had the opportunity to see two consecutive presentations from search engine companies banking on natural language processing (NLP) to power the next generation of search. The first was from &lt;a href="http://www.powerset.com/press/kaplan"&gt;Ron Kaplan&lt;/a&gt;, Chief Technology and Science Officer of &lt;a href="http://www.powerset.com/"&gt;Powerset&lt;/a&gt;, who presented at &lt;a href="http://www.columbia.edu/"&gt;Columbia University&lt;/a&gt;. The second was from &lt;a href="http://homepage.mac.com/hempelma/"&gt;Christian Hempelmann&lt;/a&gt;, Chief Scientific Officer of &lt;a href="http://www.hakia.com/"&gt;hakia&lt;/a&gt;, who presented at the &lt;a href="http://semweb.meetup.com/25/"&gt;New York Semantic Web Meetup&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.powerset.com/"&gt;Powerset&lt;/a&gt; talk was entitled "Deep natural language processing for web-scale indexing and retrieval." 
&lt;a href="http://www.cs.cmu.edu/%7Ejelsas/"&gt;Jon Elsas&lt;/a&gt;, who attended the &lt;a href="http://www.lti.cs.cmu.edu/Seminars/abstract-07-08.htm#RonaldK"&gt;same talk&lt;/a&gt; earlier this week at &lt;a href="http://www.cmu.edu/"&gt;CMU&lt;/a&gt;, did an excellent job &lt;a href="http://windowoffice.tumblr.com/post/31883488"&gt;summarizing it on his blog&lt;/a&gt;. I'll simply express my reaction: I don't get it. I have no reason to doubt that their NLP pipeline is best-in-class. The team has impressive credentials. But I see no evidence that they have produced better results than keyword search. After participating in their &lt;a href="https://labs.powerset.com/"&gt;private beta&lt;/a&gt; for several months, I'd hoped that the presentation would help me see what I'd missed. I specifically asked Ron what measures they used to evaluate their system, and he was mum. So now I am more unconvinced that ever, though, to steal a line from a colleague, I cannot reconcile their enthusiasm with their results.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://hakia.com/"&gt;hakia&lt;/a&gt; talk was entitled "Search for Meaning." Christian started by making the case for a semantic, rather than statistical approach to NLP. He then presented &lt;a href="http://company.hakia.com/technology.html"&gt;hakia's technology&lt;/a&gt; in a fair amount of detail, including walking through examples of worse sense disambiguation using context. I'm not convinced that semantics trump statistics, but I thoroughly enjoyed the presentation, and was intrigued enough to want to learn more. I find the company refreshingly open about its technology (not to mention that &lt;a href="http://hakia.com/"&gt;their beta&lt;/a&gt; is public), and I hope it works well enough to be practical.&lt;br /&gt;&lt;br /&gt;Still, I'm not convinced the NLP is either the right answer or the right question. 
I'm no expert on the history of language, but it's clear that natural languages are hardly an optimal means of communication, even among human beings. Rather, they are artifacts of our &lt;a href="http://en.wikipedia.org/wiki/Satisficing"&gt;satisficing&lt;/a&gt; and of our resistance to change. Since we are lucky enough not to have developed expectations that people can communicate with computers using natural language (&lt;a href="http://en.wikipedia.org/wiki/HAL_9000"&gt;HAL&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Star_Trek"&gt;Star Trek&lt;/a&gt; notwithstanding), why take a step backwards now? Rather than advocating for inefficient, unreliable communication mechanisms like natural language, we should be figuring out ways to make communication more efficient.&lt;br /&gt;&lt;br /&gt;To use an analogy, there's a reason that programming languages have strict rules, and that compilers report errors rather than just trying to guess what you mean. The mild inconvenience upstream is a small cost compared to the downstream benefits of unambiguous communication. I'm not suggesting that people start speaking in &lt;a href="http://en.wikipedia.org/wiki/Formal_language"&gt;formal languages&lt;/a&gt;. But I do feel we should strive for a dialog-oriented approach where both the human and the computer have confidence in their mutual understanding. 
I can't resist a plug for &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8604430208197686066?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8604430208197686066/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8604430208197686066' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8604430208197686066'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8604430208197686066'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/search-for-meaning.html' title='The Search for Meaning'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-3103863650841399273</id><published>2008-04-17T08:46:00.005-04:00</published><updated>2008-05-09T17:33:55.219-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cranfield'/><category scheme='http://www.blogger.com/atom/ns#' term='TREC'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Nick Belkin'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Ellen 
Voorhees'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Ellen Voorhees defends Cranfield</title><content type='html'>I was extremely flattered to receive an email from Ellen Voorhees responding to my post about Nick Belkin's keynote. Then I was a little bit scared, since she is a strong advocate of the Cranfield tradition, and I braced myself for her rebuttal.&lt;br /&gt;&lt;br /&gt;She pointed me to a &lt;a href="http://www.dcs.gla.ac.uk/workshops/air/slides/EllenVoorhees-TestCollectionsforAIR.pdf"&gt;talk&lt;/a&gt; she gave at the &lt;a href="http://www.dcs.gla.ac.uk/workshops/air/"&gt;First International Workshop on Adaptive Information Retrieval (AIR) in 2006&lt;/a&gt;. I'd paraphrase her argument as follows: Nick and others (including me) are right to push for a paradigm that supports AIR research, but are being naïve regarding what is necessary for such research to deliver effective--and cost-effective--results. It's a strong case, and I'd be the first to concede that the advocates for AIR research have not (at least to my knowledge) produced a plausible abstract task that is amenable to efficient evaluation.&lt;br /&gt;&lt;br /&gt;To quote Nick again, it's a grand challenge. 
And Ellen makes it clear that what we've learned so far is not encouraging.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-3103863650841399273?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/3103863650841399273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=3103863650841399273' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3103863650841399273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/3103863650841399273'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/ellen-voorhees-defends-cranfield.html' title='Ellen Voorhees defends Cranfield'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-2627858331952523378</id><published>2008-04-15T23:45:00.005-04:00</published><updated>2008-05-10T11:37:31.064-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Information Theory'/><category scheme='http://www.blogger.com/atom/ns#' term='Privacy'/><title type='text'>Privacy and Information Theory</title><content type='html'>Privacy is an evergreen topic in technology discussions, and increasingly finds its way into the mainstream (cf. 
&lt;a href="http://en.wikipedia.org/wiki/AOL_search_data_scandal"&gt;AOL&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/NSA_warrantless_surveillance_controversy"&gt;NSA&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Facebook_Beacon"&gt;Facebook&lt;/a&gt;). My impression is that most people feel that companies and government agencies are amassing their "private" data to some nefarious end.&lt;br /&gt;&lt;br /&gt;Let's forget about technology for a moment and subject the notion of privacy to basic examination. If I truly want to keep a secret, I don't tell anyone. If I want to share information with you but no one else, I only disclose the information under the proviso of a social or legal contract of non-disclosure.&lt;br /&gt;&lt;br /&gt;But there's a major catch here: you--or I--may disclose the information involuntarily by our actions. The various establishments I frequent know my favorite foods, drinks, and even karaoke songs. More subtly, if I tell you in confidence that I don't like or trust someone, that information is likely to visibly affect your interaction with that person. Moreover, someone who knows that we are friends might even suspect me as the cause for your change in behavior.&lt;br /&gt;&lt;br /&gt;What does this have to do with privacy of information? Everything! The mainstream debates treat information privacy as binary. Even when people discuss gradations of privacy, they tend to think in terms of each particular disclosure (e.g., age, favorite flavor of ice cream) as binary. But, if we take an information-theoretic look at disclosure, we immediately see that this binary view of disclosure is illusory.&lt;br /&gt;&lt;br /&gt;For example, if you know I work for a software company and live in New York City, you know more about my gender, education, and salary than if you only know that I live in the United States. 
We can quantify this information gain in bits, as the reduction in entropy that comes from conditioning on what you already know.&lt;br /&gt;&lt;br /&gt;Information theory provides a unifying framework for thinking about privacy. We can answer questions like "if I disclose that I like bagels and smoked salmon, to what extent do I disclose that I live in New York?" Or: to what extent does an anonymized search log identify me personally?&lt;br /&gt;&lt;br /&gt;If we can take this framework and make it accessible to people who are not information theorists, perhaps we can improve the quality of the privacy debate.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-2627858331952523378?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/2627858331952523378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=2627858331952523378' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2627858331952523378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/2627858331952523378'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/privacy-and-information-theory.html' title='Privacy and Information Theory'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-8384141404493901184</id><published>2008-04-12T09:32:00.004-04:00</published><updated>2008-05-10T11:37:48.801-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category scheme='http://www.blogger.com/atom/ns#' term='Enterprise Search'/><title type='text'>Can Search be a Utility?</title><content type='html'>A recent lecture at the New York CTO club inspired a heated discussion on what is wrong with enterprise search solutions. Specifically, &lt;a href="http://weblog.infoworld.com/ny-cto/"&gt;Jon Williams&lt;/a&gt; asked &lt;a href="http://weblog.infoworld.com/ny-cto/archives/2008/03/search_as_a_uti.html"&gt;why search can't be a utility&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It's unfortunate when such a simple question calls for a complicated answer, but I'll try to tackle it.&lt;br /&gt;&lt;br /&gt;On the web, almost all attempts to deviate even slightly from the venerable ranked-list paradigm have been resounding flops. More sophisticated interfaces, such as &lt;a href="http://clusty.com/"&gt;Clusty&lt;/a&gt;, receive favorable press coverage, but users don't vote for them with their virtual feet. And web search users seem reasonably satisfied with their experience.&lt;br /&gt;&lt;br /&gt;Conversely, in the enterprise, there is widespread dissatisfaction with search solutions. A number of my colleagues have said that they installed a Google Search Appliance and "it didn't work." (Full disclosure: Google competes with Endeca in the enterprise.)&lt;br /&gt;&lt;br /&gt;While the GSA does have some significant technical limitations, I don't think the failures were primarily technical. 
Rather, I believe there was a failure of expectations, and that the problem comes down to the question of whether relevance is subjective.&lt;br /&gt;&lt;br /&gt;On the web, we get away with pretending that relevance is objective because there is so much agreement among users--particularly in the restricted class of queries that web search handles well, and that hence constitute the majority of actual searches.&lt;br /&gt;&lt;br /&gt;In the enterprise, however, we not only lack the redundant and highly social structure of the web; we also tend to have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly, particularly when there is no Wikipedia page that addresses our needs.&lt;br /&gt;&lt;br /&gt;It seems we can go in two directions.&lt;br /&gt;&lt;br /&gt;The first is to make enterprise search more like web search by reducing the enterprise search problem to one that is user-independent and does not rely on the social generation of enterprise data. Such a problem encompasses such mundane but important tasks as finding documents by title or finding department home pages. The challenges here are fundamentally ones of infrastructure, reflecting the heterogeneous content repositories in enterprises and the controls mandated by business processes and regulatory compliance. Solving these problems is no cakewalk, but I think all of the major enterprise search vendors understand the framework for solving them.&lt;br /&gt;&lt;br /&gt;The second is to embrace the difference between enterprise knowledge workers and casual web users, and to abandon the quest for an objective relevance measure. Such an approach requires admitting that there is no free lunch--that you can't just plug in a box and expect it to solve an enterprise's knowledge management problem. Rather, enterprise workers need to help shape the solution by supplying their proprietary knowledge and information needs. 
The main challenges for information access vendors are to make this process as painless as possible for enterprises, and to demonstrate the return so that enterprises make the necessary investment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-8384141404493901184?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/8384141404493901184/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=8384141404493901184' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8384141404493901184'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/8384141404493901184'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/can-search-be-utility.html' title='Can Search be a Utility?'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-9213402729915578589</id><published>2008-04-10T08:35:00.006-04:00</published><updated>2008-05-09T17:27:14.326-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TREC'/><category scheme='http://www.blogger.com/atom/ns#' term='ECIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Multiple-Query Sessions</title><content type='html'>As &lt;a 
href="http://www.scils.rutgers.edu/%7Ebelkin/belkin.html"&gt;Nick Belkin&lt;/a&gt; pointed out in his recent &lt;a href="http://thenoisychannel.com/2008/04/nick-belkin-at-ecir-08.html"&gt;ECIR 2008 keynote&lt;/a&gt;, a grand challenge for the IR community is to figure out how to bring the user into the evaluation process. A key aspect of this challenge is rethinking system evaluation in terms of sessions rather than queries.&lt;br /&gt;&lt;br /&gt;Some recent work in the IR community is very encouraging:&lt;br /&gt;&lt;br /&gt;- Work by &lt;a href="http://research.microsoft.com/%7Eryenw/"&gt;Ryen White&lt;/a&gt; and colleagues at Microsoft Research that mines session data to guide users to popular web destinations. &lt;a href="http://research.microsoft.com/%7Eryenw/papers/WhiteSIGIR2007a.pdf"&gt;Their paper&lt;/a&gt; was awarded Best Paper at &lt;a href="http://www.sigir2007.org/"&gt;SIGIR 2007&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;- Work by &lt;a href="http://research.microsoft.com/%7Enickcr/"&gt;Nick Craswell&lt;/a&gt; and &lt;a href="http://research.microsoft.com/%7Eszummer/"&gt;Martin Szummer&lt;/a&gt; (also at Microsoft Research, and also presented at SIGIR 2007) that performs &lt;a href="http://research.microsoft.com/users/nickcr/pubs/craswell_sigir07.pdf"&gt;random walks on the click graph&lt;/a&gt; to use click data effectively as evidence to improve relevance ranking for image search on the web.&lt;br /&gt;&lt;br /&gt;- Work by &lt;a href="http://www.uta.fi/%7Elikaja/"&gt;Kalervo Järvelin&lt;/a&gt; (at the University of Tampere in Finland) and colleagues on &lt;a href="http://www.info.uta.fi/tutkimus/fire/archive/2008/sDCG-ECIR%2708.pdf"&gt;discounted cumulated gain based evaluation of multiple-query IR sessions&lt;/a&gt; that was awarded Best Paper at &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR 2008&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This recent work--and the prominence it has received in the IR community--is refreshing, especially in light of the 
relative lack of academic work on interactive IR and the demise of the short-lived &lt;a href="http://trec.nist.gov/data/interactive.html"&gt;TREC interactive track&lt;/a&gt;. These are first steps, but I hope IR researchers and practitioners will build on them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-9213402729915578589?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/9213402729915578589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=9213402729915578589' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9213402729915578589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/9213402729915578589'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/multiple-query-sessions.html' title='Multiple-Query Sessions'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1951816253170822526</id><published>2008-04-08T14:38:00.006-04:00</published><updated>2008-05-10T11:38:06.912-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ECIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Amit Singhal'/><category scheme='http://www.blogger.com/atom/ns#' term='Relevance'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Q&amp;A with Amit Singhal</title><content type='html'>&lt;a href="http://singhal.info/"&gt;Amit Singhal&lt;/a&gt;, who is head of search quality at Google, gave a very entertaining keynote at &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR '08&lt;/a&gt; that focused on the &lt;a href="http://en.wikipedia.org/wiki/Adversarial_IR"&gt;adversarial aspects of Web IR&lt;/a&gt;. Specifically, he discussed some of the techniques used in the arms race to game Google's ranking algorithms. &lt;a href="http://www.sigir2007.org/news/20080401singhal.html"&gt;Perhaps he revealed more than he intended!&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;During the question and answer session, I reminded Amit of the admonition against &lt;a href="http://en.wikipedia.org/wiki/Security_through_obscurity"&gt;security through obscurity&lt;/a&gt; that is well accepted in the security and cryptography communities. I questioned whether his team was pursuing the wrong strategy by failing to respect this maxim. Amit replied that a relevance analog to &lt;a href="http://en.wikipedia.org/wiki/Security_by_design"&gt;security by design&lt;/a&gt; was an interesting challenge (which he delegated to the audience), but he cited the subjectivity of relevance as a reason why it is harder to make relevance as transparent as security.&lt;br /&gt;&lt;br /&gt;While I accept the difficulty of this challenge, I reject the suggestion that subjectivity makes it harder. To begin with, Google and other web search engines rank results objectively, rather than based on user-specific considerations. Furthermore, the subjectivity of relevance should make the adversarial problem easier rather than harder, &lt;a href="http://www.computerworld.com/securitytopics/security/story/0,10801,87470,00.html"&gt;as has been observed in the security industry&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But the challenge is indeed a daunting one. 
Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?&lt;br /&gt;&lt;br /&gt;At &lt;a href="http://endeca.com/"&gt;Endeca&lt;/a&gt;, we emphasize the transparency of our engine as a core value of our offering to enterprises. Granted, our clients generally do not have an adversarial relationship with their data. Still, I am convinced that the same approach not only can work on the web, but will be the only way to end the arms race between spammers and &lt;a href="http://www.nytimes.com/2007/06/03/business/yourmoney/03google.html"&gt;Amit's army of tweakers&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1951816253170822526?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1951816253170822526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1951816253170822526' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1951816253170822526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1951816253170822526'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/q-with-amit-singhal.html' title='Q&amp;A with Amit Singhal'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8016696494330504473.post-1133456205347157697</id><published>2008-04-06T08:44:00.006-04:00</published><updated>2008-05-10T11:38:30.911-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cranfield'/><category scheme='http://www.blogger.com/atom/ns#' term='TREC'/><category scheme='http://www.blogger.com/atom/ns#' term='ECIR'/><category scheme='http://www.blogger.com/atom/ns#' term='Evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='Nick Belkin'/><category scheme='http://www.blogger.com/atom/ns#' term='Library and Information Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Tefko Saracevic'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Nick Belkin at ECIR '08</title><content type='html'>Last week, I had the pleasure of attending the 30th &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;European Conference on Information Retrieval&lt;/a&gt;, chaired by &lt;a href="http://www.dcs.gla.ac.uk/%7Eounis/"&gt;Iadh Ounis&lt;/a&gt; at the &lt;a href="http://www.gla.ac.uk/"&gt;University of Glasgow&lt;/a&gt;. The conference was outstanding in several respects, not least of which was a keynote address by &lt;a href="http://www.scils.rutgers.edu/%7Ebelkin/belkin.html"&gt;Nick Belkin&lt;/a&gt;, one of the world's leading researchers on interactive information retrieval.&lt;br /&gt;&lt;br /&gt;Nick's keynote, entitled "Some(what) Grand Challenges for Information Retrieval," was a full frontal attack on the &lt;a href="http://www.asis.org/Bulletin/Oct-05/voorhees.html"&gt;Cranfield evaluation paradigm&lt;/a&gt; that has dominated IR research for the past half century. 
I am hoping to see his keynote published and posted online, but in the meantime here is a choice excerpt:&lt;br /&gt;&lt;blockquote&gt;in accepting the [Gerald Salton] award at the 1997 SIGIR meeting, Tefko Saracevic stressed the significance of integrating research in information seeking behavior with research in IR system models and algorithms, saying: "if we consider that unlike art IR is not there for its own sake, that is, IR systems are researched and built to be used, then IR is far, far more than a branch of computer science, concerned primarily with issues of algorithms, computers, and computing."&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;Nevertheless, we can still see the dominance of the TREC (i.e. Cranfield) evaluation paradigm in most IR research, the inability of this paradigm to accommodate study of people in interaction with information systems (cf. the death of the TREC Interactive Track), and a dearth of research which integrates study of users’ goals, tasks and behaviors with research on models and methods which respond to results of such studies and supports those goals, tasks and behaviors.&lt;br /&gt;&lt;br /&gt;This situation is especially striking for several reasons. First, it is clearly the case that IR as practiced is inherently interactive; secondly, it is clearly the case that the new models and associated representation and ranking techniques lead to only incremental (if that) improvement in performance over previous models and techniques, which is generally not statistically significant; and thirdly, that such improvement, as determined in TREC-style evaluation, rarely, if ever, leads to improved performance by human searchers in interactive IR systems.&lt;/blockquote&gt;Nick has long been critical of the IR community's neglect of users and interaction. But this keynote was significant for two reasons. 
First, the ECIR program committee's decision to invite a keynote speaker from the information science community acknowledges the need for collaboration between these two communities. Second, Nick reciprocated this overture by calling for interdisciplinary efforts to bridge the gap between the formal study of information retrieval and the practical understanding of information behavior. As an avid proponent of &lt;a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval"&gt;HCIR&lt;/a&gt;, I am heartily encouraged by steps like these.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8016696494330504473-1133456205347157697?l=thenoisychannel.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thenoisychannel.blogspot.com/feeds/1133456205347157697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8016696494330504473&amp;postID=1133456205347157697' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1133456205347157697'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8016696494330504473/posts/default/1133456205347157697'/><link rel='alternate' type='text/html' href='http://thenoisychannel.blogspot.com/2008/04/nick-belkin-at-ecir-08.html' title='Nick Belkin at ECIR &apos;08'/><author><name>Daniel Tunkelang</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://2.bp.blogspot.com/-t6fljXJKj7Y/TbL-4ZuAWiI/AAAAAAAAAIg/EFUsFaPMkQs/s1600/daniel.jpg'/></author><thr:total>7</thr:total></entry></feed>
