Sunday, September 14, 2008

Is Blog Search Different?

Alerted by Jeff and Iadh, I recently read What Should Blog Search Look Like?, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.

The position paper suggests focusing on three kinds of search tasks:
  1. Find out what are people thinking or feeling about X over time.
  2. Find good blogs/authors to read.
  3. Find useful information that was published in blogs sometime in the past.
The authors generally recommend the use of faceted navigation interfaces--something I'd hope would be uncontroversial by now for search in general.

But I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and by their discussion, based on work by Mishne and de Rijke, of how blog search queries differ substantially from web search queries. I don't doubt the data they've collected, but I'm curious whether their results account for the rapid proliferation and mainstreaming of blogs. The lines between blogs, news articles, and informational web pages seem increasingly blurred.

So I'd like to turn the question around: what should blog search look like that is not applicable to search in general?

2 comments:

Jon said...

Daniel -- The task of blog distillation really does seem different from most traditional ad hoc tasks. In this task, we're retrieving individual authors or small groups of authors based on their discussion of a topic. In our SIGIR paper, we looked closely at the blog distillation queries as compared to other ad hoc task queries, and found the blog search queries were on average shorter, much more general, and typically represented multifaceted information needs.

This analysis was done on TREC data, not data collected from a real-world search engine, but I do think the distinctions are valid. The information needs of blog searchers are not the same as the information needs of web searchers in general.

David Fauth said...

"I'm more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data..."

Isn't this where the heavy lifting and hard work is? Instead of indexing a set of pages and counting the links to and from them, we are now asking what the topics are, whether the blogs are reliable (however that is defined), how many comments they draw (what if comments are turned off?), how many RSS subscribers they have, what the blogger's attitude is, whether they use tags, etc.

Faceted search is valuable; however, it appears that there has been only limited work towards building out some of the facets needed for navigation.

A follow-up question: if I want to find out about a topic, why shouldn't I be able to get a combined picture drawn from both blogs and ordinary web sites?
