Sunday, July 27, 2008

Catching up on SIGIR '08

Now that SIGIR '08 is over, I hope to see more folks blogging about it. I'm jealous of everyone who had the opportunity to attend, not only because of the culinary delights of Singapore, but because the program seems to reflect the academic community's growing interest in real-world IR problems.

Some notes from looking over the proceedings:
  • Of the 27 paper sessions, 2 include the word "user" in their titles, 2 include the word "social", 2 focus on Query Analysis & Models, and 1 is about exploratory search. Compared to the last few SIGIR conferences, this is a significant increase in focus on users and interaction.

  • A paper on whether test collections predict users' effectiveness offers an admirable defense of the Cranfield paradigm, much along the lines I've been advocating.

  • A nice paper from Microsoft Research looks at the problem of whether to personalize results for a query, recognizing that not all queries benefit from personalization. This approach may well reap the benefits of personalization while avoiding much of its harm (see the sketch after this list).

  • Two papers on tag prediction: Real-time Automatic Tag Recommendation (ACM Digital Library subscription required) and Social Tag Prediction. Semi-automated tagging tools are one of the best ways to combine the complementary strengths of humans and machines.
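
To make the "not all queries benefit" point concrete, here is a minimal Python sketch of one signal such approaches can use: the entropy of the click distribution for a query. Low entropy means most users click the same result, so a single ranking serves everyone; high entropy means users disagree, and personalization may help. The function names and threshold below are my own hypothetical illustration, not taken from the paper.

    import math
    from collections import Counter

    def click_entropy(clicked_urls):
        """Shannon entropy (in bits) of the click distribution for one query."""
        counts = Counter(clicked_urls)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Hypothetical cutoff; a real system would learn this decision from data.
    ENTROPY_THRESHOLD = 1.0

    def should_personalize(clicked_urls):
        return click_entropy(clicked_urls) > ENTROPY_THRESHOLD

    # Navigational query: everyone clicks the same page, so leave it alone.
    print(should_personalize(["nytimes.com"] * 20))  # False
    # Ambiguous query: clicks scatter across results, so personalize.
    print(should_personalize(["a.com", "b.com", "c.com", "a.com", "d.com"]))  # True
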
And I haven't even gotten to the posters! I'm sad to see that they dropped the industry day, but perhaps they'll bring it back next year in Boston.

9 comments:

Max L. Wilson said...

Win! I didn't realise SIGIR had happened already. I was sad not to attend too. Thanks for the blog post reminding me about it. Excellent to hear that there was even more user stuff in the main proceedings this year; I think there was only one session last year.

Pavel Serdyukov said...

I had several objections when reading the Sheffield paper about correlating measures with user effectiveness. In my opinion, they set out to state and prove obvious things.

- Their user task is purely recall-based, and they show that P@200 correlates better with user satisfaction than MAP does. Of course it does, since the task they tested is itself heavily recall-oriented.

- They look at how various precision cutoffs (P@10, P@20, etc.) correlate with the time needed to save the first relevant document, and discover that P@10 correlates best of all(!). That was not hard to guess. Not to mention that there are measures designed specifically for that, like Mean Reciprocal Rank.

I would be much more in favour of a paper that measured the correlation of a full range of measures with user satisfaction across a full range of tasks.

Daniel Tunkelang said...

Pavel,

Fair criticism, and perhaps I'm betraying my enthusiasm for any work that pursues the problem of correlating system performance measures with user effectiveness. For example, I was psyched to see Stephen Robertson derive average precision from a user model, supporting the case many of us have been making that system performance measures ultimately need to be motivated by users.

In any case, while I'm in favor of considering a broader set of measures, I'm not sure that satisfaction is the most important one to optimize. I think user effectiveness at a task level is at least as important, and that recall-oriented tasks have been given short shrift in a community that over-emphasizes precision in the top-ranked results.
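
For concreteness, the measures in question are easy to state in a few lines. Here's a minimal Python sketch over a single ranked list with binary relevance judgments; a generic illustration, not code from any of the papers:

    def precision_at_k(rels, k):
        """Fraction of the top k results that are relevant; rels is a 0/1 list."""
        return sum(rels[:k]) / k

    def average_precision(rels, num_relevant):
        """Mean of P@k over the ranks where relevant documents appear;
        averaging this over queries gives MAP."""
        hits, total = 0, 0.0
        for k, rel in enumerate(rels, start=1):
            if rel:
                hits += 1
                total += hits / k
        return total / num_relevant if num_relevant else 0.0

    def reciprocal_rank(rels):
        """1 / rank of the first relevant result (0 if none); its mean over
        queries is Mean Reciprocal Rank."""
        for k, rel in enumerate(rels, start=1):
            if rel:
                return 1.0 / k
        return 0.0

    # Relevant documents at ranks 1, 3, and 4 of a five-result list:
    rels = [1, 0, 1, 1, 0]
    print(precision_at_k(rels, 5))     # 0.6
    print(average_precision(rels, 3))  # (1/1 + 2/3 + 3/4) / 3, about 0.806
    print(reciprocal_rank(rels))       # 1.0

Note that P@k ignores everything below rank k, and reciprocal rank ignores everything after the first relevant hit, so it's no surprise that a time-to-first-relevant-document task tracks the early-cutoff measures.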

Anonymous said...

Why don't we boycott all the conferences that don't have open access proceedings? I can't even read the first paper you pointed to.

One way to make this happen is through reviewing. I only agree to review for venues that publish open access.

The Journal of Machine Learning Research and the United States National Institutes of Health are an inspiration. Here's the ML vs. JMLR story and here's the NIH open access policy.

The Computational Linguistics journal will be open access as of May 2009; the ACL conference proceedings have been open access. The ACL executive board and journal editor (Robert Dale) worked hard to make CL open access.

Daniel Tunkelang said...

Bob,

The only paper I can't find openly available online is "Real-time Automatic Tag Recommendation". Most of the time, authors post papers on their home pages (or at worst will send copies of their papers on request), and the ACM doesn't seem to intervene. I'm not sure how much the ACM counts on revenue from Digital Library subscriptions, but I imagine that making the proceedings open-access would put a dent in that revenue.

Paul Heymann said...

The "Real-Time Automated Tag Recommendation" paper is here, I think. I couldn't find the other paper from the Social Tagging session online in a form that wasn't broken, however (the Ralf Schenkel paper).

Incidentally, on the topic of the ACM Digital Library, I think they said during the SIGIR'08 Business Meeting that SIGIR actually makes more money off of the Digital Library than any other source (even compared to membership subscriptions and such). I would prefer that they do the same thing as the WWW Conference and post all papers online, however.

If you're curious, I think I'm going to post a report on SIGIR'08 to the Stanford InfoBlog tomorrow or Wednesday, though I don't think I'm going to talk about the business meeting. (It was pretty procedural---Should we have video? Where do we get money? How should we change reviewing given that we can't have the Senior PC meet for longer than a day?).

Daniel Tunkelang said...

Here is a link to the Ralf Schenkel paper: http://lsirpeople.epfl.ch/smichel/publications/sigir2008.pdf

I'm not surprised that the Digital Library is a money-maker--to me, it's the main motivation for renewing my membership (not to mention paying extra for access to the library). I'm sure there would be broader readership if access to the Digital Library were free, but I'm not sure if authors (who would presumably be the folks pushing hardest for broader readership) would be willing to pay extra to make up for the lost revenue. This seems to be one of those rare cases where a lot of readers are willing to pay for content.

Anonymous said...

I find that computer scientists are good pirates, but social scientists, statisticians and biologists aren't so likely to post copies of their papers online. And I can't get those papers through an ACM subscription.

We went around and around on the money-making aspect of journals with Computational Linguistics. In the end, the ACL is pitching in an additional $20K/year to make the journal free. That $20K/year pays for an admin assistant for the journal (1 day/week) and for copy-editing and typesetting/layout. For some reason journals will pay for assistants but not for editors or reviewers.

In the end, I think what swayed everyone was the experience of JMLR. It's now the leading journal for machine learning, and part of that may be attributable to its open access policy.

Not to sound like a raving open-source lunatic, but the fact that an outfit like SIGIR is making money off the whole enterprise doesn't make me want to review for them or submit papers.

And not to advocate this approach, but some open-access biology journals push the price off to the authors, such as Oxford's Nucleic Acids Research.

Daniel Tunkelang said...

I spend far more time reading SIGIR papers than writing or reviewing them, so I don't have a problem with being charged for access. Conversely, I imagine that many authors see publication as a benefit to them, and therefore do not expect any compensation for their contributions--if anything, they pay in the form of time investment and conference fees. Reviewers, as far as I can tell, are the only people acting altruistically, and I imagine that most reviewers are also authors.

All that said, I'd be in favor of open access. But SIGIR is hardly a profit-making enterprise. I assume that the money they'd lose from Digital Library subscriptions would have to come from somewhere else, like conference fees, which in large part means from authors. Well, from their employers and funding agencies.
