Monday, May 26, 2008

Your Input is Relevant!

The following is a public service announcement.

As some of you may know, I am the primary author of the Human Computer Information Retrieval entry on Wikipedia. I created this entry last November, shortly after the HCIR '07 workshop. One of the ideas we've tossed around for HCIR '08 is to collaboratively edit the page. But why wait? With apologies to Isaac Asimov, I/you/we are Wikipedia, so let's improve the entry now!

And, while you've got Wikipedia on the brain, please take a look at the Relevance (Information Retrieval) entry. After an unsuccessful attempt to have this entry folded into the main Information Retrieval entry, I've tried to rewrite it to conform to what I perceive as Wikipedia's standards of quality and non-partisanship. While I tried my best, I'm sure there's still room for improving it, and I suspect that some of you reading this are among the best qualified folks to do so!

As Lawrence Lessig says, it's a read-write society. So readers, please help out a bit with the writing.

11 comments:

Anonymous said...

Daniel -- I wish I had time to help out with this Wikipedia page!

It's my understanding that the term "relevance" in IR has two primary meanings. The first (I believe it's Saracevic's definition) refers to a general "goodness" of a document retrieved by an IR system. This is influenced by the document itself, as well as the collection, the user, the time at which the query was issued, and likely many more factors. This notion may include topicality, authority, novelty, etc., but is ultimately up to the user and how (s)he interprets the retrieved result at the time.

The other interpretation, probably more common due to TREC & Cranfield-style evaluations, refers to topical relevance, a.k.a. "aboutness". This, according to Saracevic, is a manifestation of the more general concept of relevance. But this is the definition generally adopted in most TREC-style IR evaluations (excluding something like topic tracking, in which novelty plays a critical role), and for this reason it is probably the more commonly accepted notion of relevance. This notion refers to how well the document matches the information need at a topical level. It is also subjective -- a query can never precisely reflect the internal state of the asker -- but it's a little easier to pin down and evaluate than the first definition.

I think we all probably agree that there's some larger notion of how "good" a document is in response to a query -- whether we call this relevance, utility, or something else is a point of discussion. Nonetheless, I do think the more widely accepted definition of relevance is the latter -- topical relevance. The Wikipedia page should reflect that distinction.

Daniel Tunkelang said...

Jon, you are helping! I just made a revision to use the phrase "topical relevance" to reflect this distinction. Better?

Anonymous said...

Better.

My initial gut reaction is that this article really needs an overhaul. I would have the opening sentences convey that the common definition of relevance is topical "aboutness" w.r.t. the query, and treat the more general definition as a more academic distinction.

Daniel Tunkelang said...

As I said, I really wanted to merge it into the IR page. You can see the discussion here. Given the decision of the Wikipedia editors to keep the page, I tried to work with what was there. It would be great if you or someone else could make the next round of edits. As it is, I suspect that the original author and/or the editors might think I'm just on a personal crusade. By keeping the existing structure but overhauling much of the content, I tried to play within the bounds of "editing".

Anonymous said...

I added a whole bunch more to the eval section (precision-at-N, BEP, area under PR curve, convex hulls, etc.).

I also made the connection to binary classification, which has a decent entry on Wikipedia (whereas the classification entry itself is very weak).

Personally, I find the PR curve the most useful thing to look at. Partly that's because we've been pushing high-recall evals, whereas most web search is about high precision.
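To make that concrete, here's a minimal sketch (in Python, with names of my own invention -- this is not LingPipe's API) of precision-at-N and the points that trace a PR curve, assuming a ranked result list with binary relevance judgments:

    def precision_at_n(ranked_relevance, n):
        # Fraction of the top n results judged relevant (1 = relevant, 0 = not).
        return sum(ranked_relevance[:n]) / float(n)

    def pr_curve_points(ranked_relevance, total_relevant):
        # (recall, precision) after each rank; the break-even point, interpolated
        # precision, and area under the PR curve can all be derived from these.
        points, hits = [], 0
        for rank, rel in enumerate(ranked_relevance, start=1):
            hits += rel
            points.append((hits / float(total_relevant), hits / float(rank)))
        return points

    judgments = [1, 0, 1, 1, 0, 0, 1, 0]    # hypothetical ranked binary judgments
    print(precision_at_n(judgments, 5))     # 0.6
    print(pr_curve_points(judgments, 4))    # points tracing the PR curve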

This just highlights the problem with editing vs. rewriting Wikipedia entries. I'd define everything in terms of confusion matrices and ranked confusion matrices, as we do in the LingPipe classes ConfusionMatrix, PrecisionRecallEvaluation, and ScoredPrecisionRecallEvaluation.

I'm particularly fond of scoring by log loss, but that only makes sense for probabilistic retrieval.
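For what it's worth, here's roughly what I mean, as a tiny sketch: each result carries an estimated probability of relevance, scored against a binary judgment. The names are my own, not from any particular library.

    import math

    def log_loss(probabilities, judgments, eps=1e-15):
        # Average negative log likelihood of the binary relevance judgments
        # under the system's estimated probabilities of relevance.
        total = 0.0
        for p, rel in zip(probabilities, judgments):
            p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
            total += -math.log(p) if rel else -math.log(1.0 - p)
        return total / len(judgments)

    print(log_loss([0.9, 0.4, 0.8], [1, 0, 1]))   # about 0.28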

For the entry on relevance, I'd like to see a section on marginal relevance with links to Carbonell et al. Abstractly, I think the question is what's the most relevant set of docs to return? Or what's the most relevant doc given that I've seen these other docs?
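As a rough illustration (a sketch in the spirit of Carbonell and Goldstein's maximal marginal relevance, not their exact formulation), greedy selection might look like this, assuming user-supplied similarity functions sim_to_query and sim_between, with lambda_ trading relevance off against novelty:

    def mmr(candidates, sim_to_query, sim_between, lambda_=0.7, k=10):
        # Greedily pick the document that is most relevant to the query
        # and least redundant with the documents already selected.
        selected, remaining = [], list(candidates)
        while remaining and len(selected) < k:
            def score(doc):
                redundancy = max((sim_between(doc, s) for s in selected), default=0.0)
                return lambda_ * sim_to_query(doc) - (1.0 - lambda_) * redundancy
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

    # Toy usage: documents as term-weight dicts, similarity as weighted overlap.
    docs = {"d1": {"jazz": 1.0}, "d2": {"jazz": 0.8}, "d3": {"blues": 1.0}}
    query = {"jazz": 1.0, "blues": 0.5}
    overlap = lambda a, b: sum(a.get(t, 0.0) * b.get(t, 0.0) for t in a)
    print(mmr(list(docs), lambda d: overlap(docs[d], query),
              lambda d, e: overlap(docs[d], docs[e]), k=2))   # ['d1', 'd3']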

I also removed hyphens from nominal uses of "F measure". The prescriptive rule as explained to me by a copy editor is that noun compounds are hyphenated when used as an adjective but not when used as a noun (e.g. a towel-rack designer vs. a towel rack). Feel free to re-insert if this completely violates usage in the IR community.

Daniel Tunkelang said...

Bob, thanks! This is shaping up, and I'm starting to feel there is hope for that entry after all!

That said, I'm not averse to a rewrite of the entry. I have the feeling that, with more of us taking the reins, a rewrite would be less controversial than if I completely rewrote it myself after having tried to get the entry deleted.

I'm happy to cut and paste / summarize discussion from here onto the talk page, or to let people do that themselves.

Anonymous said...

"Abstractly, I think the question is what's the most relevant set of docs to return? Or what's the most relevant doc given that I've seen these other docs?"

Bob -- excellent points, and this belongs in the relevance article. This fits squarely into the "topical relevance" vs. "user relevance" dichotomy I mentioned earlier. Your points highlight instances where the user relevance is really a moving target.

One quibble -- the connection to binary classification is not quite right. It is true that, when using binary relevance judgements, some IR performance measures look similar to things like accuracy, but they are fundamentally different. The most common IR evaluation measures (MAP, NDCG, MRR) place higher significance on items high in the ranked list than on items lower in the list, whereas classification evaluation measures are essentially based on the placement of a class boundary in the ranked list. There is also a considerable amount of research on IR evaluation without explicit relevance levels (binary or otherwise), instead using preferred orderings between pairs of documents given a query.
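To illustrate the rank-weighting point, here is a small sketch of average precision (the per-query component of MAP) and reciprocal rank (the basis of MRR) over binary judgments; graded measures like NDCG follow the same idea with graded gains and a logarithmic discount. The code is illustrative only.

    def average_precision(ranked_relevance):
        # Mean of the precision values at the ranks where relevant documents appear,
        # so a relevant document ranked first counts far more than one ranked tenth.
        hits, precisions = 0, []
        for rank, rel in enumerate(ranked_relevance, start=1):
            if rel:
                hits += 1
                precisions.append(hits / float(rank))
        return sum(precisions) / len(precisions) if precisions else 0.0

    def reciprocal_rank(ranked_relevance):
        # 1/rank of the first relevant document.
        for rank, rel in enumerate(ranked_relevance, start=1):
            if rel:
                return 1.0 / rank
        return 0.0

    print(average_precision([1, 0, 1, 0]))   # about 0.83
    print(reciprocal_rank([0, 0, 1]))        # about 0.33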

I am also a little confused as to why there is an evaluation section in this article. There is already a section in the IR article about performance measures. The connection between relevance and evaluation needs to be made, but this section seems out of place.

Daniel Tunkelang said...

Jon, I am not thrilled with the redundancy between the Performance Measures section here and the corresponding section in the Information Retrieval entry. But I'm loath to delete the latter, even though it seems more appropriate for that material to be part of the discussion of relevance (since these are purported measures of relevance). All the more reason I wanted to merge the entries -- unless Wikipedia has a better mechanism for handling material that should appear in two places.

Anonymous said...

I know next to nothing about Wikipedia policies regarding editing vs. rewriting, redundant material, etc. It seems reasonable that there be either one large IR article encompassing this Relevance material (your suggestion to merge them), or several: one for general IR, one for IR evaluation, one for Relevance, maybe one for retrieval models, and possibly more... that's starting to look more like a textbook, though.

I've just had a bit of work taken off my plate and may take a crack at editing this article today. Many thanks for your & Bob's effort in getting this article headed the right way.

Anonymous said...

Made some updates & drastic cuts to the article, in an attempt to keep it on topic for relevance rather than evaluation or algorithms.

Daniel Tunkelang said...

I'm quite happy with the results, at least compared to my initial effort and especially compared to what preceded it. While I'm sure there's room for improvement, I feel this is at least good enough for high school students using it to cheat on their term papers.
