Wednesday, June 4, 2008
Idea Navigation
We found out later that Powerset developed similar functionality that they called "Powermouse" in their private beta and now call "Factz". While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven't seen on Powerset, like leveraging verb hypernyms from WordNet.
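To make the WordNet piece concrete, here's a minimal sketch of looking up verb hypernyms with NLTK's WordNet interface. It's purely illustrative (the function name and grouping strategy are mine, not the prototype's actual code):

```python
# Minimal sketch: verb hypernyms via NLTK's WordNet interface.
# Requires the WordNet data (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def verb_hypernyms(verb):
    """Collect hypernym lemmas across all verb senses of `verb`."""
    hypernyms = set()
    for synset in wn.synsets(verb, pos=wn.VERB):
        for hyper in synset.hypernyms():
            hypernyms.update(lemma.name() for lemma in hyper.lemmas())
    return sorted(hypernyms)

# Facts extracted with near-synonymous verbs can then be grouped
# under a shared, more general verb for navigation.
print(verb_hypernyms("acquire"))
```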
Click on the frame below to see the presentation they delivered at CHI '08.
Idea Navigation: Structured Browsing for Unstructured Text
5 comments:
One of the nicer instances of verb-object extraction I've seen is IHOP (information hyperlinked over proteins), which operates over proteins and their interactions. Here's their relations page for TP53, a widely studied human tumor suppressor:
ihop-net.org/UniPub/iHOP/gs/92798.html
Another live app using the same kind of approach is TextRunner from U. Washington:
cs.washington.edu/research/textrunner/
The CoNLL bakeoffs focused on this kind of lightweight predicate/argument parsing for a few years. For instance, see:
lsi.upc.edu/~srlconll/st05/st05.html
As to extending to gerunds and other nominalizations, check out this corpus and related work:
nlp.cs.nyu.edu/meyers/NomBank.html
June 6, 2008 at 3:06 PM
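For readers wondering what such lightweight extraction can look like, here is a deliberately naive noun-verb-noun sketch over POS tags, assuming NLTK with its tokenizer and tagger models installed; systems like TextRunner add real parsing, argument-boundary detection, and confidence filtering on top of this basic idea.

```python
# Toy noun-verb-noun extraction over POS tags; purely illustrative.
# Requires NLTK's tokenizer and POS tagger models to be downloaded.
import nltk

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def nvn_triples(sentence):
    """For each verb, pair the nearest noun on its left with the
    nearest noun on its right, yielding crude (noun, verb, noun) triples."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    for i, (word, tag) in enumerate(tagged):
        if tag not in VERB_TAGS:
            continue
        left = next((w for w, t in reversed(tagged[:i]) if t in NOUN_TAGS), None)
        right = next((w for w, t in tagged[i + 1:] if t in NOUN_TAGS), None)
        if left and right:
            yield (left, word, right)

print(list(nvn_triples("The company acquired the startup last year.")))
```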
Daniel Tunkelang said...
Thanks for the links! The TextRunner application is very cool, even if it doesn't seem to do much with the verbs. But it seems more interesting than anything else I've seen on the open web, and of course it indexes a much broader and more heterogeneous corpus than Wikipedia.
June 7, 2008 at 2:05 PM
See also Dawn Lawrie's dissertation from 2003: a statistical approach to idea navigation, or "concept subsumption hierarchies," as she calls them.
http://www.cs.loyola.edu/~lawrie/papers/lawrieThesis.pdf
Scroll through for some good screenshots.
Dawn does this with NN phrases in her hierarchies, but there is no reason why you couldn't extract Adj-Noun phrases, Noun-Verb-Noun phrases, etc., and then use the same underlying statistical language modeling approaches to build the subsumption hierarchies.
August 25, 2008 at 5:22 PM
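The subsumption idea behind these hierarchies (going back to Sanderson & Croft, 1999) is simple enough to sketch: make x a parent of y when x appears in most of the documents that contain y, but not the other way around. Here is a rough illustration over document co-occurrence counts; the threshold and tie-breaking are mine rather than the papers' exact criterion.

```python
# Rough sketch of co-occurrence-based subsumption, loosely after
# Sanderson & Croft (1999); threshold and tie-breaking are illustrative.
from collections import defaultdict
from itertools import combinations

def subsumption_pairs(documents, threshold=0.8):
    """Return (parent, child) pairs: parent subsumes child when
    P(parent | child) >= threshold and P(child | parent) < P(parent | child)."""
    doc_freq = defaultdict(int)  # documents containing each term
    co_freq = defaultdict(int)   # documents containing both terms of a pair
    for doc in documents:
        terms = set(doc)
        for term in terms:
            doc_freq[term] += 1
        for a, b in combinations(sorted(terms), 2):
            co_freq[(a, b)] += 1

    pairs = []
    for (a, b), both in co_freq.items():
        p_a_given_b = both / doc_freq[b]
        p_b_given_a = both / doc_freq[a]
        if p_a_given_b >= threshold and p_b_given_a < p_a_given_b:
            pairs.append((a, b))  # a subsumes b
        elif p_b_given_a >= threshold and p_a_given_b < p_b_given_a:
            pairs.append((b, a))  # b subsumes a
    return pairs

# Terms that only ever occur alongside a more general term become its children;
# a child can end up with multiple parents, as in Lawrie's hierarchies.
docs = [["search", "facet"], ["search", "query"], ["search"], ["facet", "ui"]]
print(subsumption_pairs(docs))
```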
Daniel Tunkelang said...
Jeremy, thanks for the link. The approach looks promising, and I'm curious how it compares to the WordNet-driven Castanet work at Berkeley. Granted, there's something nice about not depending on a limited lexicon.
As for the idea navigation work, I see it more as suggesting an interface than as an approach to the information extraction problem of identifying the N-V-N triples. The really simple idea is to treat question answering as a problem best served by an exploratory interface.
August 26, 2008 at 12:17 AM
The Castanet work does cite the 1999 Sanderson and Croft work, upon which this 2003 Lawrie work is also based. So I'm sure there are some similarities.
One offhand difference, I think, is that the Castanet work appears to create mutually exclusive, partitioned hierarchies, whereas the Lawrie work allows for multiple parents.
However, that is just my impression after a quick skim; I didn't read the Castanet work in full, and it has also been 5-6 years since I read the Lawrie work in detail.
August 26, 2008 at 5:59 PM