Friday, April 18, 2008

The Search for Meaning

By a fortuitous coincidence, I had the opportunity to see two consecutive presentations from search engine companies banking on natural language processing (NLP) to power the next generation of search. The first was from Ron Kaplan, Chief Technology and Science Officer of Powerset, who presented at Columbia University. The second was from Christian Hempelmann, Chief Scientific Officer of hakia, who presented at the New York Semantic Web Meetup.

The Powerset talk was entitled "Deep natural language processing for web-scale indexing and retrieval." Jon Elsas, who attended the same talk earlier this week at CMU, did an excellent job summarizing it on his blog. I'll simply express my reaction: I don't get it. I have no reason to doubt that their NLP pipeline is best-in-class. The team has impressive credentials. But I see no evidence that they have produced better results than keyword search. After participating in their private beta for several months, I'd hoped that the presentation would help me see what I'd missed. I specifically asked Ron what measures they used to evaluate their system, and he was mum. So now I am more unconvinced than ever; to steal a line from a colleague, I cannot reconcile their enthusiasm with their results.

The hakia talk was entitled "Search for Meaning." Christian started by making the case for a semantic, rather than statistical, approach to NLP. He then presented hakia's technology in a fair amount of detail, including walking through examples of word sense disambiguation using context. I'm not convinced that semantics trump statistics, but I thoroughly enjoyed the presentation, and was intrigued enough to want to learn more. I find the company refreshingly open about its technology (not to mention that their beta is public), and I hope it works well enough to be practical.
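For readers who haven't encountered word sense disambiguation, here is a toy sketch in the spirit of the classic Lesk algorithm: score each candidate sense of an ambiguous word by how many words its dictionary gloss shares with the surrounding context. To be clear, this is not hakia's method, and the two-sense inventory below is invented purely for illustration.

    # Toy word sense disambiguation in the spirit of the Lesk algorithm.
    # Not hakia's actual method; the sense glosses are invented for illustration.

    STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "on"}

    # Hypothetical sense inventory for the ambiguous word "bank".
    SENSES = {
        "bank (finance)": "an institution that accepts deposits and makes loans",
        "bank (river)": "sloping land beside a body of water such as a river",
    }

    def content_words(text):
        """Lowercase, split on whitespace, and drop stopwords."""
        return {w for w in text.lower().split() if w not in STOPWORDS}

    def disambiguate(senses, context):
        """Pick the sense whose gloss overlaps most with the context words."""
        context_words = content_words(context)
        return max(senses, key=lambda s: len(content_words(senses[s]) & context_words))

    print(disambiguate(SENSES, "the bank makes loans to small businesses"))
    # -> bank (finance): the gloss shares "makes" and "loans" with the context
    print(disambiguate(SENSES, "we sat on the sloping river bank near the water"))
    # -> bank (river): the gloss shares "sloping", "river", and "water"

Real systems use far richer sense inventories and context models, but the basic idea of letting context words vote for a sense is the same.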

Still, I'm not convinced that NLP is either the right answer or the right question. I'm no expert on the history of language, but it's clear that natural languages are hardly optimal means of communication, even among human beings. Rather, they are artifacts of our satisficing and resisting change. Since we are lucky enough not to have developed expectations that people can communicate with computers using natural language (HAL and Star Trek notwithstanding), why take a step backwards now? Rather than advocating for inefficient, unreliable communication mechanisms like natural language, we should be figuring out ways to make communication more efficient.

To use an analogy, there's a reason that programming languages have strict rules, and that compilers output errors rather than just trying to guess what you mean. The mild inconvenience upstream is a small cost, compared to the downstream benefits of unambiguous communication. I'm not suggesting that people start speaking in formal languages. But I do feel we should strive for a dialog-oriented approach where both the human and the computer have confidence in their mutual understanding. I can't resist a plug for HCIR.
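To make that concrete, here is a minimal sketch (hypothetical, not any particular system's interface) of a search box that behaves more like a compiler: it accepts only the unambiguous form field:term, and when the input is ambiguous it asks a clarifying question rather than silently guessing.

    # Hypothetical sketch of a "compiler-like" query box: accept only the
    # unambiguous form "field:term", and respond to anything else with a
    # clarifying question instead of a silent guess.

    KNOWN_FIELDS = {"author", "title", "year"}

    def parse_query(raw):
        """Return a (field, term) pair, or raise with a clarification prompt."""
        field, sep, term = raw.partition(":")
        if not sep or field not in KNOWN_FIELDS or not term.strip():
            suggestions = ", ".join(f"{f}:{raw}" for f in sorted(KNOWN_FIELDS))
            raise ValueError(f"Ambiguous query {raw!r}. Did you mean one of: {suggestions}?")
        return field, term.strip()

    try:
        parse_query("kaplan")            # ambiguous: author? title?
    except ValueError as err:
        print(err)                       # the system asks instead of guessing

    print(parse_query("author:kaplan"))  # unambiguous: ('author', 'kaplan')

The upstream friction is mild, and both sides come away knowing exactly what was asked.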

8 comments:

Anonymous said...

Daniel - Nice to see someone having the same reaction as I did: "I see no evidence that they have produced better results than keyword search."

I have to admit, I was trying out all of Kaplan's demo examples on both Powerset & Google during his talk. I didn't find a single one that I would say Powerset handled more effectively than Google. Keep in mind these were hand-picked examples to show the power of NL search.

I will say that Wikipedia has changed the game for both Powerset and Google. For Powerset, it provides a mostly well-formed and factual resource for their NLP. For Google, by either design or chance, it's in the top 10 results for most queries. If the topic of the Wikipedia page is right and finding an answer to your question in that page isn't too tough, this effectively circumvents the need for any real NLP.

Daniel Tunkelang said...

Slides from the hakia talk are available on the Semantic Web NYC web site.

Anonymous said...

Thoroughly interesting read. Landed while reading up on Powerset and left reading the hakia slides. Illuminating!
-des
http://techwatch.reviewk.com/

Daniel Tunkelang said...

Des, thanks! To be clear, I may be harsher on Powerset because of my greater familiarity with it (through the private beta) and because of the volume of hype. I'm broadly skeptical of natural language search, and even hakia's excellent presentation did not convert me.

stefanoq said...

Reminds me of the handwriting example. What's easier: getting a machine to recognize script or teaching a human to write legibly for mechanical readers? I find your perspective refreshing and stimulating, Daniel.

Daniel Tunkelang said...

Stefano, thanks! To be clear, I'm all for the best and brightest minds trying to solve hard problems. I just would rather see them pursue projects whose practical consequences are at least proportional to the difficulty of the obstacles that have to be overcome.

Elder said...

I, too, believe strongly in dialogic interfaces, and so I find your position on NLP distinctly odd. It may be the case that there are drawbacks for humans to communicate in natural languages -- I for one have very little idea how else a mother would communicate with her infant to introduce the child to the world, but, heh -- but I guess I will fall back on the old saw about democracy: it may be flawed, but it's the best we've got. So, if we are going to dialog with computers, dialoging in natural languages with all the bells and whistles -- anaphora, ellipsis, push/pop to/from embedded segments, entrainment to the other in terms of lexis, syntax, the lot -- is probably the "most efficient" way to go about it. All that leads to NLP as the enabling technology. Not the NLP that at the moment is deployed in Powerset or Hakia, but the NLP that is supported by these technologies, NLP that is being developed at a thousand sites around the world and presented at the ACL and other conferences. Primitive "I know what you want and I'll anticipate it with stilted, pre-canned dialog" just isn't the interface of the future -- not even for information retrieval applications. "Sorry, Charlie," as the tuna used to say.

Daniel Tunkelang said...

Elder, I've read my share of ACL papers, and I have no doubt that the computational linguistics community is doing great work. But even you refer to "NLP that is being developed" to achieve natural language dialog with computers. In other words, we're certainly not there, and not for lack of trying. You seem sure that we will get there if we keep trying. Maybe. I'm just not as convinced, for the reasons I've described. But the kind of dialog I am advocating is hardly pre-canned. Rather, I'd like to see us preserve the flexibility of natural language while exorcising its ambiguity. Perhaps that's as aspirational as the quest for natural language dialog, but I think we should draw insight from the successful track record of formal languages for human-computer interaction (e.g., programming languages), as compared to that of natural language.
