Saturday, April 12, 2008

Can Search be a Utility?

A recent lecture at the New York CTO club inspired a heated discussion on what is wrong with enterprise search solutions. Specifically, Jon Williams asked why search can't be a utility.

It's unfortunate when such a simple question calls for a complicated answer, but I'll try to tackle it.

On the web, almost all attempts to deviate even slightly from the venerable ranked-list paradigm have been resounding flops. More sophisticated interfaces, such as Clusty, receive favorable press coverage, but users don't vote for them with their virtual feet. And web search users seem reasonably satisfied with their experience.

Conversely, in the enterprise, there is widespread dissatisfaction with enterprise search solutions. A number of my colleagues have said that they installed a Google Search Appliance and "it didn't work." (Full disclosure: Google competes with Endeca in the enterprise).

While the GSA does have some significant technical limitations, I don't think the failures were primarily for technical reasons. Rather, I believe there was a failure of expectations. I believe the problem comes down to the question of whether relevance is subjective.

On the web, we get away with pretending that relevance is objective because there is so much agreement among users--particularly in the restricted class of queries that web search handles well, and that hence constitute the majority of actual searches.

In the enterprise, however, we not only lack the redundant and highly social structure of the web; we also tend to have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly, particularly when there is no Wikipedia page that addresses our needs.

It seems we can go in two directions.

The first is to make enterprise search more like web search by reducing the enterprise search problem to one that is user-independent and does not rely on the social generation of enterprise data. Such a problem encompasses such mundane but important tasks as finding documents by title or finding department home pages. The challenges here are fundamentally ones of infrastructure, reflecting the heterogeneous content repositories in enterprises and the controls mandated by business processes and regulatory compliance. Solving these problems is no cakewalk, but I think all of the major enterprise search vendors understand the framework for solving them.

The second is to embrace the difference between enterprise knowledge workers and casual web users, and to abandon the quest for an objective relevance measure. Such an approach requires admitting that there is no free lunch--that you can't just plug in a box and expect it to solve an enterprise's knowledge management problem. Rather, enterprise workers need to help shape the solution by supplying their proprietary knowledge and information needs. The main challenges for information access vendors are to make this process as painless as possible for enterprises, and to demonstrate the return so that enterprises make the necessary investment.

7 comments:

FD said...

This will be all over the place.

I think it's questionable to claim that web search currently views relevance as objective. By virtue of increasingly finer-grained user and query modeling, relevance is becoming more subjective. These methods are not yet at the sophistication of models in the library/information science community but are a move in that direction.

That said, IR existed long before web search and I think it's dangerous to think situations different from web search are new situations for IR. Seemingly new tasks and perspectives have often been quite well-studied in the IR and information/library science literature. I think the appropriate question is: "does enterprise search have an analog in search for other corpora?" If not, what _precisely_ are the differences? The paragraph "In the enterprise, however...addresses our needs" is a little vague and makes enterprise search sound like QA.

In conclusion, I agree with your general claim that search cannot be a utility. IR consists of a set of design principles for solving search problems. The appropriateness of a technique is very dependent on the search scenario and it is the role of the IR expert to make these decisions based on experience and published results. If I need a bridge from San Francisco to Oakland, I won't plop down the Ben Franklin Bridge. But I may call the engineer who built it.

David Weinberger said...

Don't we want both of the alternatives you end with? We want to be able to search by objective, controlled metadata such as department and date, and we want to search using far fuzzier semantics ("What do you know about Japanese business-card protocol?"). Both are ways we find relevant info, in the ordinary English meaning of "relevant." (In that meaning, relevance is always relevant _to_ interests that usually aren't expressed in the query itself, which makes the objective/subjective dichotomy slippery.) Or have I missed your point?

The difference between Web and enterprise search would then be that because the enterprise is a (semi-)closed system, the metadata is more predictable and reliable, and the interests can be assumed with greater confidence.

So, don't enterprises really want both of the alternatives? (This can be taken as a specification of the more general principle "People want everything," which itself is a specification of the most general principle "Those damn people!" :)

Daniel Tunkelang said...

Perhaps "objective" vs. "subjective" isn't the right dichotomy. There has been lots of progress on user modeling to personalize ranking. Nonetheless, I'd rather the system be less clever and more predictable by giving me more control over the experience. Work by Koenemann and Belkin suggests that others share my view.

And, speaking of imperfect dichotomies, I concede that I'm oversimplifying the distinction between web search and enterprise search. Let me try to add some nuance.

I don't have statistics handy, but I believe that a majority of web search queries are either navigational queries best answered by a home page or popular informational queries best answered by a Wikipedia page. At least for these queries, there is near-universal agreement on what constitutes the best result. Moreover, the nature of the web makes it possible to quantify this user-independent relevance reasonably well.

How is enterprise search different? Part of the problem with characterizing enterprise search is that there really is no single characterization. There are known-item searches best answered by the enterprise analogs of home pages and Wikipedia pages. But the highest-value information needs in an enterprise are not known-item searches. Rather, they are scenarios of searching for information to solve a problem without even the certainty that the information is available, or that the problem is framed correctly.

As for searching using metadata vs. free text, that's a different--though very related--issue. At the risk of another oversimplification, I'd say that free text suffices for most known-item information needs, while metadata is essential for most exploratory needs. And I agree that enterprises often benefit from more predictable and reliable metadata than the web at large.

But my point here is not to try to articulate the differences between web data and enterprise data. Rather, I'm asserting a general difference between the typical expectations / needs of web search and enterprise search users.

Omar Alonso said...

Enterprise search is tough because the expectations in terms of relevance are difficult to achieve, in my opinion.

As somebody who has worked in both camps (enterprise and internet search), I find that the typical user expects Google-like quality inside the company, given his/her experience on the Web.

The problem is that your business information need is unlikely to be finding a good hotel review for your next vacation. It is more likely to be about a potential customer or competitor. And the answer is not a fact; it involves digesting and summarizing different things, a kind of sensemaking and exploratory search.

So maybe, one way to tackle the problem for enterprise search is to not use the Web search as a model, but something different.

fd said...

Daniel, I think you brought up an important point. An IR scenario is not defined by a corpus alone; that would be boring. Rather, an IR scenario is defined by a corpus and the users entering the system. The diversity of needs and search types suggests one place where we can distinguish IR tasks. If the web were only known-item searches, we would have been "done" a long time ago. The tail's pretty long and I (ahem) suspect that's where a lot of current web research is going on.

FWIW, I did not understand "Moreover, the nature of the web makes it possible to quantify this user-independent relevance reasonably well.".

Daniel Tunkelang said...

I did find some statistics to justify my belief about the distribution of web search queries.

According to a study published by Andrei Broder in 2002, navigational queries represent 26.4% of web search queries.

According to a 2007 study by Jean Véronis, 27% of Google searches and 31% of Yahoo searches return a Wikipedia result at their first link.

Daniel Tunkelang said...

As for my statement about the nature of the web making it possible to quantify this user-independent relevance reasonably well, I mean two things.

1) There is sufficient agreement about relevance as applied to typical web searches that it is at least approximately true to talk about user-independent relevance.

2) The success of Google, Yahoo, and Microsoft at returning results that satisfy users shows that this relevance function can be codified (even if each of these companies guards its formula as carefully as the Coca-Cola formula).
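
To make the first point a bit more concrete, one rough way to measure "agreement about relevance" is to ask, for each query, what share of users pick the same best result. The sketch below is only an illustration in Python: the `agreement` function and the judgment data are hypothetical, not drawn from any of the studies cited above.

```python
from collections import Counter


def agreement(judgments):
    """Share of users who chose the most commonly chosen result."""
    counts = Counter(judgments)
    # most_common(1) returns [(result, count)] for the most frequent choice.
    top_count = counts.most_common(1)[0][1]
    return top_count / len(judgments)


# Hypothetical data: each list holds the result that each of five users
# judged best for one query.
navigational = ["home", "home", "home", "home", "home"]
exploratory = ["memo", "wiki", "report", "memo", "email"]

print(agreement(navigational))  # 1.0
print(agreement(exploratory))   # 0.4
```

When the score is close to 1.0, treating relevance as user-independent is a reasonable approximation; when it is low, as in the made-up exploratory query, no single ranking is likely to satisfy everyone.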
