Mr. Lott has been involved in over 70 software development projects in a career that spans 30 years. He has worked in the capacity of internet strategist, software architect, project leader, DBA, and programmer. Since 1993 he has been focused on data warehousing and the associated e-business architectures that make the right data available to the right people to support their business decision-making. Steven is a DZone MVB, is not an employee of DZone, and has posted 144 posts at DZone.

The Users Just Want "Search" -- What's So Hard?

Great article on "Search" from back in '08 in Forbes. "Why Google Isn't Enough", by Dan Woods. He's talking about "Enterprise Search": why in-house Google-style search is really hard and often unsatisfying.

Here's the cool quote.

enterprise search systems also index and navigate information that may reside in databases, content management systems and other structured or semi-structured repositories. The contents may include not only text documents, but also spreadsheets, presentations, XML documents and so on. Even text documents may include some amount of structure, perhaps stored in an XML format.

Everyone thinks (hopes) that the mere presence of data is sufficient. The fact that it's structured doesn't seem to influence their hopes.

The complication is simple -- and harsh. Many enterprise databases are really bad. Really, really epically bad. So bad as to be incomprehensible to a search engine.


How many spreadsheets or reports "stand alone" as tidy, complete, usable documents?

Almost none.

You create a budget for a project. It seems clear enough. Then the project director wants to know if the labor costs are "burdened or unburdened". So the column labeled "cost" has to be further qualified. And "burdened" costs need to be detailed as to which -- exact -- overheads are included.

So a search engine might find your spreadsheet. But if a person can't interpret the data, neither can a search engine.
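To make the "burdened or unburdened" point concrete, here is a minimal Python sketch of a cost that carries its own qualification. The `LaborCost` class, the overhead names ("fringe", "facilities") and the rates are all hypothetical, invented for illustration; real overhead rules vary by organization and are usually more involved than a flat rate.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict


@dataclass
class LaborCost:
    """A labor cost that states exactly which overheads it includes."""

    unburdened: Decimal                 # raw labor cost, no overheads
    overhead_rates: Dict[str, Decimal]  # hypothetical overhead name -> rate

    @property
    def burdened(self) -> Decimal:
        # Assumption: each named overhead is a flat rate applied to the raw cost.
        total_rate = sum(self.overhead_rates.values(), Decimal(0))
        return self.unburdened * (1 + total_rate)


# Hypothetical figures, for illustration only.
cost = LaborCost(
    unburdened=Decimal("1000.00"),
    overhead_rates={"fringe": Decimal("0.30"), "facilities": Decimal("0.10")},
)
print(cost.unburdened, cost.burdened, sorted(cost.overhead_rates))
```

A bare spreadsheet column labeled "cost" holds only one of these numbers and none of the context. That missing context is what a reader, or a search engine, would need in order to interpret it.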

Star Schema Nuance

You can build a clever star schema from source data. But what you find is that your sources have nuanced definitions. Each field isn't directly mappable because it includes one or more subtleties.

Customer name and address. Seems simple enough. But... is that mailing address or shipping address or billing address? Phone number. Seems simple. Fax, Voice, Mobile, Land-line, corporate switch-board, direct? Sigh. So much detail.
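The phone number case can be sketched in Python as follows. This is only an illustration under stated assumptions: the source system names ("crm", "billing"), the `SOURCE_PHONE_KIND` mapping and the `to_dimension_row` helper are hypothetical, and the kinds are simply the ones listed above. The point is that a person has to document what each source's bare "phone" column means before it can be loaded into a shared dimension.

```python
from dataclasses import dataclass
from enum import Enum


class PhoneKind(Enum):
    FAX = "fax"
    VOICE = "voice"
    MOBILE = "mobile"
    LAND_LINE = "land-line"
    SWITCHBOARD = "corporate switchboard"
    DIRECT = "direct"


@dataclass(frozen=True)
class Phone:
    number: str
    kind: PhoneKind


# Hypothetical sources: both call the column "phone", but mean different things.
SOURCE_PHONE_KIND = {
    "crm": PhoneKind.MOBILE,
    "billing": PhoneKind.SWITCHBOARD,
}


def to_dimension_row(source: str, row: dict) -> Phone:
    """Qualify a bare 'phone' value with the meaning its source gives it."""
    try:
        kind = SOURCE_PHONE_KIND[source]
    except KeyError:
        # Refuse to guess: an undocumented source cannot be mapped.
        raise ValueError(f"no documented meaning for 'phone' in {source!r}") from None
    return Phone(number=row["phone"], kind=kind)


print(to_dimension_row("crm", {"phone": "555-0100"}))
```

Raising an error for an undocumented source is a deliberate choice in this sketch: silently loading an unqualified value would reproduce the ambiguity in the warehouse.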

Of course the users "just want search".

Sadly, they've created data so subtle and nuanced that they can't have search.
Published at DZone with permission of Steven Lott, author and DZone MVB.



Alois Cochard replied on Wed, 2010/07/07 - 8:07am

Text-mining may be a way to obtain structured data from unstructured documents. It's what we are using internally for the search engine we are developing.

I don't agree that a 'Google-style' search engine is useless in the enterprise. What is useless is a clustering approach like the one on Google (I mean images, books, etc.). It seems obvious that the clustering of the results must match business needs.

And then the search engine can become the entry point for any document inside the enterprise.

You can't ask people to create documents that suit your search engine model... your search engine must be able to understand unstructured content and add structured value to it by doing some analysis (like text-mining, for example, or analysing meta-data).

Perhaps one day... a semantic enterprise platform would help us in this quest... but I don't think most of today's enterprises are ready for this. It's a big move... and it's not even widespread on the web...

Best regards,

Alois Cochard
