Monday, June 25, 2007

Overrated Semantic Search

Several IT news outlets have been fawning over Xerox's new semantic-based search engine, which I've covered before. The general idea behind the technology is to analyze linguistic structures in order to improve search results.

Xerox plans to use this technology in legal software to enable "e-discovery" by sifting through massive amounts of documents, searching for information relevant to a case. Perhaps this will lead to the second instance of software being sued for practicing law without a license.

This is all well and good, and a natural progression for the science of search. Not as dramatic an improvement as the articles would lead us to believe, but hey, you gotta sell papers, right? However, it aggravates me when the media makes a factual error while covering a topic I’m familiar with...
“For example, common searches using keywords "Lincoln" and "vice president" likely won't reveal President Abraham Lincoln's first vice president. A semantic search should yield the answer: Hannibal Hamlin.”

Except a Google search for “lincoln’s first vice president” provides the correct result, as does running “Who was Lincoln’s first vice president?” through my quite unsophisticated Answer Machine. While I can’t fault the reporter for overlooking my humble research, shouldn’t they be capable of running a simple Google query? Wouldn’t this fall under the category of “thorough fact checking”? Shouldn’t they run their “facts” past a subject matter expert before publishing them? And I mean an actual expert, not a PR staffer from the company in question. Unfortunately, more and more tech articles in the media have regressed to little more than paraphrased press releases.

Furthermore, if I notice obvious mistakes regarding topics I know a little something about, what other incorrect information am I obliviously consuming? And the media wonders why we don’t trust them anymore...
