Tuesday, June 5, 2007

Using NLP to Organize Unstructured Data

Another facet of the information overload problem is getting a handle on the volumes of unstructured data that organizations create every day and turning it into something searchable and manageable. Some organizations struggle with file plan compliance and enforcement, while others provide tools to index and search documents by keyword. IBM, on the other hand, is applying natural language processing (NLP) techniques to the problem. OmniFind tackles content classification by scanning varied types of unstructured data, automatically learning from it, and categorizing information into both newly created and existing taxonomies. By analyzing linguistics, semantics, and context, OmniFind is able to find connections and make inferences beyond the reach of even the best keyword-based search algorithms. It is another example of NLP making information easier to find, access, and use.
