Last night I did some work on the Answer Machine, and today I'm working on adding topic detection to AutoSummary.
Check out the entry in the Answer Machine project log for details about its new functionality. As for AutoSummary, I came up with a good idea for implementing topic detection within the program's current framework. I was reading this post on the Search Science blog and thought, "I can do that."
Here's the plan: after determining the likely sense of a given word, I'll build a list of all the possible topics that word is connected to (using the WordNet domain function). From there I can find the topic of a sentence by taking the best intersection of the topic lists of all its anchor words (that is, skipping "the", "and", etc.).
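To make the idea concrete, here's a rough sketch of the sentence step in Python. It isn't AutoSummary's actual code: the `TOPICS_BY_WORD` table is a made-up stand-in for the WordNet domain lookup, it assumes each word's sense has already been picked, and "best intersection" is read here as "the topic shared by the most anchor words", which is only one possible interpretation.

```python
from collections import Counter

# Hypothetical stand-in for the WordNet domain lookup: word -> set of topics.
# A real version would query WordNet after sense disambiguation.
TOPICS_BY_WORD = {
    "bank": {"finance", "geography"},
    "loan": {"finance"},
    "interest": {"finance", "psychology"},
    "river": {"geography"},
}

# Small illustrative stop list; non-anchor words are skipped.
STOP_WORDS = {"the", "and", "a", "an", "of", "on", "to", "in"}


def sentence_topics(sentence, topics_by_word=TOPICS_BY_WORD):
    """Return the topic(s) shared by the most anchor words in the sentence."""
    words = [w.strip(".,;:!?").lower() for w in sentence.split()]
    anchors = [w for w in words if w and w not in STOP_WORDS]

    # Count how many anchor words vote for each topic.
    counts = Counter()
    for word in anchors:
        counts.update(topics_by_word.get(word, set()))

    if not counts:
        return set()  # no anchor word had a known topic

    best = max(counts.values())
    return {topic for topic, n in counts.items() if n == best}


print(sentence_topics("The bank raised interest on the loan."))
```

Counting votes rather than taking a strict set intersection means one word with no known topics (like "raised" here) doesn't wipe out the result. Ties are returned as a set so nothing is thrown away too early.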
You can probably see how it scales from there: intersect sentence topics to get paragraph topics, and paragraph topics to get the topic of an entire text. So very shortly I just might be able to take a full body of text and determine just what the heck it is all about. Could be an interesting step forward in search relevance and data mining...
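The roll-up could look something like this. Again, just a sketch under my own assumptions: `roll_up` is a hypothetical helper, the topic sets are invented sample data, and the same "most votes wins" rule is reused at every level.

```python
from collections import Counter


def roll_up(topic_sets):
    """Combine lower-level topic sets into the topic(s) most of them share."""
    counts = Counter()
    for topics in topic_sets:
        counts.update(topics)
    if not counts:
        return set()
    best = max(counts.values())
    return {topic for topic, n in counts.items() if n == best}


# Invented sentence-level results for one paragraph.
sentence_level = [
    {"finance"},
    {"finance", "geography"},
    {"geography"},
    {"finance"},
]
paragraph_topics = roll_up(sentence_level)

# Invented paragraph-level results for a whole text.
text_topics = roll_up([paragraph_topics, {"finance"}, {"sports"}])

print(paragraph_topics, text_topics)
```

Since the same function works on sentences, paragraphs, and whole texts, each level only needs the topic sets from the level below it.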