Paper abstract: Supervised text categorization tasks increasingly depend on the availability of document collections.
Such collections are typically classified by hand by domain engineers. In this paper, we propose a
semantic text categorization approach that automatically creates document collections in which
documents are classified according to the WordNet Domains taxonomy. In our experiments, we trained a classifier
on an automatically built document collection and compared the results with those obtained by training the same classifier on a document collection classified by domain engineers. Experimental
results show that, on average, the performance of the automatic approach is quite similar to that
obtained with a hand-classified document collection.
Keywords:
Text Categorization, Document Collections, Intelligent Software Systems, Machine Learning.