Chuong B. Do, Andrew Y. Ng
Linear text classiﬁcation algorithms work by computing an inner prod- uct between a test document vector and a parameter vector. In many such algorithms, including naive Bayes and most TFIDF variants, the parame- ters are determined by some simple, closed-form, function of training set statistics; we call this mapping mapping from statistics to parameters, the parameter function. Much research in text classiﬁcation over the last few decades has consisted of manual efforts to identify better parameter func- tions. In this paper, we propose an algorithm for automatically learning this function from related classiﬁcation problems. The parameter func- tion found by our algorithm then deﬁnes a new learning algorithm for text classiﬁcation, which we can apply to novel classiﬁcation tasks. We ﬁnd that our learned classiﬁer outperforms existing methods on a variety of multiclass text classiﬁcation tasks.