March 24, 2014

Automatic Categorization of Questions from Q&A Sites

Q&A sites are attracting growing interest from software developers. Categorizing questions by the concern the user expresses would open new opportunities to extract valuable information from millions of posts.

This paper compares several classification algorithms to find the one that best classifies questions from Q&A sites such as Stack Overflow. The algorithms compared are Naive Bayes, Multilayer Perceptron, Support Vector Machine, K-Nearest Neighbors, J4.8 Decision Tree, and Random Forest.
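A comparison of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses scikit-learn rather than whatever toolkit the study used, synthetic data in place of real Stack Overflow questions, default hyperparameters, and 5-fold cross-validation as the evaluation protocol. scikit-learn's `DecisionTreeClassifier` implements CART, so it stands in for J4.8 (a C4.5 implementation) without being equivalent to it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 3 roughly balanced classes,
# 8 numeric attributes per sample. Not the study's data set.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# One model per algorithm family named in the text. Scale-sensitive
# models are wrapped in a pipeline that standardizes the attributes.
classifiers = {
    "Naive Bayes": GaussianNB(),
    "Multilayer Perceptron": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier()
    ),
    "Decision Tree (CART, stand-in for J4.8)": DecisionTreeClassifier(
        random_state=0
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Mean cross-validated accuracy for each classifier.
scores = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in classifiers.items()
}

best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:40s} {score:.3f}")
print("Best on this synthetic data:", best)
```

The ranking printed here says nothing about the study's results. It only shows the shape of the procedure: same data, same evaluation protocol, one score per algorithm.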

We conducted an experimental study on Stack Overflow questions, with the posts divided equally among three domain categories: How-to-do-it, Need-to-know, and Seeking-something. The attributes were extracted through textual analysis of the title and body of each question, for a total of 8 attributes per question. The best classifier achieved an overall success rate of 84.16%, and 92.5% on the How-to-do-it category.
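To make the two kinds of success rate concrete, the sketch below computes an overall rate and a per-category rate from predicted and actual labels. It assumes that the per-category figure means the fraction of questions in that category that were classified correctly; the abstract does not define it. The function name `success_rates` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def success_rates(actual, predicted):
    """Return (overall rate, {category: rate}) as fractions in [0, 1].

    The per-category rate is the share of questions whose true label is
    that category and whose predicted label matches it.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labeled question is required")

    total = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)

    overall = sum(correct.values()) / len(actual)
    per_category = {cat: correct[cat] / total[cat] for cat in total}
    return overall, per_category


# Hypothetical toy labels: 4 questions per category (balanced, like the
# study's design, but far smaller and entirely made up).
actual = (
    ["How-to-do-it"] * 4
    + ["Need-to-know"] * 4
    + ["Seeking-something"] * 4
)
predicted = (
    ["How-to-do-it"] * 4
    + ["Need-to-know"] * 3 + ["Seeking-something"]
    + ["Seeking-something"] * 2 + ["How-to-do-it", "Need-to-know"]
)

overall, per_category = success_rates(actual, predicted)
print(f"overall: {overall:.2%}")
for category, rate in per_category.items():
    print(f"{category}: {rate:.2%}")
```

With equally sized categories, the overall rate equals the unweighted mean of the per-category rates, so a category well above the overall figure implies at least one other category below it.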