Vandalism Detection in Wikipedia

If you had to develop a classifier for detecting vandalism in Wikipedia with just a small number of features, what kind of features would give the best results? According to our latest work on vandalism detection in Wikipedia, to be presented at WikiSym 2011, the best features are the ones pertaining to user behavior within the system: things like the deletion of other users’ content, the survivability of the user’s additions, the number of words deleted by the user, and whether or not the user has a page on Wikipedia. Other kinds of features, such as textual and language model features, are routinely used in email spam filters, but it turns out they don’t do as well as the user behavior features. That’s right: user behavior within these systems carries a very strong signal about what users are likely to do in the future, and can therefore detect vandalism fairly well, especially the more subtle kinds of vandalism. I’ve been wanting to write an overview of this work for a long time; finally, here it is. For all the details, read the paper.

The ultimate goal of this line of work in my group is to get a better understanding of the social dynamics that emerged on the Web with user-generated content sites — wikis, social networks, virtual worlds, etc. Unlike interactions in the real world, these sites collect an enormous amount of data regarding the interactions that people have with each other and with the content. As such, one can now feed that data into computing machinery and hope to gain insights on social dynamics [on the Web, at least, but maybe beyond]. Wikipedia happens to be a wonderful playground for that line of investigation, because all the data is available.

Vandalism is a very strong word, but one with a clear definition within Wikipedia. Wikipedians have strict policies and guidelines for editing articles, and they spend a lot of time fighting editorial “crimes.” They have some bots that do basic housekeeping; some of those bots [1] trigger alarms and/or revert edits in obvious cases of vandalism. But not all vandalism is obvious. Improving automatic vandalism detection is therefore a goal that many people have set out to accomplish, including one of my former students, Sara, who just graduated last month and is now working for Microsoft. There is a research community around this topic that curates data, organizes workshops and runs competitions from time to time: that’s PAN. Sara participated in this community, and won 3rd prize in the PAN competition in 2010.

The paper I’m focusing on here is the culmination of Sara’s work. Here is what we did, in a nutshell: first, we collected a relatively large number of features that had been known to be of value for vandalism detection in Wikipedia, 66 features in total. Most of these features had been proposed and tested by other people; the rest were our own. These features all fell nicely into four groups: user features, textual features, metadata features and language model features. Then we trained a random forest classifier with all 66 features using the PAN corpus train set. Finally, we ran that classifier on the PAN corpus test set. With such a feature-rich model we were able to achieve the highest performance ever reported for that corpus, an AUC of 0.9553 (the previous record was 0.9218).
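For readers who want the shape of this pipeline in code, here is a minimal sketch using scikit-learn. Extracting the actual 66 features from the PAN corpus is beyond this post, so synthetic data stands in for the feature vectors and labels; the structure (train a random forest, score the test set, report AUC) is the part that mirrors what we did.

```python
# Sketch of the training setup: random forest on a 66-feature
# matrix, evaluated by AUC. Data here is synthetic; in the real
# study the rows are Wikipedia edits from the PAN corpus and the
# labels mark vandalism.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the PAN edit features (66 columns).
X, y = make_classification(n_samples=2000, n_features=66,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

# AUC is computed from the predicted probability of the positive
# (vandalism) class, not from hard 0/1 predictions.
scores = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"AUC: {auc:.4f}")
```

Note that AUC measures ranking quality across all classification thresholds, which is why it is the standard metric for the PAN corpus, where vandalism is a small minority class.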

However, feature-rich models aren’t very practical; they are slow to compute. For all practical uses of ML, especially the ones that run online, we need models with few, cheap features. Machine learning approaches tend to suffer from this problem: we throw in a large number of features encoding our intuitions about what matters for classification and let the machine figure it out, but the machine never tells us what’s really important, what’s not so important, or how the features correlate. So we need to do extra work in order to find that out. Here’s what we did.

In order to detect and eliminate redundant features, we performed two sets of experiments. First, we studied the contribution of each of the four groups of features to determine if any of those groups could be eliminated without a significant drop in AUC. Then we studied the contribution of each feature individually and used the results for eliminating redundant features, using a technique called Lasso (Least Absolute Shrinkage and Selection Operator).
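The Lasso step can be sketched as follows. The idea is that an L1-penalized linear model drives the coefficients of redundant features to exactly zero, and the surviving features are the ones worth keeping for the real (non-linear) classifier. This is a hedged illustration on synthetic data, not our exact procedure; the penalty strength `C` is an assumption chosen just to show pruning in action.

```python
# L1-penalized logistic regression is the classification analogue
# of the Lasso: the L1 penalty zeroes out coefficients of
# redundant features. Data is synthetic; C=0.05 (strong penalty)
# is illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=66,
                           n_informative=20, random_state=0)

# Features must be on a comparable scale for the L1 penalty to
# shrink them fairly.
Xs = StandardScaler().fit_transform(X)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso.fit(Xs, y)

# Features whose coefficient survived the penalty.
selected = np.flatnonzero(lasso.coef_[0])
print(f"kept {len(selected)} of {X.shape[1]} features")
```

In practice one would sweep the penalty strength and pick the smallest feature set whose AUC stays within an acceptable margin of the full model.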

The first set of experiments told us that the User features were the most important group — by a lot. With the User features alone, we obtained an AUC of 0.9225. In the second set of experiments, we were able to reduce the model to 28 features (down from 66) and still obtain an AUC of 0.9505. These 28 features include features from all groups, but the User features have a strong presence.
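The group-level ablation in the first set of experiments amounts to retraining the classifier on each feature group alone and comparing AUCs. A sketch, again on synthetic data: the column-to-group mapping below is made up for illustration, since the real partition of the 66 features is in the paper.

```python
# Group ablation: train one random forest per feature group and
# compare test-set AUCs. The group/column assignment here is
# hypothetical; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=66,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Hypothetical column ranges for the four groups.
groups = {"user": list(range(0, 17)),
          "textual": list(range(17, 34)),
          "metadata": list(range(34, 50)),
          "language_model": list(range(50, 66))}

results = {}
for name, cols in groups.items():
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr[:, cols], y_tr)
    scores = clf.predict_proba(X_te[:, cols])[:, 1]
    results[name] = roc_auc_score(y_te, scores)

for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} AUC = {auc:.4f}")
```

In our study this comparison is what singled out the user features: trained alone, they came remarkably close to the full 66-feature model.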

And there you have it: this is how we answered the question “If you have to develop a classifier for detecting vandalism in Wikipedia with just a small number of features, what kind of features do best?” But I think the result we obtained is more interesting than its practical application to vandalism detection in Wikipedia. What the result suggests is that there are very strong signals associated with users’ actions within a system, i.e. who a user is, as given by the sequence of actions of that user within the system. It’s not just that someone added text; it’s who that person is. This gets at the concept of reputation, but goes at it from an implicit, within-the-system perspective, rather than with an explicit thumbs-up/thumbs-down kind of approach. It suggests that it is possible to automatically build extremely accurate models of users’ reputations without explicit endorsements from other users.

This work was a collaboration with David McDonald, and it was supported by the National Science Foundation under grant No. OCI-074806.
