Wikipedia - Article Draft Quality

I’m quite excited to write about my most recent volunteer work, “Automatic Classification of Article Draft Quality”, which I’m doing in collaboration with the Wikimedia Foundation’s AI team. The team helps develop machine learning models to counter vandalism on Wikipedia and researches ways to improve editors’ and readers’ experience. There are so many interesting things to explore! For technical details and updates, refer to the wiki page.

Background

Wikipedia is a very large collaborative encyclopedia, and a strict policy is followed for every edit to its articles. As a result, a team of volunteers works relentlessly to oversee and control quality through collaboration. An important aspect of this quality control is page deletion when deemed necessary, and deletion has its own strict set of guidelines.

Apart from systematic deletion, Wikipedia also has criteria for speedy deletion, under which administrators have broad consensus to bypass deletion discussion and immediately delete Wikipedia pages at their discretion. The page on speedy deletion comprehensively defines the circumstances under which this may be done, but a few are noteworthy:

  • G10: Pages that disparage, threaten, intimidate, or harass their subject or some other entity, and serve no other purpose
  • G11: Unambiguous advertising or promotion
  • A7: No indication of importance (people, animals, organizations, web content, events)

It’s interesting to note that these are the criteria most frequently cited when deleting pages under the categories “vandalism”, “spam”, and “attack”.

The Problem

Since administrators have overriding authority to delete articles under the criteria mentioned above, drafts created by new editors often get deleted because of their inexperience with Wikipedia’s standards, and Wikipedia loses these editors out of sheer demotivation.

The Solution

The project, Automatic Classification of Article Draft Quality, addresses precisely this aspect of article deletion by providing a machine learning model that suggests draft categories to administrators, so that administrators’ workload is reduced and new editors get some time to improve their work.
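To give a rough feel for what “suggesting a draft category” could look like, here is a minimal sketch in plain Python. It is not the project’s actual model: I’m assuming a toy bag-of-words naive Bayes classifier, and the class name `NaiveBayesDraftClassifier`, the four labels, and the handful of training sentences are all invented purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data: labels loosely mirror the deletion categories
# mentioned above; the sentences are invented for illustration only.
TRAINING_DRAFTS = [
    ("Buy cheap watches now, best prices, visit our shop today", "spam"),
    ("Our company offers the best deals, call now for a discount", "spam"),
    ("asdf lol lol this page is dumb haha", "vandalism"),
    ("haha lol poop lol random nonsense", "vandalism"),
    ("John is a stupid liar and everyone hates him", "attack"),
    ("She is a horrible idiot and a liar", "attack"),
    ("The river flows through the region and was first surveyed in 1850", "OK"),
    ("The museum was founded in 1920 and houses a collection of regional art", "OK"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesDraftClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of drafts
        self.vocabulary = set()

    def fit(self, labeled_drafts):
        for text, label in labeled_drafts:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict_proba(self, text):
        """Return a dict mapping each label to its estimated probability."""
        tokens = tokenize(text)
        total_drafts = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for label, draft_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(draft_count / total_drafts)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalize in a numerically stable way (subtract the max log score).
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

    def suggest(self, text):
        """Return the most probable label as a suggestion, not a verdict."""
        probabilities = self.predict_proba(text)
        return max(probabilities, key=probabilities.get)


classifier = NaiveBayesDraftClassifier().fit(TRAINING_DRAFTS)
print(classifier.suggest("Visit our shop for the best prices and a discount"))
```

The output is only a suggestion: the idea is that a reviewer still makes the final call, while drafts that don’t look like spam, vandalism, or an attack can be given time to improve.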

I’m quite excited to be part of such a project! More updates here, or on the wiki.