Submissions/The next million articles in Wikipedia

This is an accepted submission for Wikimania 2015.

Submission no.
3045
Title of the submission
The next million articles in Wikipedia
Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
presentation
Author of the submission
E-mail address
  • west@cs.stanford.edu
  • ellery@wikimedia.org
  • leila@wikimedia.org
Username
Country of origin
USA
Affiliation, if any (organisation, company etc.)
  • Stanford University
  • Wikimedia Foundation
Personal homepage or blog
Abstract (at least 300 words to describe your proposal)

Wikipedia is the largest encyclopedia in human history and a main source of knowledge for many people, with more than 26 million page views per hour. Despite this heavy use, the online encyclopedia is missing a lot of content in every language it supports: the English Wikipedia has about 5 million articles, but even this largest of all Wikipedias is far from complete, and the gaps are even more substantial in smaller language versions.

The goal of this project is to identify missing content in Wikipedia and to recommend the editors best suited to write those articles, through a three-step process. First, we use interlanguage links from Wikidata to identify the articles missing from each language version. This step produces many missing-article candidates for each language, in fact far more than all current editors can reasonably tackle, so the list needs to be prioritized. In the second step, we therefore leverage access-volume statistics of existing articles to predict which new articles would be most accessed if they existed in a given language version. Since not all editors are equally suited to writing about a given topic, the third and last step is editor selection: given a missing article and Wikipedia's complete edit history, we predict the best editors for creating and writing the missing article.
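
To make the three steps concrete, here is a minimal Python sketch on toy data. It is an illustration under stated assumptions, not the method used in the project: the item IDs, view counts, topic labels, editor names, and all function names are hypothetical, and the two "prediction" steps are replaced by simple heuristics (mean views in other languages, and edit counts in matching topics).

```python
# Toy data (all hypothetical). Each item maps to the languages that have an article.
sitelinks = {
    "Q1": {"en", "de", "fr"},
    "Q2": {"en", "de"},
    "Q3": {"de"},
    "Q4": {"en", "fr"},
}
# Page views per (item, language) for existing articles.
views = {("Q2", "en"): 300, ("Q2", "de"): 100, ("Q3", "de"): 500}
# Topics per item, and per-editor edit counts by topic.
item_topics = {"Q2": {"physics"}, "Q3": {"music", "history"}}
editor_topic_edits = {
    "alice": {"music": 5, "physics": 1},
    "bob": {"history": 7},
    "carol": {"cooking": 9},
}


def find_missing(sitelinks, target_lang):
    """Step 1: items with an article in some language but not in target_lang."""
    return [item for item, langs in sitelinks.items()
            if langs and target_lang not in langs]


def rank_by_expected_views(missing, sitelinks, views):
    """Step 2 (stand-in heuristic): sort by mean views where the article exists."""
    def score(item):
        counts = [views.get((item, lang), 0) for lang in sitelinks[item]]
        return sum(counts) / len(counts)
    return sorted(missing, key=score, reverse=True)


def suggest_editors(item, item_topics, editor_topic_edits, k=2):
    """Step 3 (stand-in heuristic): rank editors by edits in the item's topics."""
    topics = item_topics.get(item, set())
    scores = {
        editor: sum(n for topic, n in edits.items() if topic in topics)
        for editor, edits in editor_topic_edits.items()
    }
    # Keep only editors with some relevant history; break ties by name.
    ranked = sorted((e for e in scores if scores[e] > 0),
                    key=lambda e: (-scores[e], e))
    return ranked[:k]


missing_fr = find_missing(sitelinks, "fr")
ranked_fr = rank_by_expected_views(missing_fr, sitelinks, views)
recommendations = {
    item: suggest_editors(item, item_topics, editor_topic_edits)
    for item in ranked_fr
}
```

In this toy setup, `ranked_fr` lists the missing French articles in priority order and `recommendations` maps each one to candidate editors. A real system would replace both heuristics with models trained on actual access logs and edit history.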

The results of this study will empower editors by recommending articles to edit based on Wikipedia's edit history and the availability of content in other languages. The output of this research can provide valuable input to the ContentTranslation tool currently being developed by the Language Engineering team.

Track

Technology, Interface & Infrastructure

Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?
possibly
Slides or further information (optional)
Special requests


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).

  1. Daniel Mietchen (talk) 02:47, 15 February 2015 (UTC)
  2. Mrjohncummings (talk) 05:24, 16 February 2015 (UTC)
  3. Pginer-WMF (talk) 06:51, 16 February 2015 (UTC)
  4. Jaredzimmerman (WMF) (talk) 08:56, 23 February 2015 (UTC)
  5. Sage (Wiki Ed) (talk) 01:05, 25 February 2015 (UTC)
  6. Harej (talk) 23:03, 25 June 2015 (UTC)
  7. --Ziko (talk) 12:58, 28 June 2015 (UTC)
  8. Santhosh.thottingal (talk) 01:33, 17 July 2015 (UTC)
  9. बिप्लब आनन्द (talk) 16:06, 17 July 2015 (UTC)
  10. Nabin K. Sapkota (talk) 16:10, 17 July 2015 (UTC)
  11. Darafsh Kaviyani (Talk) 20:29, 18 July 2015 (UTC)