Department of Information Systems

Colloquium on automatic assessment of Wikipedia quality at Tufts University

The beginning of the academic year in the United States coincided with a talk by Dr. Włodzimierz Lewoniewski at Tufts University. The colloquium covered theoretical and practical aspects of using artificial intelligence and large open data sets to automate the assessment of the quality of Wikipedia articles and their information sources in various language versions.
September 23, 2023

The event took place on September 7, 2023 at the Joyce Cummings Center (JCC) during a six-week visit by Dr. Włodzimierz Lewoniewski to the United States. It was the first colloquium (discussion seminar) at Tufts University in the 2023/2024 academic year.[1]

In a world where information can spread quickly, it is important that society has access to reliable sources of knowledge. Wikipedia, one of the most visited websites in the world, plays an important role in educating and informing people. This open-access encyclopedia contains over 60 million articles in over 300 languages and offers free access to a huge amount of information on virtually any topic.[2] Content from Wikipedia also helps improve various web services, such as Google Search and ChatGPT.

Assessment of the quality of Wikipedia and its information sources

Wikipedia is created by volunteers from all over the world, which makes it dynamic and constantly evolving. This collaboration model allows for quick updates and corrections of information. More than half a million edits are made to this encyclopedia every day. Manually assessing all these changes in real time is a major challenge.

Wikipedia has certain standards for assessing the quality of content. However, the evaluation criteria may vary depending on the language version and may change over time.[3] Moreover, assessing the quality of information is largely a subjective process, depending on the interpretation and experience of the individual editors of this encyclopedia. Therefore, evaluating Wikipedia articles often requires dialogue and consensus among the community.

Automating the process of assessing the quality of Wikipedia’s information can significantly contribute to improving the quality of content, the efficiency of editors’ work and the credibility of the platform as a whole.[4] Well-designed algorithms are free from emotion and personal bias, which can help provide a more objective assessment of information quality. Automation also applies established criteria in the same way to every article, which makes quality assessments more consistent. It makes it possible to collect and analyze large amounts of information quality data, which can provide valuable insights into areas requiring improvement and directions for further development of the platform. Finally, automation can relieve Wikipedia editors of routine tasks, allowing them to focus on more complex aspects of editing and moderation.
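To illustrate what a uniform assessment based on established criteria could look like, here is a minimal sketch in Python. It is not the method presented at the colloquium: the features, target values and weights below are hypothetical and chosen only for illustration. The point is that the same fixed rules are applied to every article, so two articles with the same measurable features always receive the same score.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    text_length: int  # characters of prose
    references: int   # number of cited sources
    sections: int     # number of section headings
    images: int       # number of images


# Hypothetical values at which a feature earns full marks (assumption).
TARGETS = {"text_length": 10_000, "references": 30, "sections": 8, "images": 5}
# Hypothetical weights that sum to 1.0 (assumption).
WEIGHTS = {"text_length": 0.3, "references": 0.4, "sections": 0.2, "images": 0.1}


def quality_score(features: ArticleFeatures) -> float:
    """Return a score from 0 to 100 using the same fixed criteria for every article."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(features, name)
        # Each feature contributes in proportion to its target, capped at full marks.
        score += weight * min(value / TARGETS[name], 1.0)
    return round(100 * score, 1)


if __name__ == "__main__":
    stub = ArticleFeatures(text_length=5_000, references=15, sections=4, images=0)
    print(quality_score(stub))
```

In practice the criteria differ between language versions and change over time, so any such weights would need to be calibrated against the community's own quality ratings rather than fixed by hand.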

Purpose-built tools can immediately identify potential problems, such as vandalism, inappropriate content or disinformation, which allows for faster response and improvement of content quality. These tools can provide editors with valuable real-time feedback, helping them create and edit articles according to Wikipedia’s guidelines. Automatic rating systems for Wikipedia articles and their information sources can also be integrated with other tools and platforms, allowing for better use of technology to improve the quality of content.

It’s also important to remember that the Wikipedia community is made up of many volunteers who typically review and correct content manually. During a surge of false information or mass vandalism, automatic tools can serve as the first line of defense, quickly identifying and reacting to unwanted changes.

A key aspect of content quality on Wikipedia is the principle of information verifiability. This means that every claim in the articles in this encyclopedia must be based on a reliable source of information. Taken together, Wikipedia articles in the various language versions contain hundreds of millions of references to different sources of information.[5] Automating the source evaluation process can help quickly identify sources that are potentially unreliable, outdated, or that do not meet academic standards, allowing editors to focus on verifying them or replacing them with more credible sources. In times of increasing fake news, automatic source assessment can also quickly detect and flag information based on questionable sources, preventing its spread. Moreover, new Wikipedia editors may not be sure which sources are the most reliable in a given field. Automatic source evaluation can provide them with guidance and recommendations, helping them select appropriate source materials.
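As a rough illustration of the first step in automatic source evaluation, the sketch below extracts the web domains cited in an article's references and flags those found on a list of questionable sources. It is a simplified example, not the approach used in the tools discussed at the colloquium: the regular expressions cover only common wikitext patterns, and the domain list and the function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Body of <ref>...</ref> tags; self-closing tags such as <ref name="x" /> are skipped.
REF_BODY = re.compile(r"<ref\b(?![^>]*/>)[^>]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
# First URL inside a reference (simplified pattern, an assumption).
URL = re.compile(r"https?://[^\s|\]<>}]+")

# Hypothetical list of domains considered questionable (assumption).
QUESTIONABLE = {"example-tabloid.test"}


def source_domains(wikitext: str) -> Counter:
    """Count how often each web domain is cited in the references of an article."""
    domains = Counter()
    for body in REF_BODY.findall(wikitext):
        match = URL.search(body)
        if match:
            host = urlparse(match.group(0)).netloc.lower()
            if host.startswith("www."):
                host = host[len("www."):]
            domains[host] += 1
    return domains


def flag_sources(wikitext: str, questionable=QUESTIONABLE) -> list:
    """Return the cited domains that appear on the questionable list, sorted."""
    return sorted(d for d in source_domains(wikitext) if d in questionable)


if __name__ == "__main__":
    sample = (
        "Fact.<ref>{{cite web |url=https://www.example.org/a |title=A}}</ref> "
        'More.<ref name="x">[https://example-tabloid.test/story Story]</ref>'
        '<ref name="x" />'
    )
    print(source_domains(sample))
    print(flag_sources(sample))
```

A flagged domain is only a prompt for an editor to check the source; deciding whether a source is reliable in a given field still requires human judgment.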

The presentation also covered tools that, based on scientific research and large data sets, make it possible to automatically assess the quality of Wikipedia articles[6] and evaluate the information sources[7] of this encyclopedia. One such tool can compare and integrate information from various open multilingual sources, such as Wikipedia, Wikidata, DBpedia and others.[8]

References

  1. A list of discussion seminars at Tufts University for the current semester is available at: http://www.cs.tufts.edu/t/colloquia/current.php. More information about seminars with guest speakers discussing research challenges and the latest advances in computer science can be found on the Tufts University website.
  2. Overall statistics for all Wikipedia language versions: https://meta.wikimedia.org/wiki/List_of_Wikipedias
  3. For example, for the English version of Wikipedia you can find tips, recommendations and guidelines aimed at improving the quality of the article: https://en.wikipedia.org/wiki/Wikipedia:The_perfect_article
  4. Lewoniewski W., Węcel K., Abramowicz W. (2017). Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles. Informatics, 4, 43.
  5. Lewoniewski W. (2022). Identification of Important Web Sources of Information on Wikipedia across various Topics and Languages. Procedia Computer Science, 207, 3290-3299.
  6. WikiRank – assessment of the quality and popularity of Wikipedia articles in various languages.
  7. BestRef – evaluation of Wikipedia information sources in different language versions.
  8. DBpedia blog. (2021). Giving knowledge back to Wikipedia: Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources.
