Assessing the quality of Wikipedia content and identifying important sources of information
The Internet is an open space, offering access to a multitude of perspectives and opinions. Among its more than a billion web pages, Wikipedia stands out as one of the best-known platforms, offering over 60 million articles in over 300 languages and giving global access to knowledge. Services such as Google and ChatGPT often use Wikipedia content to improve their own quality.
Wikipedia is created by diverse communities of users, and each language version has its own quality standards, shaped by its community. These standards therefore vary between language versions and are subject to change. Content evaluation is often subjective and requires cooperation and agreement between editors. Automating this process with artificial intelligence algorithms can support a more objective assessment of content, faster detection of errors and problems, and the identification of vandalism or disinformation, which in turn increases the credibility and efficiency of editing.
In virtually every language version there are special awards for articles of the highest quality. In the most developed of all language versions of Wikipedia – English – the term “Featured Article” (FA) refers to model articles, i.e. those that meet all quality criteria in a given language version. “Good Article” (GA) is a title for articles that are close to the benchmark standards, but do not yet fully meet them. In the Polish-language version of Wikipedia, such articles are referred to as “Artykuł na Medal” and “Dobry Artykuł” (FA and GA, respectively).
A central element of ensuring the quality of articles on Wikipedia is the principle of content verifiability: the information presented in the encyclopedia must be based on sources considered reliable. However, the process of assessing the credibility of sources may vary depending on the topic of the article and the language in which it is published. Factors such as the reputation of the publisher or author, the quality of the review process, and the precision of the data presented play a key role in assessing the credibility of a source. Wikipedia editors strive to select sources that are widely recognized as reliable in their fields. The main challenge in assessing the “reliability” of a source, as well as the “quality of information”, is the subjectivity of this process. As a result, Wikipedia users must reach a consensus on each source to be used in the encyclopedia.
Artificial intelligence can help improve the process of evaluating Wikipedia articles and their information sources in several key areas. Algorithms can analyze content in terms of its objectivity, consistency and compliance with quality standards. They can also help identify pages that require corrections or updates, as well as detect attempts to introduce disinformation or vandalism. Artificial intelligence can also assist in assessing the credibility of sources by analyzing their provenance and historical accuracy. There are already publicly available tools, based on methods described in scientific publications, that allow for assessing the quality of Wikipedia articles and identifying important sources of information in various languages. Such tools can be used to improve the effectiveness of the content management process, which is of great importance for a global source of knowledge such as Wikipedia.
More information can be found in the article entitled “Information Quality on Wikipedia” in ACADEMIA magazine no. 4 (80). The magazine is published by the Polish Academy of Sciences and has been issued quarterly since 2003. Its goal is to promote the achievements of Polish researchers in Poland and abroad. The in-depth analyses are written in a language accessible to a diverse group of scientists, students, pupils and all other readers interested in popular science topics.