Department of Information Systems

From science to practice: identifying important sources of information in multilingual Wikipedia

Wikipedia takes great care to ensure that the content on its sites is accurate and trustworthy. Central to this standard is the principle of verifiability, which requires that information, especially controversial information, be based on reliable, published sources. As a result, Wikipedia content is meant to rest on neither subjective opinions nor unverified research. However, credibility is itself a subjective concept, and its assessment depends on many factors, such as the language version of Wikipedia or the topic of the article, which can make selecting appropriate sources a challenge for editors.

Automatic identification and evaluation of information sources in Wikipedia

With over a billion websites in existence, individually assessing the credibility of each source is a huge challenge for Wikipedia users. Although the various language versions of Wikipedia provide detailed guidelines on reliable sources, there is no comprehensive list of sites that can be considered reliable in particular topical contexts. In addition, the credibility and reputation of a site may change over time, so any such list requires regular updates. For this reason, automating the creation and updating of a list of reliable sources is extremely important. Such a list would be a valuable resource not only for Wikipedia editors, but also for readers seeking accurate and reliable information.

The Department of Information Systems at PUEB conducts research on the automatic assessment of article quality and the reliability of information sources in various language versions of Wikipedia. An analysis of over 60 million Wikipedia articles identified over 330 million references to sources, and various evaluation models were then used to identify important sources of information. The table below shows the results of reference extraction for selected language versions of the encyclopedia, including the number of unique websites, as of October 2023:

| Wiki | Language Version | Number of Articles | Number of References | Unique Websites |
|------|------------------|--------------------|----------------------|-----------------|
| ar | Arabic | 1,219,168 | 6,355,164 | 294,089 |
| ca | Catalan | 735,551 | 3,895,389 | 197,470 |
| cs | Czech | 532,602 | 2,752,877 | 119,313 |
| de | German | 2,839,878 | 14,473,501 | 622,551 |
| en | English | 6,722,214 | 79,687,819 | 1,942,579 |
| es | Spanish | 1,833,749 | 12,558,623 | 509,313 |
| fa | Persian | 975,931 | 2,477,763 | 133,634 |
| fi | Finnish | 559,931 | 3,371,084 | 138,320 |
| fr | French | 2,557,559 | 19,455,752 | 576,523 |
| he | Hebrew | 342,285 | 1,867,068 | 103,848 |
| hi | Hindi | 162,954 | 496,057 | 47,617 |
| hu | Hungarian | 530,977 | 2,545,152 | 124,536 |
| id | Indonesian | 661,844 | 2,672,604 | 162,924 |
| it | Italian | 1,829,095 | 8,856,574 | 278,232 |
| ja | Japanese | 1,388,532 | 14,684,917 | 359,446 |
| ko | Korean | 646,717 | 1,885,878 | 91,918 |
| nl | Dutch | 2,133,536 | 3,010,002 | 112,318 |
| no | Norwegian | 616,624 | 2,102,507 | 107,343 |
| pl | Polish | 1,583,919 | 8,847,928 | 242,835 |
| pt | Portuguese | 1,110,209 | 7,692,600 | 319,534 |
| ru | Russian | 1,940,113 | 15,461,960 | 454,351 |
| sv | Swedish | 2,572,575 | 11,791,609 | 134,081 |
| th | Thai | 158,905 | 1,010,438 | 70,395 |
| tr | Turkish | 533,201 | 2,773,455 | 146,854 |
| uk | Ukrainian | 1,289,727 | 5,455,954 | 217,787 |
| vi | Vietnamese | 1,288,093 | 3,796,577 | 147,041 |
| zh | Chinese | 1,379,496 | 8,130,187 | 283,516 |
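
To make the "Number of References" and "Unique Websites" columns more concrete, the following is a minimal Python sketch of how such counts could be derived from the wikitext of an article. It is an illustration under stated assumptions, not the department's actual extraction pipeline: the regular expression `URL_PATTERN`, the helper names `website_of` and `summarize_references`, and the rule of treating a host without its leading `www.` as one "website" are all hypothetical choices made for this example.

```python
import re
from urllib.parse import urlparse

# Assumption: a reference is any http(s) URL found in the wikitext.
# The pattern stops at whitespace and at common wikitext delimiters.
URL_PATTERN = re.compile(r'https?://[^\s|\]<>"}]+')


def website_of(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def summarize_references(wikitext: str) -> tuple[int, set[str]]:
    """Count URL references in wikitext and collect the distinct hosts."""
    urls = URL_PATTERN.findall(wikitext)
    hosts = {website_of(u) for u in urls}
    hosts.discard("")  # ignore URLs whose host could not be parsed
    return len(urls), hosts


# Made-up wikitext with three references pointing at two websites.
sample = (
    "Text<ref>{{cite web |url=https://www.example.org/a |title=A}}</ref> "
    "more<ref>[http://example.org/b B]</ref> "
    "end<ref>{{cite news |url=https://news.example.com/c}}</ref>"
)
total, sites = summarize_references(sample)
print(total, sorted(sites))
```

In this sketch, subdomains such as `news.example.com` count as separate websites. Grouping hosts by registered domain instead would be an equally defensible choice and would give lower unique-website counts.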

During the webinar, Dr. Włodzimierz Lewoniewski presented approaches to identifying and automatically assessing the importance of information sources in Wikipedia articles across different language versions. The practical part demonstrated some of the capabilities of the BestRef tool, which contains the results of evaluating millions of Internet sources cited in Wikipedia articles from the point of view of individual language versions.
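
As a rough illustration of what assessing the importance of a source can mean, the sketch below ranks websites by the number of distinct articles that cite them. This is only one simple, assumed measure chosen for the example. It is not presented as any of the evaluation models used in the research or in BestRef, and the function name `rank_sources` and the sample data are hypothetical.

```python
from collections import defaultdict


def rank_sources(article_sources: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank websites by the number of distinct articles that cite them."""
    citing = defaultdict(set)
    for article, hosts in article_sources.items():
        for host in hosts:
            citing[host].add(article)  # repeated citations in one article count once
    # Sort by descending article count, then alphabetically for stable output.
    return sorted(
        ((host, len(articles)) for host, articles in citing.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Made-up input: article title -> websites referenced in that article.
example = {
    "Article A": ["example.org", "example.org", "news.example.com"],
    "Article B": ["example.org"],
    "Article C": ["data.example.net", "news.example.com"],
}
ranking = rank_sources(example)
for host, count in ranking:
    print(host, count)
```

Running the same ranking separately on each language version would yield per-language lists, which matches the idea of evaluating sources from the point of view of individual language versions.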


The webinar took place on November 23, 2023. The event was organized by Wikimedia Polska, which supports and promotes Wikipedia and its sister projects (such as Wikidata, Wiktionary, Wikinews, Wikisource and others).

More information about research on the analysis of information sources in Wikipedia can be found in the related scientific publications.
