Wikipedia is a unique source of free knowledge, but the world’s most popular encyclopedia is not always accurate.
The site’s crowdsourced editing model is prone to vandalism and bias. Although its reputation for accuracy has improved, even Wikipedia does not consider itself a reliable source.
The Wikimedia Foundation, the non-profit organization that oversees Wikipedia, regularly explores new solutions to these shortcomings. The latest effort harnesses the power of AI.
The foundation recently partnered with Meta to improve Wikipedia’s citations. These references are used to confirm crowdsourced information on the site, but they are often missing, incomplete, or inaccurate.
While Wikipedia volunteers double-check the footnotes, it’s hard for them to keep up when more than 17,000 new articles are added every month. This scale makes the problem a compelling use case for machine learning.
Meta’s proposal fact-checks the references. The team says this is the first model that can automatically scan hundreds of thousands of citations at once to check their accuracy.
The model’s knowledge source is a new dataset of 134 million public web pages. Dubbed Sphere, the open-source library is, according to Meta, larger and more complex than any corpus ever used for such research.
To find suitable sources in the dataset, the researchers trained their algorithms on 4 million Wikipedia citations. This enabled the system to find a single source to validate each statement.
An evidence ranking model compares the alternative sources with the original reference.
If a citation seems irrelevant, the system will recommend a better source, along with a specific passage that supports the claim. A human editor can then review and approve the suggestion.
To illustrate how this works, the researchers used the example of a Wikipedia page on retired boxer Joe Hipp.
The entry describes the Blackfeet Tribe member as the first Native American to compete for the WBA World Heavyweight title. But the model found that the citation for this claim was a webpage that didn’t even mention Hipp or boxing.
The system then searched the Sphere corpus for a replacement reference. It surfaced this passage from a 2015 article in the Great Falls Tribune:
“In 1989, at the end of his career, [Marvin] Camel fought Joe Hipp of the Blackfeet Nation. Hipp, who became the first Indian to challenge for the World Heavyweight Championship, said the fight was one of the strangest of his career.”
Although the passage doesn’t explicitly mention boxing, the model inferred the context from clues. These include the term “heavyweight” and the word “challenge” as a synonym for “compete,” which was in the original Wikipedia entry.
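The workflow described above (score the existing citation against the claim, rank candidate passages, and suggest a replacement for a human to review) can be illustrated with a toy sketch. This is not Meta’s model: the learned ranking model is replaced here by a simple word-overlap score, and every function name, the `margin` threshold, the placeholder text for the unrelated page, and the second candidate passage are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass


def tokens(text):
    """Lowercased word set; a crude stand-in for a learned text encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim, passage):
    """Fraction of the claim's words that also appear in the passage."""
    claim_words = tokens(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & tokens(passage)) / len(claim_words)


@dataclass
class Suggestion:
    keep_current: bool    # True if no candidate clearly beats the citation
    best_passage: str     # highest-scoring candidate passage
    best_score: float
    current_score: float


def review_citation(claim, current_source, candidates, margin=0.1):
    """Rank candidates and flag the citation if one beats it by `margin`.

    `margin` is a hypothetical threshold, not a value from the research.
    """
    current_score = support_score(claim, current_source)
    best = max(candidates, key=lambda p: support_score(claim, p),
               default=current_source)
    best_score = support_score(claim, best)
    keep = best_score < current_score + margin
    return Suggestion(keep, best, best_score, current_score)


claim = ("Joe Hipp was the first Native American to compete for "
         "the WBA World Heavyweight title.")

# Invented placeholder: the article does not quote the original cited page.
current_source = "A placeholder page about regional weather and river levels."

candidates = [
    # Passage quoted in the article from the Great Falls Tribune.
    "In 1989, at the end of his career, [Marvin] Camel fought Joe Hipp of "
    "the Blackfeet Nation. Hipp, who became the first Indian to challenge "
    "for the World Heavyweight Championship, said the fight was one of the "
    "strangest of his career.",
    # Invented distractor passage, for illustration only.
    "Heavyweight boxing titles are sanctioned by several bodies, "
    "including the WBA.",
]

suggestion = review_citation(claim, current_source, candidates)
if not suggestion.keep_current:
    # The final decision stays with a human editor.
    print("Suggested replacement passage:")
    print(suggestion.best_passage)
```

Word overlap cannot recognize “challenge” as a stand-in for “compete,” which is the kind of contextual inference the article attributes to the model. That gap is one reason a learned ranker would be preferred over a lexical score like this one.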
Future Fact Checking
The team now wants to turn their research into a comprehensive system. In time, they plan to create a platform that Wikipedia editors can use to systematically identify and resolve citation problems.
Meta has also open-sourced the project, which could give external researchers new tools to develop their own AI language systems.
“Our results indicate that an AI-based system could be used, in conjunction with humans, to improve Wikipedia’s verifiability,” the study authors wrote.
“More broadly, we hope our work can be used to verify facts and increase the general trustworthiness of information online.”
The research may fuel fears that automated fact-checking and Big Tech companies could become arbiters of truth. The more optimistic view is that Meta has finally found a way to use its experience with disinformation for good.