Wikimedia Launches AI-Friendly Wikidata Project

Wikimedia Deutschland has introduced a major upgrade aimed at making the massive pool of Wikipedia knowledge more usable for artificial intelligence systems. The initiative, known as the Wikidata Embedding Project, brings cutting-edge technology to one of the world’s most trusted information sources.

So, what’s new? The project applies semantic search using vector-based AI models, a fancy way of saying that a search can now match on the meaning and context of words and how they relate to one another, rather than on exact keywords alone. This method is being applied to around 120 million data entries from Wikipedia and related platforms.
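To make the idea concrete, here is a minimal sketch of vector-based search in Python. The three-number vectors are invented by hand for illustration; real embedding models produce vectors with hundreds of dimensions, and nothing here reflects the project's actual models or data.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# A real system would get these vectors from an embedding model.
TOY_VECTORS = {
    "scientist": [0.9, 0.8, 0.1],
    "researcher": [0.85, 0.75, 0.2],
    "scholar": [0.7, 0.9, 0.15],
    "banana": [0.05, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query, vectors, top_k=2):
    """Rank every other entry by how close its vector is to the query's vector."""
    query_vec = vectors[query]
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in vectors.items()
        if label != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


for label, score in semantic_search("scientist", TOY_VECTORS):
    print(f"{label}: {score:.3f}")
```

With these made-up vectors, "researcher" and "scholar" rank well ahead of "banana" even though none of the words share any letters with the query, which is the basic difference from keyword matching.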

What really sets this apart is its compatibility with the Model Context Protocol (MCP), an open standard that lets AI models communicate with external data sources like Wikidata. In simpler terms, this means AI tools can now ask complex, conversational questions and get more accurate, context-rich answers from Wikipedia-based content.
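For a rough sense of what that looks like on the wire: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request. The sketch below builds such a request in Python. The tool name `search_items` and its `query` argument are hypothetical placeholders, not the names exposed by the Wikidata project's server.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build an MCP-style JSON-RPC 2.0 request that asks a server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_items" and "query" are hypothetical names used for illustration.
message = build_tool_call(
    "search_items",
    {"query": "scientists affiliated with Bell Labs"},
)
print(json.dumps(message, indent=2))
```

An AI application would send a message like this to an MCP server and receive the tool's results in the matching JSON-RPC response.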

The project was built in partnership with Jina.AI, a company that specializes in neural search, and DataStax, a real-time data platform owned by IBM. This collaboration moves Wikidata beyond basic keyword search and SPARQL queries (a specialized query language used by more advanced developers) to something that works with today’s AI techniques, such as retrieval-augmented generation (RAG), in which a model pulls in verified external knowledge at the moment it answers a question rather than relying only on what it learned in training.
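The RAG pattern can be sketched in a few lines of Python: retrieve the most relevant facts, then place them in the prompt sent to a language model. This is a simplified illustration under stated assumptions. The fact list is a hand-written stand-in for a knowledge base, retrieval uses plain word overlap where a real system would use vector search, and the model call itself is left out.

```python
import re

# Hand-written stand-in for a knowledge base; illustrative only.
FACTS = [
    "Marie Curie was a physicist and chemist.",
    "Bell Labs is an industrial research laboratory.",
    "A banana is a fruit.",
]


def words(text):
    """Lowercase a string and split it into a set of words, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, facts, top_k=2):
    """Return the facts sharing the most words with the question.

    Word overlap stands in for vector search to keep the sketch short.
    """
    question_words = words(question)
    ranked = sorted(
        facts,
        key=lambda fact: len(question_words & words(fact)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, facts, top_k=2):
    """Assemble a prompt that grounds the model in the retrieved facts."""
    context = "\n".join(f"- {fact}" for fact in retrieve(question, facts, top_k))
    return (
        "Answer using only the facts below.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt("Who was Marie Curie?", FACTS, top_k=1))
```

The resulting prompt would then be passed to whichever language model the application uses, so the answer draws on the retrieved facts.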

To illustrate, a simple query like “scientist” won’t just return a generic list. It might show notable nuclear scientists, those affiliated with Bell Labs, translations of the term, images of scientists at work, and even links to related terms like “researcher” or “scholar.”

Developers and researchers can already access the tool on Toolforge, and a webinar is scheduled for October 9th for those interested in exploring its potential.

This upgrade comes at a critical time, as AI developers race to find high-quality, trustworthy data for their models. While internet-scale datasets like Common Crawl contain vast amounts of data, much of that content is unvetted. In contrast, Wikipedia’s editor-verified content offers a more reliable alternative.

Wikidata AI project manager Philippe Saadé stressed the project’s independence from big tech: “This shows that powerful AI tools can be open, collaborative, and built for the public good — not just controlled by tech giants.”