Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

Generative AI & LLMs
Published: arXiv: 2512.05967v1
Authors

Francesco Granata, Francesco Poggi, Misael Mongiovì

Abstract

In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their impressive effectiveness in many areas, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, where terminological ambiguity can affect retrieval relevance. This study proposes an enhanced RAG architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. The system includes a Wikidata-based Entity Linking module and implements three re-ranking strategies to combine semantic and entity-based information: a hybrid score weighting model, reciprocal rank fusion, and a cross-encoder re-ranker. Experiments were conducted on two benchmarks: a custom academic dataset and the standard SQuAD-it dataset. Results show that, in domain-specific contexts, the hybrid schema based on reciprocal rank fusion significantly outperforms both the baseline and the cross-encoder approach, while the cross-encoder achieves the best results on the general-domain dataset. These findings confirm the presence of an effect of domain mismatch and highlight the importance of domain adaptation and hybrid ranking strategies to enhance factual precision and reliability in retrieval-augmented generation. They also demonstrate the potential of entity-aware RAG systems in educational environments, fostering adaptive and reliable AI-based tutoring tools.
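Of the three re-ranking strategies named in the abstract, reciprocal rank fusion is the easiest to illustrate. The sketch below shows the standard formulation, in which each candidate receives the sum of 1 / (k + rank) over the rankings it appears in. It is not the authors' implementation: the function name, the chunk ids, the two example rankings, and the constant k = 60 (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk gets sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical rankings of the same candidate pool by two different signals.
semantic_ranking = ["c3", "c1", "c2"]  # ordered by embedding similarity
entity_ranking = ["c1", "c4", "c3"]    # ordered by entity-based relevance

fused = reciprocal_rank_fusion([semantic_ranking, entity_ranking])
print(fused)
```

Because the fusion uses only rank positions, it does not need the semantic and entity-based scores to be on a comparable scale.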

Paper Summary

Problem
Large Language Models (LLMs) can produce incorrect or inconsistent information, known as "hallucination," which can have serious consequences in critical areas like science, medicine, and education. RAG systems try to reduce this by grounding answers in retrieved sources, but when retrieval relies solely on semantic similarity, terminological ambiguity in specialized domains can surface irrelevant passages and undermine factual accuracy.
Key Innovation
The researchers propose an enhanced Retrieval-Augmented Generation (RAG) architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. A Wikidata-based Entity Linking module supplies the entity signal, and three re-ranking strategies combine it with semantic similarity: a hybrid score weighting model, reciprocal rank fusion, and a cross-encoder re-ranker. The result grounds the model's output in real, verifiable information.
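As a rough illustration of how an entity signal can be blended with semantic similarity, the sketch below implements one possible hybrid score weighting scheme. Everything specific in it is an assumption rather than a detail taken from the paper: the Jaccard overlap used as the entity score, the weight alpha = 0.7, the function names, and the placeholder entity identifiers, which stand in for the Wikidata ids an Entity Linking module would return.

```python
def entity_overlap(query_entities, chunk_entities):
    """Jaccard overlap between two sets of linked entity ids (assumed measure)."""
    q, c = set(query_entities), set(chunk_entities)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def hybrid_score(semantic_sim, query_entities, chunk_entities, alpha=0.7):
    """Weighted sum of semantic similarity and entity overlap.

    alpha is a hypothetical weight in [0, 1]; alpha=1 ignores entities.
    """
    overlap = entity_overlap(query_entities, chunk_entities)
    return alpha * semantic_sim + (1 - alpha) * overlap


def rerank(candidates, query_entities, alpha=0.7):
    """Sort (chunk_id, semantic_sim, chunk_entities) tuples by hybrid score."""
    scored = [
        (chunk_id, hybrid_score(sim, query_entities, entities, alpha))
        for chunk_id, sim, entities in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder entity ids, not real Wikidata QIDs.
query_entities = {"E_cell", "E_membrane"}
candidates = [
    ("c1", 0.80, {"E_mitosis"}),             # semantically close, wrong entities
    ("c2", 0.75, {"E_cell", "E_membrane"}),  # slightly lower similarity, same entities
]

reranked = rerank(candidates, query_entities)
print(reranked)
```

In this toy example the chunk that shares the query's entities overtakes the one with the higher semantic similarity, which is the kind of correction an entity-aware re-ranker is meant to provide when terminology is ambiguous.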
Practical Impact
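This research has practical implications for educational platforms, where accurate and reliable information is crucial. By integrating Entity Linking into RAG systems, question-answering tools can give educators and learners answers that are better grounded in the relevant course material, supporting adaptive and reliable AI-based tutoring. The results also indicate that the ranking strategy should match the domain: the hybrid schema based on reciprocal rank fusion performed best on the custom academic dataset, while the cross-encoder performed best on the general-domain SQuAD-it dataset. The proposed system can also be applied in other domains where specialized knowledge and terminology are essential.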
Analogy / Intuitive Explanation
Think of it like a librarian who helps you find the right book in a vast library. A RAG system that relies only on semantic similarity is like a librarian who hands you books whose titles sound close to your question, without checking whether they cover the exact subject you mean. The proposed RAG architecture with Entity Linking is like a librarian who also checks which specific subjects each book actually covers, so that ambiguous terms are resolved and the books you receive are relevant to your question.
Paper Information
Categories: cs.IR, cs.AI, cs.CL, cs.LG
arXiv ID: 2512.05967v1