Detecting and Explaining Hallucinations in Large Language Models
Project Description
This project aims to build a demo tool that detects when AI-generated text is unsupported by source documents and gives users clear explanations of which claims can be trusted. It suits final-year students who want to apply AI and data analysis skills in practice by designing transparent solutions that improve the reliability of language model outputs.
Supervisor
ZHOU, Xiaofang
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
You will build a hallucination detection demo for retrieval-augmented generation (RAG) systems. Large language models sometimes produce confident-sounding but factually incorrect statements, a phenomenon known as hallucination. In a RAG system, an LLM generates answers from retrieved source documents, but it may still produce claims that are not grounded in any source. Your task is to build a tool that compares each claim in an AI-generated response with the retrieved documents and classifies it as supported, partially supported, or unsupported. You will use existing open-source tools (such as LangChain) to set up a basic RAG pipeline, then implement sentence-level attribution using text similarity techniques such as embedding-based semantic matching. You will apply this method to a curated document collection (e.g., Wikipedia articles on a specific topic) and evaluate detection accuracy on sample question-answer pairs. You will also create an interactive demo with color-coded trust indicators, analyze false positives and false negatives in depth, and propose improvements to the detection pipeline, such as confidence thresholding or multi-source verification.
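As a rough illustration of the sentence-level attribution step, the sketch below scores each sentence of a generated answer against every source sentence and assigns one of the three labels by thresholding the best match. It is a minimal sketch under stated assumptions, not the project's prescribed method: the function names, the two threshold values, and the naive sentence splitter are hypothetical, and a bag-of-words count vector stands in for a real sentence-embedding model so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical thresholds; real values would be tuned on labeled examples.
SUPPORTED_THRESHOLD = 0.7
PARTIAL_THRESHOLD = 0.4


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence):
    """Stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real system would call an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_claims(answer, source_passages):
    """Label each answer sentence by its best-matching source sentence."""
    source_sentences = [s for p in source_passages for s in split_sentences(p)]
    source_vectors = [embed(s) for s in source_sentences]

    results = []
    for claim in split_sentences(answer):
        claim_vector = embed(claim)
        best_score, best_source = 0.0, None
        for sentence, vector in zip(source_sentences, source_vectors):
            score = cosine(claim_vector, vector)
            if score > best_score:
                best_score, best_source = score, sentence

        if best_score >= SUPPORTED_THRESHOLD:
            label = "supported"
        elif best_score >= PARTIAL_THRESHOLD:
            label = "partially supported"
        else:
            label = "unsupported"

        results.append(
            {"claim": claim, "label": label, "score": best_score, "source": best_source}
        )
    return results


if __name__ == "__main__":
    sources = ["The Eiffel Tower is in Paris. It was completed in 1889."]
    answer = "The Eiffel Tower is in Paris. Bananas grow on the moon."
    for row in classify_claims(answer, sources):
        print(row["label"], "|", row["claim"])
```

Word-overlap similarity cannot detect a claim that reuses the source's words but contradicts it, which is one reason the error analysis of false positives and false negatives matters.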
Applicant's Learning Objectives
Review relevant concepts in LLM hallucination and retrieval-augmented generation to guide the demo's development.
Set up a RAG pipeline using existing tools (LangChain or LlamaIndex) with a curated document collection.
Implement sentence-level attribution that links generated claims to source documents using semantic similarity.
Build an interactive demo (e.g., using Streamlit) with color-coded highlighting of trust levels.
Conduct evaluation on sample queries to analyze detection accuracy and characterize error cases.
Design visualizations to effectively communicate source attribution and trust indicators to end users.
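For the evaluation objective above, one possible way to quantify detection accuracy is to compare predicted labels with manually assigned ones and count false positives and false negatives. The sketch below is only an illustration: the function name and the choice to treat "unsupported" as the positive (hallucination) class are assumptions, and the label lists would come from your own annotated question-answer pairs.

```python
def evaluate_detection(gold_labels, predicted_labels, positive="unsupported"):
    """Compare predicted claim labels with gold labels.

    Assumption: `positive` is the class that counts as a detected
    hallucination; every other label is treated as negative.
    """
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists must have equal length")

    tp = fp = fn = tn = 0
    for gold, pred in zip(gold_labels, predicted_labels):
        if pred == positive and gold == positive:
            tp += 1  # hallucination correctly flagged
        elif pred == positive:
            fp += 1  # grounded claim wrongly flagged
        elif gold == positive:
            fn += 1  # hallucination missed
        else:
            tn += 1  # grounded claim correctly passed

    total = len(gold_labels)
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "true_negatives": tn,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

Reporting precision and recall separately, rather than accuracy alone, shows whether the detector tends to over-flag grounded claims or to miss hallucinated ones.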
Complexity of the project
Moderate