Hallucination and Factuality Assurance

Section Goal: Understand the causes of LLM hallucinations and master practical techniques to reduce hallucinations and improve factuality.


What Is Hallucination?

LLM Hallucination Types and Defense Strategies

Hallucination refers to LLM-generated content that looks fluent and reasonable but is actually incorrect or fabricated. It's like a "knowledgeable" friend who, when they don't know the answer, doesn't say "I don't know" — instead, they confidently make up an answer that sounds right.

Types of Hallucination

| Type | Description | Example |
|------|-------------|---------|
| Factual hallucination | Generates information that contradicts facts | "Python was released in 1995" (actually 1991) |
| Citation hallucination | Cites non-existent papers or links | "According to Smith et al. (2023)..." (paper doesn't exist) |
| Logical hallucination | Reasoning appears sound but is actually flawed | Math calculation shows correct process but wrong result |
| Instruction hallucination | Claims to have performed an action but didn't | "I've sent the email for you" (no email-sending capability) |

Strategies to Reduce Hallucinations

Strategy 1: Require Citations

Force the Agent to cite sources; declare uncertainty when no source is available:

ANTI_HALLUCINATION_PROMPT = """
## Factuality Requirements (Must Be Followed)

1. **Only say what you know**: Only provide information you are confident is correct
2. **Cite sources**: When stating specific facts or data, cite the source
3. **Admit uncertainty**: For uncertain content, clearly say "I'm not sure; I recommend you verify this"
4. **Distinguish facts from opinions**: Use declarative sentences for facts; use "I think" or "generally speaking" for opinions

### Examples
✅ "Python 3.12 was released in October 2023, introducing better error messages."
✅ "I'm not very sure about the specific numbers for this question; I recommend checking the official documentation."
❌ "The latest version of this library is 5.2.1." (if you're not sure of the version number)
"""

Strategy 2: RAG Fact-Checking

Use retrieved documents to verify the Agent's answers:

class FactChecker:
    """RAG-based fact checker"""
    
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm
    
    def check(self, claim: str) -> dict:
        """Check the factuality of a claim"""
        
        # 1. Retrieve relevant documents
        docs = self.retriever.invoke(claim)
        
        if not docs:
            return {
                "verdict": "unverifiable",
                "confidence": 0.0,
                "explanation": "No relevant documents found to verify this claim"
            }
        
        # 2. Let the LLM judge
        context = "\n\n".join(doc.page_content for doc in docs[:3])
        
        check_prompt = f"""Based on the following reference documents, determine whether this claim is correct.

Claim: {claim}

Reference documents:
{context}

Please reply in JSON format:
{{
    "verdict": "supported" or "contradicted" or "unverifiable",
    "confidence": 0.0-1.0,
    "explanation": "basis for judgment"
}}"""
        
        response = self.llm.invoke(check_prompt)
        import json
        return json.loads(response.content)
    
    def check_response(self, response: str) -> list[dict]:
        """Check key claims in an entire response"""
        
        # First extract key claims
        extract_prompt = f"""Extract all verifiable factual claims from the following response.
One claim per line; only extract objective facts, not opinions.

Response:
{response}

List of claims:"""
        
        claims_response = self.llm.invoke(extract_prompt)
        claims = [
            c.strip().lstrip("- ·•")
            for c in claims_response.content.strip().split("\n")
            if c.strip()
        ]
        
        # Check each one
        results = []
        for claim in claims:
            result = self.check(claim)
            result["claim"] = claim
            results.append(result)
        
        return results
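One practical caveat: `check` calls `json.loads(response.content)` directly, which fails whenever the model wraps its JSON in a markdown fence or adds surrounding prose. A small defensive parser (the helper name `parse_llm_json` is ours, not part of any library) handles the common cases:

```python
import json
import re


def parse_llm_json(text: str) -> dict:
    """Extract and parse a JSON object from LLM output.

    Handles the common failure mode where the model wraps the JSON
    in a ```json ... ``` fence or adds prose around it.
    """
    # Strip a markdown code fence if present
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Fall back to the outermost {...} span
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start:end + 1])
```

With this helper, the last line of `check` would call `parse_llm_json(response.content)` instead of `json.loads(response.content)`.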

Strategy 3: Self-Consistency Check

Have the Agent answer the same question multiple times; if answers are inconsistent, hallucination may be present:

async def self_consistency_check(
    question: str,
    llm,
    num_samples: int = 3,
    temperature: float = 0.7
) -> dict:
    """Self-consistency check: generate multiple times and compare"""
    import asyncio
    
    # Generate multiple answers
    tasks = []
    for _ in range(num_samples):
        tasks.append(llm.ainvoke(
            question,
            temperature=temperature
        ))
    
    responses = await asyncio.gather(*tasks)
    answers = [r.content for r in responses]
    
    # Let the LLM judge consistency
    consistency_prompt = f"""The following are {num_samples} answers to the same question.
Please determine whether they are consistent.

Question: {question}

""" + "\n\n".join(
        f"Answer {i+1}: {a}" for i, a in enumerate(answers)
    ) + """

Please reply with:
1. Consistency score (0–1)
2. Which answer is most reliable
3. Specific areas of inconsistency
"""
    
    analysis = await llm.ainvoke(consistency_prompt)
    
    return {
        "answers": answers,
        "analysis": analysis.content,
        "num_samples": num_samples
    }
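For short factual answers, consistency can also be estimated without a second LLM call by normalizing the samples and taking a majority vote. A sketch — the normalization here is deliberately crude and is our own choice, not a standard recipe:

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Crude normalization: lowercase, collapse whitespace, strip trailing punctuation."""
    return " ".join(answer.lower().split()).strip(" .!?")


def agreement_score(answers: list[str]) -> tuple[float, str]:
    """Fraction of samples that agree with the most common answer.

    A low score suggests the model is uncertain and may be hallucinating.
    """
    counts = Counter(normalize(a) for a in answers)
    top, freq = counts.most_common(1)[0]
    return freq / len(answers), top
```

A high agreement score is necessary but not sufficient: a model can be consistently wrong, which is why this check is best combined with RAG or tool grounding.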

Strategy 4: Tool Grounding

For scenarios requiring precise data, force the use of tools rather than memory:

def create_grounded_agent(llm, tools):
    """Create a 'grounded' Agent — prioritize using tools to obtain facts"""
    
    system_prompt = """You are a rigorous assistant.

## Core Principle: Tools First
- For specific data (prices, dates, quantities, etc.), you must use tools to query
- For real-time information (weather, news, stock prices, etc.), you must use tools to query
- Only use your knowledge when tools cannot obtain the information
- When answering from knowledge, you must note "Based on my knowledge"

## Prohibited Behaviors
- Do not fabricate specific numbers, dates, or links
- Do not pretend to have performed an action (e.g., "Email sent")
- Do not cite uncertain sources
"""
    
    # Build Agent using LangChain
    from langchain.agents import create_openai_tools_agent, AgentExecutor  # legacy; new projects should use LangGraph
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ])
    
    agent = create_openai_tools_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

Frontier Methods (2025–2026)

The following methods represent the latest advances in hallucination mitigation as of early 2026, providing stronger factuality guarantees at three levels: model capability, training strategy, and system architecture.

Strategy 5: Reasoning Models — "Think Before You Speak"

Reasoning models such as OpenAI o1/o3 and DeepSeek-R1 perform self-verification before generating answers through internalized Chain-of-Thought, significantly reducing hallucination rates.

Traditional model:
  Question → Directly generate answer → May "confidently make mistakes"

Reasoning model:
  Question → Internal reasoning chain:
    "Let me analyze this... Am I sure about this information?"
    "I'm not very sure about this date; let me verify from another angle..."
    "This might be wrong; let me reconsider..."
  → Verified answer → Hallucinations significantly reduced

Empirical data (SimpleQA benchmark):
  GPT-4o:   38.2% error rate
  o1:       16.0% error rate (58% reduction)
  o3-mini:  12.8% error rate (66% reduction)

Best practices for using reasoning models in Agents:

def create_reasoning_agent(task_type: str):
    """Select the appropriate model based on task type"""
    
    # Use reasoning models for high-risk tasks to reduce hallucinations
    HIGH_RISK_TASKS = ["medical", "legal", "financial", "safety"]
    
    if task_type in HIGH_RISK_TASKS:
        # Reasoning model: higher factuality, but higher latency and cost
        model = "o3-mini"  # or deepseek-reasoner
        system_note = "Please carefully reason and verify each factual claim before answering."
    else:
        # Regular model: faster and cheaper, suitable for low-risk scenarios
        model = "gpt-4o"
        system_note = "Please ensure the accuracy of your answer; state when uncertain."
    
    return {"model": model, "system_note": system_note}

⚠️ Reasoning models are not a silver bullet: At the boundaries of knowledge not covered by training data, reasoning models still hallucinate. Reasoning model + RAG is the most reliable combination currently.

Strategy 6: Calibration Training — Teaching Models to Say "I Don't Know"

A core problem with traditional LLMs is overconfidence — even when they don't know the answer, they generate a seemingly reasonable response. Calibration training aligns the model's "confidence level" with its "actual accuracy rate."

CALIBRATED_PROMPT = """
## Confidence Annotation Requirements

For each of your factual claims, please annotate the confidence level:

- 🟢 **High confidence**: You are very certain this is correct (e.g., widely known facts)
- 🟡 **Medium confidence**: You think it's probably correct, but recommend the user verify
- 🔴 **Low confidence**: You are uncertain; clearly inform the user they need to verify
- ⚪ **Cannot determine**: Beyond your knowledge; simply say "I don't know"

### Examples
✅ "Python was created by Guido van Rossum 🟢, first released in 1991 🟢."
✅ "The latest version of this library may be 3.2 🟡; I recommend checking the official documentation to confirm."
✅ "I cannot confirm the specific revenue figures for this company ⚪; I recommend consulting their financial reports."
"""


class CalibratedAgent:
    """Agent with confidence calibration"""
    
    def __init__(self, llm, retriever=None):
        self.llm = llm
        self.retriever = retriever
    
    def answer(self, question: str) -> dict:
        """Generate an answer with confidence annotations"""
        
        # Step 1: Generate initial answer (with confidence annotations)
        response = self.llm.invoke(
            CALIBRATED_PROMPT + f"\n\nUser question: {question}"
        )
        
        # Step 2: For low-confidence content, try RAG supplementation
        if self.retriever and ("🟡" in response.content or "🔴" in response.content):
            docs = self.retriever.invoke(question)
            if docs:
                context = "\n".join(d.page_content for d in docs[:3])
                # Re-answer the low-confidence parts using retrieved documents
                refined = self.llm.invoke(
                    f"Based on the following reference materials, re-answer the parts you were uncertain about:\n\n"
                    f"Reference materials:\n{context}\n\n"
                    f"Original answer:\n{response.content}\n\n"
                    f"Please only correct the low-confidence parts; keep the high-confidence content."
                )
                return {"answer": refined.content, "rag_enhanced": True}
        
        return {"answer": response.content, "rag_enhanced": False}
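The emoji check in Step 2 can be factored into a small scanner that also reports how many claims fall into each confidence bucket. The marker-to-level mapping mirrors `CALIBRATED_PROMPT`; the function name and return shape are our own:

```python
CONFIDENCE_MARKERS = {"🟢": "high", "🟡": "medium", "🔴": "low", "⚪": "unknown"}


def scan_confidence(text: str) -> dict:
    """Count calibration markers in an annotated answer and decide
    whether RAG refinement is warranted (any medium/low claims)."""
    counts = {level: text.count(marker)
              for marker, level in CONFIDENCE_MARKERS.items()}
    return {
        "counts": counts,
        "total_claims": sum(counts.values()),
        "needs_refinement": counts["medium"] > 0 or counts["low"] > 0,
    }
```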

Strategy 7: Chain-of-Verification (CoVe)

CoVe was proposed by Meta in 2023 and widely adopted in 2024–2025. The core idea is: after generating an answer, automatically generate verification questions, independently answer those questions, and then use the verification results to correct the original answer.

class ChainOfVerification:
    """Chain of Verification: Generate → Verify → Correct"""
    
    def __init__(self, llm):
        self.llm = llm
    
    async def generate_and_verify(self, question: str) -> dict:
        """Complete CoVe process"""
        
        # Step 1: Generate initial answer
        initial = await self.llm.ainvoke(
            f"Please answer the following question:\n{question}"
        )
        
        # Step 2: Generate verification questions
        verification_qs = await self.llm.ainvoke(
            f"The following is an answer. Please generate a verification question for each factual claim in it.\n\n"
            f"Answer: {initial.content}\n\n"
            f"Please generate 3–5 verification questions, one per line:"
        )
        questions = [
            q.strip().lstrip("0123456789.-) ")
            for q in verification_qs.content.strip().split("\n")
            if q.strip()
        ]
        
        # Step 3: Independently answer verification questions
        # (Key: do NOT show the model the initial answer, to avoid confirmation bias)
        import asyncio
        verify_tasks = [
            self.llm.ainvoke(f"Please answer concisely: {q}")
            for q in questions
        ]
        verify_answers = await asyncio.gather(*verify_tasks)
        
        # Step 4: Use verification results to correct the initial answer
        verification_context = "\n".join(
            f"Verification question: {q}\nVerification result: {a.content}"
            for q, a in zip(questions, verify_answers)
        )
        
        final = await self.llm.ainvoke(
            f"Original question: {question}\n\n"
            f"Initial answer: {initial.content}\n\n"
            f"The following are fact verification results for the initial answer:\n{verification_context}\n\n"
            f"Please correct any errors in the initial answer based on the verification results (if any), and output the final answer:"
        )
        
        return {
            "initial_answer": initial.content,
            "verification_questions": questions,
            "final_answer": final.content
        }
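The list parsing in Step 2 (`lstrip("0123456789.-) ")`) is fragile: `lstrip` strips a character *set*, so a question that itself starts with a digit or dash gets damaged. A slightly more careful parser, isolated so it can be unit-tested:

```python
import re


def parse_question_list(text: str) -> list[str]:
    """Split an LLM's 'one question per line' output into clean questions,
    stripping common list prefixes like '1.', '2)', '-', '*', '•'."""
    questions = []
    for line in text.strip().splitlines():
        line = line.strip()
        # Drop one numbering ("1.", "2)") or bullet ("-", "*", "•") prefix
        line = re.sub(r"^(?:\d+[.)]\s*|[-*•]\s*)", "", line)
        if line:
            questions.append(line)
    return questions
```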

Strategy 8: Structured Output Constraints

By forcing the LLM to output structured formats (JSON Schema, Pydantic models), hallucinations can be reduced at the structural level — the model must provide explicit values for each field, rather than being vague in free text.

from pydantic import BaseModel, Field
from enum import Enum


class ConfidenceLevel(str, Enum):
    HIGH = "high"        # Certain it's correct
    MEDIUM = "medium"    # Probably correct
    LOW = "low"          # Uncertain
    UNKNOWN = "unknown"  # Don't know


class FactClaim(BaseModel):
    """A factual claim"""
    statement: str = Field(description="The content of the factual claim")
    confidence: ConfidenceLevel = Field(description="Confidence level")
    source: str | None = Field(
        default=None,
        description="Source of information; null if no clear source"
    )


class StructuredAnswer(BaseModel):
    """Structured answer with mandatory source and confidence annotations"""
    summary: str = Field(description="Answer summary")
    facts: list[FactClaim] = Field(description="List of factual claims in the answer")
    caveats: list[str] = Field(
        default_factory=list,
        description="Notes and disclaimers"
    )
    needs_verification: bool = Field(
        description="Whether the user is recommended to verify further"
    )


# Use OpenAI's structured output feature
from openai import OpenAI

client = OpenAI()

def get_structured_answer(question: str) -> StructuredAnswer:
    """Get a structured answer with confidence annotations"""
    
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": (
                "You are a rigorous assistant. For each factual claim, "
                "you must annotate the confidence level and source. "
                "Mark uncertain content as low or unknown."
            )},
            {"role": "user", "content": question}
        ],
        response_format=StructuredAnswer,
    )
    
    return response.choices[0].message.parsed
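Once parsed, the structured answer renders naturally back into annotated prose. A sketch operating on the plain-dict form (e.g. `model_dump()` of `StructuredAnswer`); the badge mapping and layout are our choices:

```python
BADGES = {"high": "🟢", "medium": "🟡", "low": "🔴", "unknown": "⚪"}


def render_answer(answer: dict) -> str:
    """Render a StructuredAnswer-shaped dict as badge-annotated text."""
    lines = [answer["summary"], ""]
    for fact in answer["facts"]:
        badge = BADGES.get(fact["confidence"], "⚪")
        src = f" (source: {fact['source']})" if fact.get("source") else ""
        lines.append(f"- {badge} {fact['statement']}{src}")
    for caveat in answer.get("caveats", []):
        lines.append(f"⚠️ {caveat}")
    if answer.get("needs_verification"):
        lines.append("Please verify the low-confidence claims independently.")
    return "\n".join(lines)
```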

Strategy 9: Multi-Agent Cross-Verification

Drawing on the academic "peer review" mechanism, having multiple Agents check each other significantly improves factuality:

class CrossVerificationSystem:
    """Multi-Agent cross-verification system"""
    
    def __init__(self, models: list[str]):
        """
        Use different models as different "review experts".
        Different models have different knowledge biases, making cross-verification more effective.
        """
        from openai import AsyncOpenAI
        # Async client so the models can be queried concurrently below
        self.client = AsyncOpenAI()
        # e.g., ["gpt-4o", "deepseek-chat"] — each model must be served
        # through an OpenAI-compatible endpoint for this client to reach it
        self.models = models
    
    async def cross_verify(self, question: str) -> dict:
        """Multi-model cross-verification"""
        import asyncio
        
        # Step 1: Each model answers independently (and concurrently)
        async def get_answer(model: str) -> str:
            response = await self.client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "Please answer the question rigorously; clearly annotate uncertain content."},
                    {"role": "user", "content": question}
                ]
            )
            return response.choices[0].message.content
        
        answers = await asyncio.gather(
            *[get_answer(m) for m in self.models]
        )
        
        # Step 2: Let one model act as "arbitrator" to synthesize all answers
        all_answers = "\n\n".join(
            f"[{model}'s answer]:\n{answer}"
            for model, answer in zip(self.models, answers)
        )
        
        synthesis = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a fact-checking arbitrator. The following are answers from multiple AI models to the same question.\n"
                    "Please:\n"
                    "1. Find facts that all models agree on (high credibility)\n"
                    "2. Find contradictions between models (needs verification)\n"
                    "3. Synthesize and provide the most reliable answer"
                )},
                {"role": "user", "content": (
                    f"Question: {question}\n\nModel answers:\n{all_answers}"
                )}
            ]
        )
        
        return {
            "individual_answers": dict(zip(self.models, answers)),
            "synthesized_answer": synthesis.choices[0].message.content
        }

Strategy 10: Real-Time Retrieval Augmentation

Traditional RAG relies on an offline-built knowledge base, but for time-sensitive questions (news, stock prices, latest version numbers, etc.), it must be combined with real-time web search:

class RealtimeFactAgent:
    """Factuality Agent combining real-time search"""
    
    def __init__(self, llm, search_tool, local_retriever=None):
        self.llm = llm
        self.search_tool = search_tool        # Web search tool
        self.local_retriever = local_retriever  # Local knowledge base (optional)
    
    def answer(self, question: str) -> dict:
        """First determine if real-time information is needed, then decide retrieval strategy"""
        
        # Step 1: Classify the question type
        classification = self.llm.invoke(
            f"Determine whether the following question requires real-time/latest information "
            f"(such as current date, latest version, recent events, etc.). "
            f"Reply only with 'realtime' or 'static'.\n\nQuestion: {question}"
        ).content.strip().lower()
        
        # Step 2: Choose retrieval strategy based on type
        if "realtime" in classification:
            # Real-time search
            search_results = self.search_tool.invoke(question)
            context_source = "Real-time web search"
            context = search_results
        elif self.local_retriever:
            # Local knowledge base retrieval
            docs = self.local_retriever.invoke(question)
            context_source = "Local knowledge base"
            context = "\n\n".join(d.page_content for d in docs[:3])
        else:
            context_source = "Model internal knowledge"
            context = ""
        
        # Step 3: Generate answer based on retrieved results
        if context:
            answer = self.llm.invoke(
                f"Answer the question based on the following reference information. "
                f"If the reference information is insufficient, please clearly state so.\n\n"
                f"Reference information (source: {context_source}):\n{context}\n\n"
                f"Question: {question}"
            )
        else:
            answer = self.llm.invoke(
                f"Please answer the following question. If uncertain, please clearly state so.\n\nQuestion: {question}"
            )
        
        return {
            "answer": answer.content,
            "source": context_source,
            "has_grounding": bool(context)
        }
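The classification call in Step 1 costs an extra model round trip. For obviously time-sensitive phrasings, a cheap keyword pre-filter can short-circuit it; the keyword list below is a heuristic of ours and should be tuned per application, with the LLM classifier kept as the fallback for ambiguous questions:

```python
REALTIME_KEYWORDS = (
    "today", "now", "current", "latest", "recent", "this week",
    "price", "weather", "news", "stock",
)


def needs_realtime(question: str) -> bool:
    """Cheap pre-filter: does the question obviously ask for
    time-sensitive information? Ambiguous cases should still go
    through the LLM classifier."""
    q = question.lower()
    return any(kw in q for kw in REALTIME_KEYWORDS)
```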

Strategy Combination: Building a Multi-Layer Factuality Defense

In production environments, a single strategy is often insufficient. Recommended combinations based on scenario:

                    ┌──────────────────────────────────┐
                    │          User Question           │
                    └───────────────┬──────────────────┘
                                    │
                    ┌───────────────▼──────────────────┐
                    │  Layer 1: Real-time RAG (S10)    │
                    │  Determine if real-time info     │
                    │  is needed → search / RAG        │
                    └───────────────┬──────────────────┘
                                    │
                    ┌───────────────▼──────────────────┐
                    │  Layer 2: Reasoning Model (S5)   │
                    │  Generate answer from reasoning  │
                    │  model + retrieved results       │
                    └───────────────┬──────────────────┘
                                    │
                    ┌───────────────▼──────────────────┐
                    │  Layer 3: Structured Output (S8) │
                    │  Force annotation of confidence  │
                    │  and sources                     │
                    └───────────────┬──────────────────┘
                                    │
                    ┌───────────────▼──────────────────┐
                    │  Layer 4: CoVe Correction (S7)   │
                    │  Auto-generate verification      │
                    │  questions → correct errors      │
                    └───────────────┬──────────────────┘
                                    │
                    ┌───────────────▼──────────────────┐
                    │  Final Answer (with confidence)  │
                    └──────────────────────────────────┘

Recommended combinations for different scenarios:

| Scenario | Recommended Strategy Combination | Notes |
|----------|----------------------------------|-------|
| Medical/legal consultation | Reasoning model + RAG + CoVe + multi-Agent verification | Highest reliability; higher latency acceptable |
| Customer service Q&A | RAG + structured output + tool grounding | Balance accuracy and response speed |
| Data analysis | Tool grounding + self-consistency | Data must come from tools, not memory |
| Knowledge base Q&A | RAG + citation + calibration prompt | Ensure answers are evidence-based |
| Real-time information queries | Real-time search + tool grounding | Timeliness is the priority |
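The recommendations above can be encoded directly as a routing map so an application picks a strategy stack per request. The scenario keys and strategy identifiers below are illustrative, not a standard:

```python
STRATEGY_PLAYBOOK = {
    "medical":   ["reasoning_model", "rag", "cove", "cross_verification"],
    "legal":     ["reasoning_model", "rag", "cove", "cross_verification"],
    "support":   ["rag", "structured_output", "tool_grounding"],
    "analytics": ["tool_grounding", "self_consistency"],
    "kb_qa":     ["rag", "citations", "calibration_prompt"],
    "realtime":  ["web_search", "tool_grounding"],
}


def pick_strategies(scenario: str) -> list[str]:
    """Look up the recommended strategy combination for a scenario;
    unknown scenarios fall back to a conservative default."""
    return STRATEGY_PLAYBOOK.get(scenario, ["rag", "citations", "calibration_prompt"])
```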

Summary

| Strategy | Purpose | Use Cases | Frontier Level |
|----------|---------|-----------|----------------|
| Require citations | Force model to find evidence for information | Knowledge Q&A | Basic |
| RAG fact-checking | Verify answers with retrieved documents | Document Q&A | Basic |
| Self-consistency | Detect uncertainty through multiple generations | High-risk decisions | Basic |
| Tool grounding | Use tools to obtain precise data | Data queries | Basic |
| Reasoning models | "Think before speaking," internalized verification | High-risk tasks | ⭐ 2024–2025 |
| Calibration training | Teach models to say "I don't know" | General scenarios | ⭐ 2024–2025 |
| Chain-of-Verification (CoVe) | Auto-verify and correct after generation | Long-form generation | ⭐ 2024–2025 |
| Structured output constraints | Force annotation of sources and confidence | Structured Q&A | ⭐ 2024–2025 |
| Multi-Agent cross-verification | Multiple models check each other | Critical decisions | ⭐ 2025–2026 |
| Real-time retrieval augmentation | Combine web search for latest information | Time-sensitive questions | ⭐ 2025–2026 |

📖 Want to dive deeper into the academic frontiers of hallucination detection and mitigation? Read 19.6 Paper Readings: Frontier Research in Security and Reliability, which covers in-depth analysis of core papers including FActScore, SelfCheckGPT, Self-Consistency, and CoVe.

Preview of next section: Agents need not only to "say the right things" but also to "act safely" — permission control is crucial.


Next section: 19.3 Permission Control and Sandbox Isolation →