Large language models (LLMs) like GPT-4, PaLM, and Llama have unlocked remarkable advances in natural language generation. However, a persistent challenge limiting their reliability and safe deployment is their tendency to hallucinate – generating content that seems coherent but is factually incorrect or not grounded in the input context.
As LLMs continue to grow more powerful and ubiquitous across real-world applications, addressing hallucinations becomes imperative. This article provides a comprehensive overview of the latest techniques researchers have introduced to detect, quantify, and mitigate hallucinations in LLMs.
Understanding Hallucination in LLMs
Hallucination refers to factual inaccuracies or fabrications generated by LLMs that are not grounded in reality or the provided context. Some examples include:
- Inventing biographical details or events not evidenced in source material when generating text about a person.
- Providing faulty medical advice by confabulating drug side-effects or treatment procedures.
- Concocting non-existent data, studies or sources to support a claim.
This phenomenon arises because LLMs are trained on vast amounts of online text data. While this allows them to attain strong language modeling capabilities, it also means they learn to extrapolate information, make logical leaps, and fill in gaps in a manner that seems convincing but may be misleading or erroneous.
Some key factors responsible for hallucinations include:
- Pattern generalization – LLMs identify and extend patterns in the training data which may not generalize well.
- Outdated knowledge – Static pre-training prevents integration of new information.
- Ambiguity – Vague prompts allow room for incorrect assumptions.
- Biases – Models perpetuate and amplify skewed perspectives.
- Insufficient grounding – Lacking true comprehension and reasoning, models generate content they don't fully understand.
Addressing hallucinations is critical for trustworthy deployment in sensitive domains like medicine, law, finance and education where generating misinformation could lead to harm.
Taxonomy of Hallucination Mitigation Techniques
Researchers have introduced diverse techniques to combat hallucinations in LLMs, which can be categorized into:
1. Prompt Engineering
This involves carefully crafting prompts to provide context and guide the LLM towards factual, grounded responses.
- Retrieval augmentation – Retrieving external evidence to ground content.
- Feedback loops – Iteratively providing feedback to refine responses.
- Prompt tuning – Adjusting prompts during fine-tuning for desired behaviors.
2. Model Development
Creating models inherently less prone to hallucination through changes to architecture, training objectives, and decoding.
- Decoding strategies – Generating text in ways that increase faithfulness.
- Knowledge grounding – Incorporating external knowledge bases.
- Novel loss functions – Optimizing for faithfulness during training.
- Supervised fine-tuning – Using human-labeled data to enhance factuality.
Next, we survey prominent techniques under each approach.
Notable Hallucination Mitigation Techniques
Retrieval Augmented Generation
Retrieval augmented generation enhances LLMs by retrieving and conditioning text generation on external evidence documents, rather than relying solely on the model's implicit knowledge. This grounds content in up-to-date, verifiable information, reducing hallucinations.
Prominent techniques include:
- RAG – Uses a retriever module to supply relevant passages that a seq2seq model generates from; both components are trained end-to-end.
- RARR – Employs LLMs to research unattributed claims in generated text and revise them to align with retrieved evidence.
- Knowledge Retrieval – Validates uncertain generations against retrieved knowledge before producing final text.
- LLM-Augmenter – Iteratively searches knowledge to construct evidence chains for LLM prompts.
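The retrieve-then-generate pattern shared by these systems can be sketched in a few lines. The keyword-overlap retriever and prompt template below are illustrative stand-ins of my own devising, not any particular system's implementation – real pipelines use dense retrievers and a trained generator:

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank passages by word overlap with the query.
    A production RAG system would use a dense retriever instead."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_grounded_prompt(query, corpus, k=2):
    """Condition generation on retrieved evidence rather than
    the model's parametric memory alone."""
    passages = retrieve(query, corpus, k)
    evidence = "\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    return (f"Answer using ONLY the evidence below; say 'unknown' if unsupported.\n"
            f"Evidence:\n{evidence}\nQuestion: {query}\nAnswer:")

corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "Paris is the capital of France.",
    "Mount Everest is the highest mountain on Earth.",
]
prompt = build_grounded_prompt("How tall is the Eiffel Tower?", corpus, k=1)
```

The key design choice is that the final prompt instructs the model to stay within the retrieved evidence, giving it a grounded "unknown" escape hatch instead of forcing a guess.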
Feedback and Reasoning
Leveraging iterative natural language feedback or self-reasoning allows LLMs to refine and improve their initial outputs, reducing hallucinations.
CoVe employs a chain of verification technique. The LLM first drafts a response to the user's query. It then generates potential verification questions to fact check its own response, based on its confidence in various statements made. For example, for a response describing a new medical treatment, CoVe may generate questions like “What is the efficacy rate of the treatment?”, “Has it received regulatory approval?”, “What are the potential side effects?”. Crucially, the LLM then tries to independently answer these verification questions without being biased by its initial response. If the answers to the verification questions contradict or cannot support statements made in the original response, the system identifies those as likely hallucinations and refines the response before presenting it to the user.
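The four CoVe stages described above – draft, plan verification questions, answer them independently, revise – can be sketched as a pipeline around any model call. The prompts and the `stub_llm` responses below are invented purely for illustration; a real deployment would route each stage through an actual LLM:

```python
def chain_of_verification(query, llm):
    """Sketch of the CoVe loop: draft an answer, plan verification
    questions, answer them independently of the draft, then revise."""
    draft = llm(f"Answer: {query}")
    plan = llm(f"List fact-check questions for this answer: {draft}")
    # Answer each question WITHOUT showing the draft, so the checks
    # are not anchored on the draft's possible hallucinations.
    findings = [(q, llm(f"Answer briefly: {q}"))
                for q in plan.split("\n") if q.strip()]
    evidence = "\n".join(f"Q: {q} A: {a}" for q, a in findings)
    return llm(f"Revise the draft so it agrees with the findings.\n"
               f"Draft: {draft}\nFindings:\n{evidence}\nRevised:")

def stub_llm(prompt):
    """Canned responses standing in for a real model call."""
    if prompt.startswith("Revise"):
        return "Aspirin can cause stomach irritation."
    if prompt.startswith("List"):
        return "What are aspirin's known side effects?"
    if prompt.startswith("Answer briefly"):
        return "Stomach irritation is a documented side effect."
    return "Aspirin has no side effects."  # the hallucinated first draft

revised = chain_of_verification("Does aspirin have side effects?", stub_llm)
```

The stub shows the intended flow: the hallucinated first draft is contradicted by the independently answered verification question, so the revision step replaces it.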
DRESS focuses on tuning LLMs to align better with human preferences through natural language feedback. The approach allows non-expert users to provide free-form critiques on model generations, such as “The side effects mentioned seem exaggerated” or refinement instructions like “Please also discuss cost effectiveness”. DRESS uses reinforcement learning to train models to generate responses conditioned on such feedback that better align with human preferences. This enhances interactability while reducing unrealistic or unsupported statements.
MixAlign deals with situations where users ask questions that do not directly correspond to the evidence passages retrieved by the system. For example, a user may ask “Will pollution get worse in China?” whereas retrieved passages discuss pollution trends globally. To avoid hallucinating with insufficient context, MixAlign explicitly clarifies with the user when unsure how to relate their question to the retrieved information. This human-in-the-loop mechanism allows obtaining feedback to correctly ground and contextualize evidence, preventing ungrounded responses.
The Self-Reflection technique trains LLMs to evaluate, provide feedback on, and iteratively refine their own responses using a multi-task approach. For instance, given a response generated for a medical query, the model learns to score its factual accuracy, identify any contradictory or unsupported statements, and edit those by retrieving relevant knowledge. By teaching LLMs this feedback loop of checking, critiquing and iteratively improving their own outputs, the approach reduces blind hallucination.
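A minimal version of this check-critique-refine loop can be written as a score-gated iteration. Everything here – the prompt strings, the scorer, and the stub responses – is an invented illustration of the control flow, not the published method's actual components:

```python
def self_refine(query, llm, score, max_rounds=3, threshold=0.9):
    """Sketch of a self-reflection loop: generate, score factuality,
    critique, and refine until the score clears the threshold."""
    response = llm(f"Answer: {query}")
    for _ in range(max_rounds):
        if score(response) >= threshold:
            break  # the model judges its own output acceptable
        critique = llm(f"List unsupported claims in: {response}")
        response = llm(f"Rewrite to fix: {critique}\nText: {response}")
    return response

# Illustrative stubs: the scorer flags the fabricated claim, and the
# rewrite step removes it on the first refinement round.
def stub_score(text):
    return 0.2 if "cures" in text else 1.0

def stub_llm(prompt):
    if prompt.startswith("Rewrite"):
        return "Vitamin C supports normal immune function."
    if prompt.startswith("List"):
        return "'cures the common cold' is unsupported."
    return "Vitamin C cures the common cold."  # hallucinated draft

final = self_refine("What does vitamin C do?", stub_llm, stub_score)
```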
Prompt Tuning
Prompt tuning adjusts the instructional prompts given to LLMs during fine-tuning to elicit desired behaviors.
The SynTra method employs a synthetic summarization task to minimize hallucination before transferring the model to real summarization datasets. The synthetic task provides input passages and asks models to summarize them through retrieval only, without abstraction. This trains models to rely completely on sourced content rather than hallucinating new information during summarization. SynTra is shown to reduce hallucination issues when fine-tuned models are deployed on target tasks.
UPRISE trains a universal prompt retriever that provides the optimal soft prompt for few-shot learning on unseen downstream tasks. By retrieving effective prompts tuned on a diverse set of tasks, the model learns to generalize and adapt to new tasks where it lacks training examples. This enhances performance without requiring task-specific tuning.
Novel Model Architectures
FLEEK is a system focused on assisting human fact-checkers and validators. It automatically identifies potentially verifiable factual claims made in a given text. FLEEK transforms these check-worthy statements into queries, retrieves related evidence from knowledge bases, and provides this contextual information to human validators to effectively verify document accuracy and revision needs.
The CAD decoding approach reduces hallucination in language generation through context-aware decoding. Specifically, CAD amplifies the differences between an LLM's output distribution when conditioned on a context versus generated unconditionally. This discourages contradicting contextual evidence, steering the model towards grounded generations.
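The logit arithmetic behind this contrast is compact enough to sketch directly. The formulation below – amplifying context-conditioned logits against unconditional ones as (1 + alpha) * logits_with_context - alpha * logits_without_context – matches how CAD is usually described, but the two-token example and its numbers are made up to show the effect:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cad_logits(ctx_logits, uncond_logits, alpha=1.0):
    """Context-aware decoding: amplify the shift that conditioning on
    the context induces in the next-token distribution."""
    return [(1 + alpha) * c - alpha * u
            for c, u in zip(ctx_logits, uncond_logits)]

# Token 0 is supported by the context (its logit rises when the context
# is shown); token 1 reflects parametric memory only.
with_ctx = [2.0, 1.5]
without_ctx = [0.5, 1.5]

plain = softmax(with_ctx)
contrastive = softmax(cad_logits(with_ctx, without_ctx, alpha=1.0))
```

Sampling then proceeds from the contrastive distribution, which puts noticeably more mass on the context-supported token than plain conditional decoding would.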
DoLA mitigates factual hallucinations by contrasting logits from different layers of transformer networks. Since factual knowledge tends to be localized in certain middle layers, amplifying signals from those factual layers through DoLA's logit contrasting reduces incorrect factual generations.
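The layer contrast can be sketched as a difference of log-probabilities between the final layer and a premature layer. The helper below and its toy logits are illustrative only; the real method also selects which premature layer to contrast against dynamically:

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def dola_scores(final_logits, premature_logits):
    """Contrast the mature (final-layer) distribution against a
    premature layer: tokens whose probability grows through the later,
    knowledge-rich layers are promoted."""
    mature = log_softmax(final_logits)
    early = log_softmax(premature_logits)
    return [f - e for f, e in zip(mature, early)]

final_layer = [3.0, 2.5]  # factual token 0 pulls ahead by the last layer
early_layer = [1.0, 2.5]  # a premature layer still prefers token 1
scores = dola_scores(final_layer, early_layer)
```

The factual token's score is boosted precisely because its probability increased between the premature and final layers, while the token favored only by early layers is suppressed.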
The THAM framework introduces a regularization term during training to minimize the mutual information between inputs and hallucinated outputs. This helps increase the model's reliance on given input context rather than untethered imagination, reducing blind hallucinations.
Knowledge Grounding
Grounding LLM generations in structured knowledge prevents unbridled speculation and fabrication.
The RHO model identifies entities in a conversational context and links them to a knowledge graph (KG). Related facts and relations about those entities are retrieved from the KG and fused into the context representation provided to the LLM. This knowledge-enriched context steering reduces hallucinations in dialogue by keeping responses tied to grounded facts about mentioned entities/events.
HAR creates counterfactual training datasets containing model-generated hallucinations to better teach grounding. Given a factual passage, models are prompted to introduce hallucinations or distortions generating an altered counterfactual version. Fine-tuning on this data forces models to better ground content in the original factual sources, reducing improvisation.
Other notable techniques include:
- Coach – An interactive framework that answers user queries while also soliciting corrections to improve.
- R-Tuning – Refusal-aware tuning that teaches models to decline questions falling into knowledge gaps identified from the training data.
- TWEAK – A decoding method that ranks candidate generations by how well their hypotheses align with the input facts.
Challenges and Limitations
Despite promising progress, some key challenges remain in mitigating hallucinations:
- Techniques often trade off quality, coherence and creativity for veracity.
- Difficulty of rigorous evaluation beyond limited domains; existing metrics do not capture all nuances.
- Many methods are computationally expensive, requiring extensive retrieval or self-reasoning.
- Heavy dependence on training-data quality and external knowledge sources.
- Hard to guarantee generalizability across domains and modalities.
- Fundamental roots of hallucination like over-extrapolation remain unsolved.
Addressing these challenges likely requires a multilayered approach combining training data enhancements, model architecture improvements, fidelity-enhancing losses, and inference-time techniques.
The Road Ahead
Hallucination mitigation for LLMs remains an open research problem with active progress. Some promising future directions include:
- Hybrid techniques: Combine complementary approaches like retrieval, knowledge grounding and feedback.
- Causality modeling: Enhance comprehension and reasoning.
- Online knowledge integration: Keep world knowledge updated.
- Formal verification: Provide mathematical guarantees on model behaviors.
- Interpretability: Build transparency into mitigation techniques.
As LLMs continue proliferating across high-stakes domains, developing robust solutions to curtail hallucinations will be key to ensuring their safe, ethical, and reliable deployment. The techniques surveyed in this article map the approaches proposed so far, though many open research challenges remain. Overall, there is a positive trend toward enhancing model factuality, but continued progress requires addressing the limitations above and exploring new directions like causality modeling, formal verification, and hybrid methods. With diligent effort from researchers across disciplines, the goal of powerful yet trustworthy LLMs can become reality.
The post Tackling Hallucination in Large Language Models: A Survey of Cutting-Edge Techniques appeared first on Unite.AI.