llm-editing / HalluEditBench

Can Knowledge Editing Really Correct Hallucinations?

☆11

Alternatives and similar repositories for HalluEditBench:

Users who are interested in HalluEditBench are comparing it to the libraries listed below.

GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆35 Updated 7 months ago
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆54 Updated last month
yuzhaouoe / SAE-based-representation-engineering
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆39 Updated last month
THU-KEG / RM-Bench
☆15 Updated 2 months ago
GXimingLu / IPA
Codebase for Inference-Time Policy Adapters
☆23 Updated last year
ADaM-BJTU / W2SG
The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”
☆15 Updated 10 months ago
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆53 Updated 10 months ago
GAIR-NLP / self-improvement-reversal
☆13 Updated 6 months ago
technion-cs-nlp / hallucination-mitigation
☆23 Updated last month
rookie-joe / AutoPSV
☆39 Updated 2 months ago
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆47 Updated 8 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆44 Updated last month
shizhediao / R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…
☆106 Updated 6 months ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆31 Updated 5 months ago
BunsenFeng / AbstainQA
AbstainQA, ACL 2024
☆25 Updated 3 months ago
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆49 Updated 3 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆33 Updated last year
yihuaihong / ConceptVectors
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆32 Updated 3 months ago
GAIR-NLP / weak-to-strong-reasoning
☆57 Updated 4 months ago
ruiqi-zhong / nlparam
Augmenting Statistical Models with Natural Language Parameters
☆22 Updated 4 months ago
tatsu-lab / test_set_contamination
☆36 Updated last year
AlphaPav / mem-kk-logic
On Memorization of Large Language Models in Logical Reasoning
☆19 Updated 2 months ago
Zhenwen-NLP / MathChat
Official code and data repository of MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…
☆16 Updated 7 months ago
OpenMatch / RAG-DDR
This is the code repo for our paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".
☆21 Updated last month
ChengpengLi1003 / DotaMath
☆26 Updated 3 weeks ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆88 Updated 3 months ago
starrYYxuan / LeCo
This is the implementation of LeCo
☆30 Updated 6 months ago
GAIR-NLP / MoPS
[ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"
☆33 Updated 6 months ago
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆20 Updated 10 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆55 Updated 6 months ago