IAAR-Shanghai / UHGEval

[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.

☆178

Alternatives and similar repositories for UHGEval

Users who are interested in UHGEval are comparing it to the repositories listed below.

IAAR-Shanghai / CTGSurvey
Controllable Text Generation for Large Language Models: A Survey
☆195 Updated last year
krystalan / Multi-hopRC
Notes on multi-hop reading comprehension and open-domain question answering
☆88 Updated 3 years ago
HSLiu-Initial / CtrlA
This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.
☆64 Updated last year
IAAR-Shanghai / ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…
☆171 Updated 11 months ago
OPPO-PersonalAI / TaskCraft
A library for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories.
☆169 Updated 5 months ago
IAAR-Shanghai / Grimoire
Grimoire is All You Need for Enhancing Large Language Models
☆116 Updated last year
syr-cn / AutoRefine
[NeurIPS 2025 Poster] Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning
☆111 Updated 2 weeks ago
Ljyustc / SocraticLM
☆157 Updated 8 months ago
Justherozen / FreeAL
[EMNLP 2023] FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models
☆93 Updated last year
jordddan / Pruning-LLMs
The framework to prune LLMs to any size and any config.
☆94 Updated last year
YangLinyi / GLUE-X
We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that …
☆93 Updated 2 years ago
cmriat / l0
A scalable, end-to-end training pipeline for general-purpose agents
☆361 Updated 5 months ago
yiyihum / da-code
[EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
☆81 Updated 4 months ago
uw-nsl / TinyV
Your efficient and accurate answer verification system for RL training.
☆43 Updated 5 months ago
longyuewangdcu / Document-MT-LLM
☆102 Updated 2 years ago
huxiaosheng123 / open-llama2
A Chinese llama2, from pretraining to reinforcement learning
☆87 Updated 2 years ago
yafuly / MAGE
Machine-generated text detection in the wild (ACL 2024)
☆218 Updated 9 months ago
Flitternie / GraphQ_IR
A Unified Intermediate Representation for Graph Query Languages
☆66 Updated 2 years ago
Ablustrund / MPLSandbox
MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…
☆178 Updated 7 months ago
CSHaitao / Awesome-LLMs-as-Judges
The official repo for paper, LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods.
☆498 Updated 4 months ago
smartyfh / LLM-Uncertainty-Bench
Benchmarking LLMs via Uncertainty Quantification
☆252 Updated last year
ShuaiLyu0110 / SQL-o1
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
☆196 Updated 6 months ago
WeixiangYAN / CodeScope
[ACL 2024] CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and …
☆100 Updated last year
HUST-AI-HYZ / MemoryAgentBench
Open source code for Paper: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
☆171 Updated this week
RLHFlow / Online-DPO-R1
Codebase for Iterative DPO Using Rule-based Rewards
☆263 Updated 7 months ago
wyu97 / GenRead
Code and Checkpoints for "Generate rather than Retrieve: Large Language Models are Strong Context Generators" in ICLR 2023.
☆291 Updated 2 years ago
OpenDCAI / RARE
Official implementation of RARE: Retrieval-Augmented Reasoning Modeling
☆185 Updated 6 months ago
xiongsiheng / TG-LLM
[ACL 24 main] Large Language Models Can Learn Temporal Reasoning
☆64 Updated 11 months ago
KodCode-AI / kodcode
✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions, all in one framework
☆297 Updated 2 months ago
Davion-Liu / Awesome-Robustness-in-Information-Retrieval
A curated list of awesome papers related to adversarial attacks and defenses for information retrieval. If I missed any papers, feel free…
☆221 Updated last year