IAAR-Shanghai/xVerify

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

☆149

Alternatives and similar repositories for xVerify

Users interested in xVerify are comparing it to the libraries listed below.

IAAR-Shanghai / SEAP
☆23 · Jun 10, 2025 · Updated last year
IAAR-Shanghai / PGRAG
PGRAG
☆53 · Jul 16, 2024 · Updated 2 years ago
IAAR-Shanghai / SafeRAG
☆61 · Mar 11, 2025 · Updated last year
IAAR-Shanghai / NewsBench
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Jou…
☆34 · Jun 25, 2024 · Updated 2 years ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆229 · Nov 27, 2025 · Updated 8 months ago
IAAR-Shanghai / ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…
☆173 · Dec 7, 2024 · Updated last year
huggingface / Math-Verify
☆1,172 · Jan 10, 2026 · Updated 6 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120 · Dec 10, 2024 · Updated last year
IAAR-Shanghai / Grimoire
Grimoire is All You Need for Enhancing Large Language Models
☆120 · Feb 29, 2024 · Updated 2 years ago
IAAR-Shanghai / DATG
[ACL 2024] Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs
☆40 · Sep 24, 2024 · Updated last year
zhaoxlpku / PromptCoT
☆17 · Apr 10, 2025 · Updated last year
MasterVito / SwS
Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning
☆42 · Nov 11, 2025 · Updated 8 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆205 · Aug 12, 2025 · Updated 11 months ago
DualityRL / multi-attempt
☆19 · Mar 10, 2025 · Updated last year
MingLiiii / Gradient_Unified
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
☆20 · Jun 17, 2025 · Updated last year
IAAR-Shanghai / UHGEval
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
☆181 · Jun 7, 2025 · Updated last year
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆446 · Jul 11, 2025 · Updated last year
MemTensor / HaluMem
HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.
☆148 · Apr 30, 2026 · Updated 2 months ago
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,269 · Aug 27, 2025 · Updated 11 months ago
wizard-III / Archer2.0
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg…
☆31 · Oct 10, 2025 · Updated 9 months ago
hiyouga / MathRuler
A light-weight tool for evaluating LLMs in rule-based ways.
☆87 · Jun 19, 2025 · Updated last year
inclusionAI / PromptCoT
A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…
☆132 · Jan 31, 2026 · Updated 5 months ago
shuyhere / about-super-alignment
Feeling confused about super alignment? Here is a reading list.
☆43 · Jan 9, 2024 · Updated 2 years ago
BaohaoLiao / frac-cot
[COLM 2026] An efficient 3D sampling method for long-CoT LLM.
☆16 · May 25, 2025 · Updated last year
SkyworkAI / Skywork-OR1
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
☆740 · Jun 6, 2025 · Updated last year
tianyi-lab / MiP-Overthinking
[COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
☆39 · Jun 5, 2025 · Updated last year
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆49 · Feb 4, 2026 · Updated 5 months ago
IAAR-Shanghai / CRUD_RAG
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
☆400 · May 20, 2025 · Updated last year
VisuLogic-Benchmark / VisuLogic-Train
☆21 · Jul 9, 2025 · Updated last year
microsoft / SuperRL
☆15 · Sep 8, 2025 · Updated 10 months ago
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆145 · Dec 17, 2025 · Updated 7 months ago
PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆1,866 · Mar 18, 2025 · Updated last year
IAAR-Shanghai / xFinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
☆179 · Nov 14, 2025 · Updated 8 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆127 · May 6, 2025 · Updated last year
ShadeCloak / ADORA
☆47 · Apr 9, 2025 · Updated last year
WooooDyy / BMMR
Code and resources for the NeurIPS 2025 Paper "BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset" by Zhiheng X…
☆18 · Oct 14, 2025 · Updated 9 months ago
xufangzhi / Genius
[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework
☆72 · Jun 1, 2025 · Updated last year
neulab / VisualPuzzles
☆18 · Nov 30, 2025 · Updated 7 months ago
RM-R1-UIUC / RM-R1
[ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models
☆167 · Jun 26, 2025 · Updated last year