allenai/IFBench

allenai / IFBench

☆160

Alternatives and similar repositories for IFBench

Users who are interested in IFBench are comparing it to the repositories listed below.

THU-KEG / VerIF
[EMNLP 2025] Verification Engineering for RL in Instruction Following
☆57 · Mar 30, 2026 · Updated 3 months ago
Rainier-rq / verl-if
Official implementation of the paper "Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following"
☆40 · Jan 11, 2026 · Updated 6 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
kkk-an / UltraIF
Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.
☆21 · Apr 3, 2025 · Updated last year
Junjie-Ye / MulDimIF
[ACL 2026] A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
☆23 · Jul 10, 2026 · Updated 2 weeks ago
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆102 · Feb 20, 2025 · Updated last year
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆229 · Nov 27, 2025 · Updated 7 months ago
EQ-bench / creative-writing-bench
☆121 · Jun 24, 2026 · Updated last month
PKU-Baichuan-MLSystemLab / CFBench
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
☆56 · Aug 26, 2024 · Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆68 · Feb 21, 2025 · Updated last year
kohjingyu / multi-agent-computer-use
Code for the multi-agent computer use project.
☆21 · Jul 3, 2026 · Updated 3 weeks ago
EPFLiGHT / FullyOpenMeditron
We release Open Meditron, a fully open, clinician-audited medical training corpus and evaluation protocol that closes the open-vs-closed …
☆15 · May 15, 2026 · Updated 2 months ago
ekwinox117 / multi-challenge
☆91 · Feb 5, 2025 · Updated last year
THU-KEG / Crab
[CIKM 2025] Constraint Back-translation Improves Complex Instruction Following of Large Language Models
☆18 · May 23, 2025 · Updated last year
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 · Jul 23, 2025 · Updated last year
Tongyi-CCAI / Complex-IF
☆34 · Jan 26, 2026 · Updated 5 months ago
mandyyyyii / east
☆19 · Aug 4, 2025 · Updated 11 months ago
facebookresearch / AdvancedIF
This is the github to open source benchmark AdvancedIF, see LAMA L1387358RCRO
☆37 · Nov 26, 2025 · Updated 7 months ago
meituan-longcat / Meeseeks
An iterative, feedback-driven benchmark on LLMs' instruction-following ability
☆58 · May 25, 2026 · Updated 2 months ago
inclusionAI / Ring-V2
Ring-V2 is a reasoning MoE LLM provided and open-sourced by InclusionAI.
☆98 · Oct 23, 2025 · Updated 9 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆134 · Jun 11, 2025 · Updated last year
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
facebookresearch / mexma
MEXMA: Token-level objectives improve sentence representations
☆43 · Jan 6, 2025 · Updated last year
allenai / DataDecide
☆43 · Aug 20, 2025 · Updated 11 months ago
import-myself / Membench
Membenchmark repository
☆55 · Nov 27, 2025 · Updated 7 months ago
LARK-AI-Lab / CodeScaler
The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"
☆35 · Mar 26, 2026 · Updated 4 months ago
reka-ai / research-eval
A benchmark to evaluate search-augmented LLMs
☆17 · Aug 28, 2025 · Updated 10 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆118 · Jun 12, 2025 · Updated last year
allenai / asta-bench
☆123 · Updated this week
RenzeLou / Muffin
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
☆16 · Oct 31, 2024 · Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58 · Feb 29, 2024 · Updated 2 years ago
SalesforceAIResearch / PretrainRL-pipeline
An automated data pipeline scaling RL to pretraining levels
☆76 · Jun 2, 2026 · Updated last month
allenai / open-instruct
AllenAI's post-training codebase
☆3,809 · Updated this week
allenai / hybrid-preferences
Learning to route instances for Human vs AI Feedback (ACL Main '25)
☆29 · Jul 23, 2025 · Updated last year
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,093 · Updated this week
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆4,400 · Updated this week
viswavi / RLCF
☆24 · Oct 23, 2025 · Updated 9 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,629 · Updated this week