thu-coai / BARREL

☆15

Alternatives and similar repositories for BARREL

Users who are interested in BARREL are comparing it to the repositories listed below.

hkust-nlp / RL-Verifier-Pitfalls
Pitfalls of Rule- and Model-based Verifiers: A Case Study on Mathematical Reasoning.
☆21 Updated last month
Kamichanw / CoS
[ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"
☆21 Updated 3 weeks ago
weizhepei / WebAgent-R1
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
☆27 Updated last month
SkyworkAI / MindLink
☆53 Updated this week
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆47 Updated last year
ZHZisZZ / weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
☆62 Updated 7 months ago
sail-sg / ActivePRM
☆15 Updated 3 months ago
Yifan-Song793 / GoodBadGreedy
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
☆30 Updated last year
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16 Updated 7 months ago
googleinterns / localizing-paragraph-memorization
☆14 Updated last year
inclusionAI / PromptCoT
A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…
☆63 Updated last month
MozerWang / DEMO
[ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
☆15 Updated 7 months ago
mathllm / Step-Controlled_DPO
☆22 Updated last year
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆127 Updated this week
kiaia / GIRAFFE
Extending context length of visual language models
☆11 Updated 7 months ago
byronBBL / Context-DPO
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆15 Updated 5 months ago
DynaMath / DynaMath
A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
☆24 Updated 7 months ago
satrams / rent-rl
RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.
☆31 Updated last week
qiuzh20 / gated_attention
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
☆45 Updated 2 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆63 Updated 7 months ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51 Updated last month
yiqingxyq / RepoST
Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"
☆22 Updated 4 months ago
dqxiu / KAssess
☆14 Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 3 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38 Updated last year
GuanghaoYe / Emergence-of-Thinking
☆52 Updated 5 months ago
feiyang-k / AutoScale
Official Code Repository for [AutoScale–Automatic Prediction of Compute-optimal Data Compositions for Training LLMs]
☆12 Updated 5 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆33 Updated 9 months ago
gl-ybnbxb / BoNBoN
☆18 Updated last year
swtheing / PF-PPO-RLHF
☆33 Updated 10 months ago