thu-coai / BARREL
☆15 · Updated last month
Alternatives and similar repositories for BARREL
Users interested in BARREL are comparing it to the libraries listed below.
- Pitfalls of Rule- and Model-based Verifiers: A Case Study on Mathematical Reasoning. ☆21 · Updated last month
- [ICML'25] Official code for the paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆21 · Updated 3 weeks ago
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ☆27 · Updated last month
- ☆53 · Updated this week
- The code and data for the paper JiuZhang3.0 ☆47 · Updated last year
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 · Updated 7 months ago
- ☆15 · Updated 3 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆30 · Updated last year
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Updated 7 months ago
- ☆14 · Updated last year
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆63 · Updated last month
- [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆15 · Updated 7 months ago
- ☆22 · Updated last year
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆127 · Updated this week
- Extending context length of visual language models ☆11 · Updated 7 months ago
- Official repository of the paper "Context-DPO: Aligning Language Models for Context-Faithfulness" ☆15 · Updated 5 months ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆24 · Updated 7 months ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆31 · Updated last week
- The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free ☆45 · Updated 2 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆63 · Updated 7 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆51 · Updated last month
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing" ☆22 · Updated 4 months ago
- ☆14 · Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 · Updated 3 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 · Updated last year
- ☆52 · Updated 5 months ago
- Official Code Repository for [AutoScale–Automatic Prediction of Compute-optimal Data Compositions for Training LLMs] ☆12 · Updated 5 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆33 · Updated 9 months ago
- ☆18 · Updated last year
- ☆33 · Updated 10 months ago