thu-coai / BARREL
☆15 · Updated last month
Alternatives and similar repositories for BARREL
Users interested in BARREL are comparing it to the libraries listed below.
- ☆15 · Updated 2 months ago
- BeHonest: Benchmarking Honesty in Large Language Models · ☆34 · Updated 10 months ago
- Reproducing R1 for Code with Reliable Rewards · ☆10 · Updated 2 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH · ☆22 · Updated 6 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models · ☆61 · Updated 6 months ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models · ☆24 · Updated 7 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… · ☆26 · Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… · ☆29 · Updated 7 months ago
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” · ☆16 · Updated last year
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism · ☆30 · Updated 11 months ago
- ☆22 · Updated 11 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue · ☆35 · Updated last month
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025 · ☆22 · Updated 4 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment · ☆16 · Updated 6 months ago
- Instruction-following benchmark for large reasoning models · ☆34 · Updated last month
- ☆38 · Updated last year
- ☆13 · Updated 6 months ago
- [ICLR 2025] Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" · ☆13 · Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… · ☆32 · Updated 9 months ago
- Revisiting Mid-training in the Era of RL Scaling · ☆62 · Updated 2 months ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails · ☆24 · Updated 4 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy · ☆63 · Updated 6 months ago
- ☆46 · Updated 8 months ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs · ☆29 · Updated 3 weeks ago
- ☆14 · Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" · ☆38 · Updated last year
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models) · ☆30 · Updated last month
- ☆18 · Updated last year
- Official Code Repository for [AutoScale–Automatic Prediction of Compute-optimal Data Compositions for Training LLMs] · ☆12 · Updated 4 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning · ☆26 · Updated last year