facebookresearch / AbstentionBench
A holistic benchmark for LLM abstention
☆68 · Updated 4 months ago
Alternatives and similar repositories for AbstentionBench
Users who are interested in AbstentionBench are comparing it to the repositories listed below.
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆35 · Updated 10 months ago
- ☆50 · Updated 11 months ago
- ☆19 · Updated 5 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 · Updated 3 months ago
- Exploration of automated dataset selection approaches at large scales. ☆53 · Updated 10 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆46 · Updated 9 months ago
- ☆45 · Updated 6 months ago
- Reinforcing General Reasoning without Verifiers ☆93 · Updated 6 months ago
- ☆33 · Updated last year
- ☆72 · Updated 6 months ago
- ☆17 · Updated 5 months ago
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆106 · Updated 7 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Updated 5 months ago
- ☆70 · Updated 7 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 · Updated 3 months ago
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆28 · Updated 2 months ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024] ☆38 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆41 · Updated 2 weeks ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆56 · Updated 2 months ago
- ☆22 · Updated 5 months ago
- Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning ☆42 · Updated 2 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆34 · Updated last year
- JudgeLRM: Large Reasoning Models as a Judge ☆40 · Updated last month
- ☆110 · Updated 8 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 · Updated 5 months ago
- ☆19 · Updated 10 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Updated 3 weeks ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆57 · Updated 11 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆43 · Updated last year
- ☆126 · Updated this week