facebookresearch / AbstentionBench
A holistic benchmark for LLM abstention
☆68 · Updated 4 months ago
Alternatives and similar repositories for AbstentionBench
Users interested in AbstentionBench are comparing it to the repositories listed below.
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 · Updated 3 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆46 · Updated 5 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆34 · Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆53 · Updated 10 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆35 · Updated 10 months ago
- ☆19 · Updated 5 months ago
- ☆17 · Updated 5 months ago
- ☆22 · Updated 5 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Updated 5 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆46 · Updated 9 months ago
- ☆72 · Updated 6 months ago
- ☆45 · Updated 6 months ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking ☆33 · Updated last month
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆55 · Updated last week
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆41 · Updated 2 weeks ago
- Long Context Extension and Generalization in LLMs ☆62 · Updated last year
- ☆50 · Updated 11 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 · Updated 3 months ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆56 · Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Updated last year
- Reinforcing General Reasoning without Verifiers ☆93 · Updated 6 months ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL 2024] ☆38 · Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆41 · Updated 2 months ago
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆106 · Updated 7 months ago
- ☆23 · Updated last year
- ☆70 · Updated 7 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆40 · Updated last month
- [COLING 2025] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? ☆82 · Updated 11 months ago
- Official implementation for "Law of the Weakest Link: Cross Capabilities of Large Language Models" ☆43 · Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆119 · Updated 8 months ago