☆28Feb 13, 2026Updated 2 weeks ago
Alternatives and similar repositories for spinbench
Users that are interested in spinbench are comparing it to the libraries listed below
- ☆33Oct 31, 2024Updated last year
- Recycling diverse models☆46Jan 18, 2023Updated 3 years ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"☆67Sep 13, 2025Updated 5 months ago
- Code and data for the paper: Competing Large Language Models in Multi-Agent Gaming Environments☆95Jan 26, 2026Updated last month
- [NeurIPS'23] Binary Classification with Confidence Difference☆10May 13, 2024Updated last year
- MemRec☆37Jan 16, 2026Updated last month
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"☆161Oct 28, 2025Updated 4 months ago
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.☆10May 16, 2024Updated last year
- A supervised fine-tuning method for controllable reasoning length in large language models☆10May 8, 2025Updated 9 months ago
- ☆10Jul 4, 2024Updated last year
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback☆17Oct 15, 2025Updated 4 months ago
- ☆13May 9, 2024Updated last year
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking☆13Sep 8, 2023Updated 2 years ago
- Diffusing States and Matching Scores: A New Framework for Imitation Learning☆22Nov 16, 2024Updated last year
- ☆11Oct 11, 2023Updated 2 years ago
- [JMLR] Gradual Domain Adaptation: Theory and Algorithms☆11Jan 14, 2025Updated last year
- This book is a practical guide to basic theory, models, methods and analyses that can be used to study human physiology, behaviour and co…☆14Mar 14, 2019Updated 6 years ago
- Repository for the NeurIPS 2023 paper "Beyond Confidence: Reliable Models Should Also Consider Atypicality"☆13Apr 21, 2024Updated last year
- Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures☆30Jan 29, 2026Updated last month
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering☆10Oct 10, 2021Updated 4 years ago
- ☆11Mar 20, 2023Updated 2 years ago
- Y Social REST API Server☆12Feb 10, 2026Updated 2 weeks ago
- A minimum demo for PyTorch distributed extension functionality for collectives.☆15Jul 29, 2024Updated last year
- POSIX: A Prompt Sensitivity Index for Language Models☆13Nov 13, 2024Updated last year
- Kernel Library Wheel for SGLang☆17Updated this week
- Code and dataset for the ICLR 2024 paper "Thought Propagation: An analogical Approach to Complex Reasoning with Large Language Models."☆17Mar 4, 2024Updated last year
- Text Adventure Learning Environment Suite - Benchmark to evaluate language models on interactive text environments.☆25Feb 18, 2026Updated last week
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality☆19Oct 22, 2025Updated 4 months ago
- ☆12Oct 23, 2022Updated 3 years ago
- [NeurIPS 2022] "Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks"☆13Nov 11, 2022Updated 3 years ago
- A reading list on popular MLSys topics☆22Mar 20, 2025Updated 11 months ago
- Polymath is a human readable, interoperable, long-term focused markup language for information storage and display☆11Jun 9, 2021Updated 4 years ago
- Low Precision Arithmetic Simulation in PyTorch - extension for posit and beyond☆16Dec 9, 2025Updated 2 months ago
- [ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks☆14Feb 6, 2024Updated 2 years ago
- (ACL 2025) 🔥🔥🔥Code for "Empowering Multimodal Large Language Models with Evol-Instruct"☆20May 15, 2025Updated 9 months ago
- High Performance Sorting Based Distributed memory K-mer counter☆15Dec 8, 2025Updated 2 months ago
- [NeurIPS'22] Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. Haotao Wang, Junyuan Hong,…☆15Nov 27, 2023Updated 2 years ago
- Official code repo for NeurIPS 2025 Spotlight paper, "Debate or Vote: Which Yields Better Decisions in Multi-Agent LLMs?"☆50Oct 15, 2025Updated 4 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration☆67Feb 12, 2025Updated last year