bespokelabsai / awesome-rl
☆11 · Updated last month
Alternatives and similar repositories for awesome-rl
Users who are interested in awesome-rl are comparing it to the libraries listed below.
- Improving Alignment and Robustness with Circuit Breakers ☆208 · Updated 8 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆31 · Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆208 · Updated last month
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆80 · Updated 3 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆367 · Updated last week
- ☆97 · Updated 11 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆232 · Updated 3 weeks ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆81 · Updated last month
- Reproducible, flexible LLM evaluations ☆204 · Updated 3 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆215 · Updated 3 weeks ago
- LongRoPE is a method that extends the context window of pre-trained LLMs to 2048k tokens. ☆228 · Updated 9 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆225 · Updated 8 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆77 · Updated 6 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆98 · Updated 3 months ago
- ☆81 · Updated 7 months ago
- ☆174 · Updated last month
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆89 · Updated 3 weeks ago
- Chain of Thought (CoT) is so hot! So long! We need a short reasoning process! ☆54 · Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 · Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 4 months ago
- 🚀 SWE-bench Goes Live! ☆24 · Updated last week
- ☆64 · Updated last month
- [ACL 2024] SALAD benchmark & MD-Judge ☆147 · Updated 2 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆142 · Updated last year
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆155 · Updated this week
- A simple unified framework for evaluating LLMs ☆215 · Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 · Updated 2 months ago
- Official repository for "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" ☆257 · Updated this week
- ☆70 · Updated 4 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆185 · Updated 6 months ago