EQ-bench / creative-writing-bench
☆31 Updated 2 months ago
Alternatives and similar repositories for creative-writing-bench
Users who are interested in creative-writing-bench are comparing it to the repositories listed below
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆36 Updated 2 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated 2 months ago
- Lego for GRPO ☆28 Updated last month
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆31 Updated 3 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 7 months ago
- entropix style sampling + GUI ☆26 Updated 7 months ago
- Lightweight tools for quick and easy LLM demos ☆28 Updated 9 months ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆63 Updated 2 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆30 Updated 2 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆20 Updated 6 months ago
- ☆35 Updated last year
- A 7B parameter model for mathematical reasoning ☆36 Updated 4 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆140 Updated 4 months ago
- Train your own SOTA deductive reasoning model ☆94 Updated 3 months ago
- II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset ☆20 Updated 2 months ago
- ☆61 Updated 3 weeks ago
- ☆115 Updated 4 months ago
- SWE Arena ☆34 Updated 2 months ago
- ☆45 Updated last year
- A framework for optimizing DSPy programs with RL ☆76 Updated this week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆68 Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆148 Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆41 Updated last month
- ☆17 Updated 5 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 5 months ago
- OpenPipe Reinforcement Learning Experiments ☆25 Updated 3 months ago
- ☆26 Updated 5 months ago
- ☆63 Updated last month
- Automated Capability Discovery via Foundation Model Self-Exploration ☆52 Updated 4 months ago