EQ-bench / creative-writing-bench
☆61 Updated last month
Alternatives and similar repositories for creative-writing-bench
Users who are interested in creative-writing-bench are comparing it to the repositories listed below
 - Official repo for Learning to Reason for Long-Form Story Generation ☆72 Updated 6 months ago
 - [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆439 Updated last week
 - SWE Arena ☆35 Updated 3 months ago
 - Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆188 Updated 7 months ago
 - ☆59 Updated 9 months ago
 - [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆45 Updated 3 months ago
 - ☆80 Updated 2 weeks ago
 - A benchmark that challenges language models to code solutions for scientific problems ☆151 Updated last week
 - ☆45 Updated last year
 - A benchmark for emotional intelligence in large language models ☆370 Updated last year
 - ☆26 Updated 9 months ago
 - ☆122 Updated 8 months ago
 - Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆159 Updated 5 months ago
 - OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 9 months ago
 - Prompt-to-Leaderboard ☆260 Updated 5 months ago
 - ☆40 Updated 7 months ago
 - Accompanying material for sleep-time compute paper ☆117 Updated 6 months ago
 - A toolkit for describing model features and intervening on those features to steer behavior. ☆209 Updated 11 months ago
 - A simple unified framework for evaluating LLMs ☆254 Updated 6 months ago
 - Scrape and export data from the Open LLM Leaderboard. ☆47 Updated 10 months ago
 - An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆145 Updated 8 months ago
 - Train your own SOTA deductive reasoning model ☆109 Updated 7 months ago
 - [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆106 Updated 5 months ago
 - Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆93 Updated 5 months ago
 - Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆103 Updated this week
 - EvaByte: Efficient Byte-level Language Models at Scale ☆110 Updated 6 months ago
 - Official code repository for Sketch-of-Thought (SoT) ☆129 Updated 5 months ago
 - ☆119 Updated last year
 - ☆135 Updated 6 months ago
 - ☆85 Updated last week