EQ-bench/creative-writing-bench

☆121

Alternatives and similar repositories for creative-writing-bench

Users interested in creative-writing-bench are comparing it to the repositories listed below.

EQ-bench / EQ-Bench
A benchmark for emotional intelligence in large language models
☆444 · Jul 26, 2024 · Updated 2 years ago
EQ-bench / eqbench3
☆67 · May 31, 2026 · Updated last month
lechmazur / writing
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, moti…
☆409 · Updated this week
allenai / IFBench
☆160 · May 13, 2026 · Updated 2 months ago
phonism / CP-Zero
Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.
☆18 · Apr 22, 2025 · Updated last year
multimodal-art-projection / I-SHEEP
I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment
☆17 · Jan 16, 2025 · Updated last year
0xWJ / code-judge
☆24 · Oct 10, 2025 · Updated 9 months ago
multimodal-art-projection / REER_DeepWriter
REverse-Engineered Reasoning for Open-Ended Generation
☆98 · Sep 10, 2025 · Updated 10 months ago
JackShDr / InfluentialRS
Implementations of Influential Recommender System
☆12 · Oct 29, 2024 · Updated last year
WolframRavenwolf / MMLU-Pro
MMLU-Pro eval results
☆15 · Aug 21, 2025 · Updated 11 months ago
Peiying-Yu / Table-Critic
A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning.
☆21 · Aug 23, 2025 · Updated 11 months ago
lechmazur / generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm…
☆72 · Apr 16, 2026 · Updated 3 months ago
novelty-bench / novelty-bench
☆34 · Nov 27, 2025 · Updated 7 months ago
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆29 · Dec 13, 2024 · Updated last year
sparkle-reasoning / sparkle
[NeurIPS'25] Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
☆16 · Dec 12, 2025 · Updated 7 months ago
ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLM.
☆15 · Nov 12, 2025 · Updated 8 months ago
osoleve / glitchlings
Enemies for your LLM
☆38 · Jan 20, 2026 · Updated 6 months ago
shirley-wu / daco
[NeurIPS 2024 D&B Track] DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
☆14 · Mar 5, 2025 · Updated last year
HLR / SpartQA-baselines
All the baselines and experiments settings on the SpartQA
☆12 · Apr 26, 2023 · Updated 3 years ago
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆152 · Jul 3, 2025 · Updated last year
MaximeRivest / brepl
Universal REPL Bridge for LLMs - Tab completion, interactive prompts, TUI support
☆20 · Nov 24, 2025 · Updated 8 months ago
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 · Jul 31, 2025 · Updated 11 months ago
SuperGPQA / SuperGPQA
☆191 · Apr 30, 2025 · Updated last year
lemon07r / SanityBoard
Home of the SanityHarness Leaderboard website.
☆22 · May 13, 2026 · Updated 2 months ago
ivanfioravanti / asitop
Perf monitoring CLI tool for Apple Silicon
☆16 · Jan 1, 2024 · Updated 2 years ago
chtmp223 / suri
Suri: Multi-constraint instruction following for long-form text generation [EMNLP’24]
☆27 · Oct 3, 2025 · Updated 9 months ago
Strong-AI-Lab / ChatLogic
☆16 · Dec 17, 2023 · Updated 2 years ago
cmpnd-ai / dspy-qwen-adapter
A DSPy adapter tailored to Qwen 3+ suggested formatting patterns.
☆23 · Apr 29, 2026 · Updated 2 months ago
chenllliang / ParetoMNMT
Source code for paper "On the Pareto Front of Multilingual Neural Machine Translation" @ NeurIPS 2023
☆17 · Sep 27, 2023 · Updated 2 years ago
xyjigsaw / Linux-Knowledge-Graph
Knowledge Graph for Linux in Triples and Neo4j
☆13 · Aug 22, 2020 · Updated 5 years ago
sunnweiwei / PPP-Agent
Training Proactive and Personalized LLM Agents
☆113 · Jan 20, 2026 · Updated 6 months ago
duterscmy / CD-MoE
Official PyTorch implementation of CD-MOE
☆12 · Mar 18, 2026 · Updated 4 months ago
LeeSureman / E5-Retrieval-Reproduction
Use contrastive learning to train a large language model (LLM) as a retriever
☆12 · Jul 19, 2024 · Updated 2 years ago
THU-KEG / LongWriter-V
[ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
☆24 · Mar 29, 2025 · Updated last year
sam-paech / spiral-bench
☆53 · Dec 2, 2025 · Updated 7 months ago
kmad / dabench-rlm-eval
Benchmark harness for evaluating DSPy RLMs on data analysis tasks (InfiAgent-DABench)
☆23 · Mar 22, 2026 · Updated 4 months ago
alon-albalak / online-data-mixing
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14 · Jan 9, 2024 · Updated 2 years ago
Hannibal046 / RWKV-howto
possibly useful materials for learning RWKV language model.
☆27 · Jun 8, 2023 · Updated 3 years ago
HLR / SpartQA_generation
Generating SpartQA dataset
☆16 · May 3, 2023 · Updated 3 years ago