WritingBench: A Comprehensive Benchmark for Generative Writing
☆165Dec 19, 2025Updated 3 months ago
Alternatives and similar repositories for WritingBench
Users who are interested in WritingBench are comparing it to the repositories listed below
- Zhiyu AI: From Learner to Researcher☆13Jan 20, 2025Updated last year
- ☆97Feb 20, 2026Updated last month
- REverse-Engineered Reasoning for Open-Ended Generation☆94Sep 10, 2025Updated 6 months ago
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆37Mar 9, 2025Updated last year
- ☆240Apr 23, 2024Updated last year
- ☆37Updated this week
- ☆82May 28, 2025Updated 9 months ago
- This repository includes the code implementation of the paper Improving Pacing in Long-Form Story Planning by Yichen Wang, Kevin Yang, Xi…☆17Nov 19, 2024Updated last year
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"☆31Oct 20, 2025Updated 5 months ago
- ☆76Jan 24, 2025Updated last year
- ☆21Jun 27, 2024Updated last year
- Arena-Hard-Auto: An automatic LLM benchmark.☆1,008Jun 21, 2025Updated 9 months ago
- ☆14Apr 15, 2023Updated 2 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆102Feb 20, 2025Updated last year
- A Chinese Open-Domain Dialogue System☆326Aug 16, 2023Updated 2 years ago
- ☆56Aug 10, 2024Updated last year
- ☆68Nov 26, 2024Updated last year
- ☆526Feb 4, 2026Updated last month
- A minimalist benchmarking tool designed to test the routine-generation capabilities of LLMs.☆27Nov 28, 2024Updated last year
- Code corresponding to our paper "Leveraging Dependency Forest for Neural Medical Relation Extraction" at EMNLP 2019☆20Nov 26, 2020Updated 5 years ago
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)☆228Jul 21, 2023Updated 2 years ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆224Jul 25, 2025Updated 7 months ago
- BitBrowser Automation System - Batch Google account processing tool (FastAPI + Vue 3 + Playwright)☆59Jan 23, 2026Updated last month
- Benchmarking LLMs with Challenging Tasks from Real Users☆247Nov 3, 2024Updated last year
- Evaluation for AI apps and agents☆44Jan 18, 2024Updated 2 years ago
- Official repository of Graph RAG-Tool Fusion and ToolLinkOS dataset.☆22Feb 13, 2025Updated last year
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective☆41Oct 17, 2023Updated 2 years ago
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models☆111Feb 13, 2026Updated last month
- Chinese corpus: a large number of manually annotated samples, very valuable!☆11Aug 15, 2019Updated 6 years ago
- [ACL 2025] An official PyTorch implementation of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement☆39May 28, 2025Updated 9 months ago
- A Chainer implementation of a Convolutional Network model for relation classification in the SemEval Task 8 dataset. This model performs …☆17Jan 16, 2018Updated 8 years ago
- Clinical NLP concept extraction of ADEs in the 2018 n2c2 Adverse Drug Events and Medication Extraction (Track 2). Includes data preproce…☆16Nov 21, 2020Updated 5 years ago
- The official repository of paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"☆112Aug 15, 2025Updated 7 months ago
- Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset"☆12Dec 8, 2024Updated last year
- ☆11Dec 31, 2020Updated 5 years ago
- [NAACL'25] RuleR: Improving LLM Controllability by Rule-based Data Recycling☆14Sep 27, 2025Updated 5 months ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks☆303Jan 8, 2024Updated 2 years ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)☆423Oct 25, 2025Updated 4 months ago