X-PLUG / WritingBench
WritingBench: A Comprehensive Benchmark for Generative Writing
☆156 Updated last month
Alternatives and similar repositories for WritingBench
Users who are interested in WritingBench are comparing it to the repositories listed below.
- ☆180 Updated 9 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ☆264 Updated 6 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆254 Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆186 Updated 7 months ago
- ☆147 Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆47 Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆260 Updated last year
- ☆76 Updated last year
- ☆322 Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆284 Updated 2 years ago
- ☆51 Updated last year
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆118 Updated 7 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆140 Updated last year
- ☆140 Updated 8 months ago
- The demo, code and data of FollowRAG ☆75 Updated 7 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆222 Updated 2 months ago
- ☆163 Updated 3 months ago
- Collection of papers for scalable automated alignment. ☆93 Updated last year
- Counting-Stars (★) ☆83 Updated 2 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆101 Updated 11 months ago
- a-m-team's exploration in large language modeling ☆195 Updated 8 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆146 Updated last month
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆130 Updated 10 months ago
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework ☆196 Updated 2 weeks ago
- CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation ☆64 Updated 8 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆61 Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆415 Updated 7 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step ☆304 Updated last year
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆58 Updated 6 months ago
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent ☆164 Updated last month