X-PLUG / WritingBench
WritingBench: A Comprehensive Benchmark for Generative Writing
☆69 · Updated last week
Alternatives and similar repositories for WritingBench:
Users who are interested in WritingBench are comparing it to the repositories listed below.
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability ☆147 · Updated 2 weeks ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆124 · Updated 10 months ago
- ☆143 · Updated 9 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆83 · Updated 9 months ago
- The demo, code and data of FollowRAG ☆71 · Updated this week
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆78 · Updated 5 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆305 · Updated this week
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆41 · Updated last year
- EMNLP'2024: Knowledge Verification to Nip Hallucination in the Bud ☆22 · Updated last year
- ☆46 · Updated 10 months ago
- ☆81 · Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆240 · Updated 5 months ago
- ☆97 · Updated last year
- ☆97 · Updated last year
- ☆140 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆147 · Updated 7 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆236 · Updated last week
- ☆63 · Updated 3 months ago
- A curated list of awesome works in the Routing LLMs paradigm (contributions to this repository are welcome) ☆30 · Updated last month
- CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation ☆47 · Updated 2 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆252 · Updated last year
- ☆168 · Updated last year
- Chinese Large Language Model Evaluation, Phase 2 ☆70 · Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆65 · Updated 4 months ago
- [ACL 2024 Findings] Learning Fine-Grained Grounded Citations for Attributed Large Language Models ☆18 · Updated 6 months ago
- ☆146 · Updated last month
- Awesome papers for role-playing with language models ☆186 · Updated 5 months ago
- ☆130 · Updated 3 months ago
- ☆234 · Updated 5 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆282 · Updated last week