thu-coai / CharacterBench
[AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models
☆16Updated last month
Alternatives and similar repositories for CharacterBench
Users who are interested in CharacterBench are comparing it to the repositories listed below.
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆92Updated 6 months ago
- ☆83Updated last year
- Do Large Language Models Know What They Don’t Know?☆99Updated 9 months ago
- ☆55Updated last week
- self-adaptive in-context learning☆45Updated 2 years ago
- [ICML'2024] Can AI Assistants Know What They Don't Know?☆83Updated last year
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs☆40Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆88Updated 9 months ago
- ☆21Updated last year
- ☆43Updated 2 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"☆82Updated 2 years ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)☆50Updated last year
- Towards Systematic Measurement for Long Text Quality☆37Updated last year
- ☆28Updated last year
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023)☆28Updated last year
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models.☆22Updated 10 months ago
- A framework for editing the CoTs for better factuality☆51Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆134Updated 11 months ago
- Logiqa2.0 dataset - logical reasoning in MRC and NLI tasks☆99Updated 2 years ago
- [EMNLP 2023] This is the repository of the Harry Potter Dialogue Dataset.☆124Updated 10 months ago
- A Bilingual Role Evaluation Benchmark for Large Language Models☆42Updated last year
- ☆97Updated 3 months ago
- [EMNLP 2023] C-STS: Conditional Semantic Textual Similarity☆73Updated last year
- ☆30Updated 8 months ago
- Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"☆42Updated 10 months ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆42Updated last year
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios☆63Updated last month
- Code for the ACL 2024 main conference paper "BatchEval: Towards Human-like Text Evaluation"☆18Updated last year
- [EMNLP 2025] Verification Engineering for RL in Instruction Following☆34Updated this week
- Collection of papers for scalable automated alignment.☆93Updated 10 months ago