tml1026 / RoleCraft
☆22 · Updated last year
Alternatives and similar repositories for RoleCraft
Users who are interested in RoleCraft are comparing it to the repositories listed below.
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆97 · Updated 9 months ago
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆63 · Updated last year
- ☆162 · Updated 9 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆70 · Updated 6 months ago
- Implementation of "ACL'24: When Do LLMs Need Retrieval Augmentation? Mitigating LLMs’ Overconfidence Helps Retrieval Augmentation" ☆25 · Updated last year
- ☆127 · Updated 6 months ago
- ☆54 · Updated last year
- MathEval is a benchmark dedicated to the holistic evaluation on mathematical capacities of LLMs. ☆84 · Updated last year
- repository for CharacterChat, a personalized social support system ☆75 · Updated last year
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆63 · Updated 11 months ago
- On Memorization of Large Language Models in Logical Reasoning ☆72 · Updated 7 months ago
- ☆106 · Updated last year
- ☆86 · Updated last year
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis' ☆31 · Updated 6 months ago
- ☆12 · Updated last year
- ☆21 · Updated last year
- ☆58 · Updated last year
- Awesome papers for role-playing with language models ☆210 · Updated last year
- ☆147 · Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆138 · Updated 6 months ago
- Official github repo for AutoDetect, an automated weakness detection framework for LLMs. ☆44 · Updated last year
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆51 · Updated last year
- A Bilingual Role Evaluation Benchmark for Large Language Models ☆42 · Updated last year
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆28 · Updated last year
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆66 · Updated 3 months ago
- The GitHub repository for the paper "Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning" accepte… ☆18 · Updated last year
- ☆51 · Updated last year
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use ☆100 · Updated last year
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆89 · Updated last year
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆82 · Updated 2 years ago