tml1026 / RoleCraft
☆21Updated last year
Alternatives and similar repositories for RoleCraft
Users interested in RoleCraft are comparing it to the libraries listed below
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆101Updated 11 months ago
- ☆142Updated 8 months ago
- ☆163Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆73Updated 8 months ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios☆68Updated 6 months ago
- ☆147Updated last year
- Implementation of "ACL'24: When Do LLMs Need Retrieval Augmentation? Mitigating LLMs’ Overconfidence Helps Retrieval Augmentation"☆24Updated last year
- ☆54Updated last year
- ☆51Updated last year
- Official github repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs.☆29Updated last year
- ☆334Updated last year
- Code implementation of synthetic continued pretraining☆148Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.☆137Updated 9 months ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'☆32Updated 8 months ago
- On Memorization of Large Language Models in Logical Reasoning☆74Updated 10 months ago
- ☆87Updated 2 years ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆145Updated last year
- ☆51Updated last year
- Generative Judge for Evaluating Alignment☆250Updated 2 years ago
- ☆104Updated last year
- CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation☆64Updated 8 months ago
- Awesome papers for role-playing with language models☆216Updated last year
- Unleashing the Power of Cognitive Dynamics on Large Language Models☆63Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆90Updated last year
- A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters…☆210Updated last year
- MathEval is a benchmark dedicated to the holistic evaluation on mathematical capacities of LLMs.☆86Updated last year
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆135Updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"☆47Updated 2 years ago
- Official github repo for AutoDetect, an automated weakness detection framework for LLMs.☆46Updated last year
- Repository for CharacterChat, a personalized social support system☆75Updated last year