tml1026 / RoleCraft
☆21, updated last year
Alternatives and similar repositories for RoleCraft
Users who are interested in RoleCraft are comparing it to the repositories listed below.
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆93, updated 7 months ago
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆139, updated 4 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆69, updated 4 months ago
- ☆147, updated last year
- ☆104, updated 4 months ago
- ☆83, updated last year
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆63, updated last year
- ☆50, updated last year
- ☆159, updated 8 months ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆82, updated 2 years ago
- MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capacities of LLMs. ☆83, updated 10 months ago
- Generative Judge for Evaluating Alignment ☆245, updated last year
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar… ☆52, updated 10 months ago
- ☆326, updated last year
- Repository for CharacterChat, a personalized social support system ☆75, updated last year
- Implementation of "ACL'24: When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation" ☆24, updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47, updated last year
- ☆36, updated last year
- Source code of paper: Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning ☆33, updated 3 months ago
- ☆98, updated last year
- Personality Alignment of Language Models ☆45, updated 2 months ago
- ☆145, updated last year
- ☆54, updated last year
- A Bilingual Role Evaluation Benchmark for Large Language Models ☆42, updated last year
- ☆231, updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆134, updated last year
- On Memorization of Large Language Models in Logical Reasoning ☆72, updated 5 months ago
- This is the implementation of LeCo ☆31, updated 8 months ago
- ☆111, updated this week
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use ☆95, updated last year