SkyRiver-2000 / RuleArena
[ACL 2025] RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
☆15 · Updated 3 months ago
Alternatives and similar repositories for RuleArena
Users who are interested in RuleArena are comparing it to the repositories listed below.
- Model Selection with Large Language Models for Reasoning (EMNLP2023 Findings) ☆30 · Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023. ☆20 · Updated last year
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ☆185 · Updated 5 months ago
- The code and data for paper "Large Language Models are few(1)-shot Table Reasoners" [EACL2023] ☆47 · Updated last year
- Codes for papers on Large Language Models Personalization (LaMP) ☆170 · Updated 7 months ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆64 · Updated 2 months ago
- Enable Comprehensive LLM Evaluation on Graph Reasoning ☆73 · Updated 3 months ago
- A Survey of Hallucination in Large Foundation Models ☆54 · Updated last year
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆88 · Updated 5 months ago
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting" ☆113 · Updated 2 years ago
- Official code for paper Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation ☆20 · Updated last year
- ☆19 · Updated last month
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆130 · Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆138 · Updated last year
- The source code of paper "Semantic Enhanced Text-to-SQL Parsing via Iteratively Learning Schema Linking Graph" in KDD2022. ☆15 · Updated 2 years ago
- Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models" ☆42 · Updated 11 months ago
- ☆31 · Updated 4 months ago
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆13 · Updated 11 months ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆28 · Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆116 · Updated last year
- The awesome agents in the era of large language models ☆69 · Updated last year
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models ☆109 · Updated 2 months ago
- Code for "Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models", ICLR 2024 Oral. ☆21 · Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆77 · Updated last year
- ☆45 · Updated last year
- ☆100 · Updated last year
- A comprehensive paper list of Table-based Question Answering. ☆36 · Updated 2 years ago
- ☆35 · Updated last year
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ☆549 · Updated 11 months ago
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and… ☆51 · Updated 4 months ago