ServiceNow / insight-bench
☆56Updated last month
Alternatives and similar repositories for insight-bench
Users that are interested in insight-bench are comparing it to the libraries listed below
- A benchmark list for evaluation of large language models.☆151Updated 2 months ago
- augmented LLM with self reflection☆135Updated 2 years ago
- ☆317Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆112Updated 4 months ago
- ☆241Updated last year
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆86Updated 3 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆109Updated last year
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]☆368Updated last year
- Code for paper Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding☆86Updated last year
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆156Updated 11 months ago
- ☆157Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆159Updated 2 weeks ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models☆113Updated 4 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)☆188Updated 3 months ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆110Updated last month
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…☆68Updated 9 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization☆181Updated last year
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (ICML 2024)☆160Updated 6 months ago
- Official implementation of the paper Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers☆193Updated 2 months ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆134Updated last year
- This repository contains an LLM benchmark for the social deduction game "Resistance Avalon"☆130Updated 6 months ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?☆134Updated last year
- 🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…☆324Updated 2 weeks ago
- ☆37Updated 10 months ago
- ☆300Updated 4 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"☆67Updated last year
- ☆117Updated 10 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents☆216Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"☆122Updated last year
- (ACL 2025 Main) Code for MultiAgentBench : Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019…☆190Updated last month