agential-ai / agential
Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!
☆51 · Updated 2 weeks ago
Alternatives and similar repositories for agential:
Users who are interested in agential are comparing it to the repositories listed below.
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆64 · Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆74 · Updated 5 months ago
- ☆50 · Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆81 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 · Updated 5 months ago
- The Library for LLM-based multi-agent applications ☆75 · Updated this week
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆31 · Updated last year
- ☆27 · Updated this week
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆35 · Updated last month
- ☆41 · Updated 2 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆24 · Updated 2 months ago
- ☆60 · Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated 11 months ago
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆23 · Updated 2 years ago
- ☆38 · Updated 6 months ago
- ☆14 · Updated 4 months ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆118 · Updated last year
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆14 · Updated 11 months ago
- ☆63 · Updated last month
- ☆48 · Updated 3 months ago
- ☆54 · Updated 5 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆43 · Updated last year
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments ☆44 · Updated last month
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆38 · Updated 3 months ago
- ☆20 · Updated last week
- Code/data for MARG (multi-agent review generation) ☆38 · Updated 3 months ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 3 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 7 months ago
- Evaluating tool-augmented LLMs in conversation settings ☆77 · Updated 8 months ago