GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆79 Updated 11 months ago
Alternatives and similar repositories for goodai-ltm-benchmark
Users who are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 9 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆92 Updated last month
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 Updated 11 months ago
- The first dense retrieval model that can be prompted like an LM ☆89 Updated 6 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆128 Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆70 Updated last year
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated 11 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆100 Updated last week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆63 Updated 11 months ago
- ☆104 Updated 10 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆88 Updated this week
- Just a bunch of benchmark logs for different LLMs ☆118 Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 10 months ago
- An LLM reads a paper and produces a working prototype ☆57 Updated 7 months ago
- Evaluating LLMs with CommonGen-Lite ☆91 Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆102 Updated 3 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆122 Updated 9 months ago
- ☆60 Updated 4 months ago
- ☆86 Updated last year
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation ☆48 Updated last year
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆90 Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Updated 7 months ago
- ☆122 Updated 5 months ago
- Train your own SOTA deductive reasoning model ☆108 Updated 8 months ago
- ☆81 Updated this week
- ☆85 Updated 2 years ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆82 Updated 7 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆105 Updated 7 months ago