GoodAI / goodai-ltm-benchmark
A library for benchmarking the Long-Term Memory and Continual Learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blog post:
☆76 · Updated 7 months ago
Alternatives and similar repositories for goodai-ltm-benchmark
Users who are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆87 · Updated 10 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 6 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆125 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆105 · Updated 7 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals" ☆68 · Updated last year
- Accompanying material for the sleep-time compute paper ☆99 · Updated 3 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆83 · Updated 4 months ago
- Track the progress of LLM context utilisation ☆55 · Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 10 months ago
- Train your own SOTA deductive reasoning model ☆103 · Updated 5 months ago
- An LLM reads a paper and produces a working prototype ☆58 · Updated 3 months ago
- ☆54 · Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆175 · Updated 5 months ago
- ☆83 · Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆53 · Updated 2 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆59 · Updated 7 months ago
- Simple Graph Memory for AI applications ☆89 · Updated 2 months ago
- Implementation of Google's SELF-DISCOVER ☆298 · Updated 11 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆99 · Updated 9 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆45 · Updated 3 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 6 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 · Updated 3 months ago
- ☆88 · Updated 7 months ago
- The first dense retrieval model that can be prompted like an LM ☆81 · Updated 2 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆149 · Updated 6 months ago
- Automating enterprise workflows with multimodal agents ☆108 · Updated 9 months ago
- A framework for optimizing DSPy programs with RL ☆96 · Updated this week
- ☆123 · Updated last year
- A strongly typed Python DSL for developing message-passing multi-agent systems ☆53 · Updated last year