GoodAI / goodai-ltm-benchmark
A library for benchmarking the Long-Term Memory and Continual Learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blog post:
☆79 · Updated 10 months ago
Alternatives and similar repositories for goodai-ltm-benchmark
Users interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆90 · Updated 3 weeks ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 9 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 · Updated 10 months ago
- ☆58 · Updated 4 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆57 · Updated 5 months ago
- Just a bunch of benchmark logs for different LLMs ☆118 · Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆127 · Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 · Updated last year
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 9 months ago
- EcoAssistant: using LLM assistant more affordably and accurately ☆133 · Updated last year
- ☆86 · Updated last year
- ☆102 · Updated 9 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆88 · Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆63 · Updated 10 months ago
- An LLM reads a paper and produces a working prototype ☆57 · Updated 6 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆58 · Updated last week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆188 · Updated 7 months ago
- Train your own SOTA deductive reasoning model ☆108 · Updated 7 months ago
- ☆85 · Updated 2 years ago
- The first dense retrieval model that can be prompted like an LM ☆89 · Updated 5 months ago
- A strongly typed Python DSL for developing message passing multi agent systems ☆53 · Updated last year
- ☆68 · Updated 5 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆102 · Updated 2 months ago
- ☆41 · Updated last year
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆105 · Updated last month
- ☆55 · Updated 11 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆110 · Updated 6 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆70 · Updated last year
- Automating enterprise workflows with multimodal agents ☆112 · Updated last year