GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆76 Updated 8 months ago
Alternatives and similar repositories for goodai-ltm-benchmark
Users interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 Updated 7 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆105 Updated 8 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆88 Updated 10 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆126 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆120 Updated last year
- ☆89 Updated 7 months ago
- Track the progress of LLM context utilisation ☆55 Updated 4 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆73 Updated 8 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆84 Updated 5 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 7 months ago
- LLM reads a paper and produces a working prototype ☆57 Updated 4 months ago
- ☆46 Updated last year
- ☆67 Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆88 Updated this week
- ☆123 Updated last year
- Train your own SOTA deductive reasoning model ☆104 Updated 5 months ago
- Simple GRPO scripts and configurations. ☆59 Updated 6 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆177 Updated 5 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆54 Updated 3 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆102 Updated last year
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆46 Updated 3 months ago
- ☆74 Updated last year
- ☆55 Updated 2 months ago
- Simple Graph Memory for AI applications ☆89 Updated 3 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆32 Updated last year
- Simple examples using Argilla tools to build AI ☆53 Updated 9 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆68 Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆146 Updated 6 months ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆46 Updated last year
- The first dense retrieval model that can be prompted like an LM ☆83 Updated 3 months ago