GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆83 · Updated last year
Alternatives and similar repositories for goodai-ltm-benchmark
Users that are interested in goodai-ltm-benchmark are comparing it to the libraries listed below
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 · Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆132 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆112 · Updated last year
- LLM reads a paper and produce a working prototype ☆60 · Updated 10 months ago
- The first dense retrieval model that can be prompted like an LM ☆90 · Updated 9 months ago
- ☆105 · Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆98 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 · Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 · Updated last year
- ☆87 · Updated 2 years ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆153 · Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆69 · Updated last year
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆62 · Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆94 · Updated last year
- Train your own SOTA deductive reasoning model ☆107 · Updated 11 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆90 · Updated last month
- Code for ExploreTom ☆90 · Updated 7 months ago
- ☆137 · Updated 10 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆65 · Updated 9 months ago
- accompanying material for sleep-time compute paper ☆119 · Updated 9 months ago
- ☆61 · Updated 7 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 · Updated last year
- ☆68 · Updated last year
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆128 · Updated last year
- Track the progress of LLM context utilisation ☆55 · Updated 9 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆90 · Updated 10 months ago
- ☆177 · Updated 11 months ago