GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆78 Updated 9 months ago
Alternatives and similar repositories for goodai-ltm-benchmark
Users who are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆89 Updated 11 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆110 Updated 9 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 Updated 7 months ago
- ☆56 Updated 2 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆85 Updated this week
- Just a bunch of benchmark logs for different LLMs ☆119 Updated last year
- ☆86 Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆62 Updated 9 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆182 Updated 6 months ago
- The first dense retrieval model that can be prompted like an LM ☆86 Updated 4 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 Updated 11 months ago
- Track the progress of LLM context utilisation ☆55 Updated 5 months ago
- Evaluating LLMs with CommonGen-Lite ☆91 Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆127 Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆102 Updated last month
- ☆41 Updated last year
- accompanying material for sleep-time compute paper ☆108 Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 8 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 Updated last year
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆89 Updated last year
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆49 Updated 4 months ago
- Google DeepMind's PromptBreeder for automated prompt engineering implemented in LangChain Expression Language. ☆147 Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆150 Updated 7 months ago
- ☆99 Updated 8 months ago
- ☆159 Updated last year
- LILO: Library Induction with Language Observations ☆88 Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆146 Updated 6 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆90 Updated last week
- Small, simple agent task environments for training and evaluation ☆18 Updated 10 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆61 Updated 5 months ago