GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual-learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆82 · Updated last year
Alternatives and similar repositories for goodai-ltm-benchmark
Users that are interested in goodai-ltm-benchmark are comparing it to the libraries listed below
- Mixing Language Models with Self-Verification and Meta-Verification ☆111 · Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 11 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆96 · Updated 3 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆132 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 · Updated last year
- The first dense retrieval model that can be prompted like an LM ☆90 · Updated 8 months ago
- ☆86 · Updated 2 years ago
- Function Calling Benchmark & Testing ☆92 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆93 · Updated last year
- Train your own SOTA deductive reasoning model ☆107 · Updated 10 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆65 · Updated 8 months ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆90 · Updated 2 years ago
- Accompanying material for sleep-time compute paper ☆118 · Updated 8 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆190 · Updated 10 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆90 · Updated last month
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆120 · Updated 2 months ago
- ☆41 · Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆152 · Updated 11 months ago
- Track the progress of LLM context utilisation ☆55 · Updated 9 months ago
- Synthetic Data for LLM Fine-Tuning ☆120 · Updated 2 years ago
- LILO: Library Induction with Language Observations ☆90 · Updated last year
- LLM reads a paper and produces a working prototype ☆60 · Updated 9 months ago
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation ☆49 · Updated 2 years ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆73 · Updated 9 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆103 · Updated 5 months ago
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated 2 years ago
- ☆45 · Updated 2 years ago
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated last year
- ☆129 · Updated 7 months ago