GoodAI / goodai-ltm-benchmark

A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:

☆76

Alternatives and similar repositories for goodai-ltm-benchmark

Users who are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.

zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆87 Updated 10 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 6 months ago
agiresearch / Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
☆125 Updated last year
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated last year
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆105 Updated 7 months ago
rhyang2021 / SELFGOAL
Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals".
☆68 Updated last year
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆99 Updated 3 months ago
zhudotexe / redel
ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)
☆83 Updated 4 months ago
stunningpixels / lou-eval
Track the progress of LLM context utilisation
☆55 Updated 3 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 Updated 10 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103 Updated 5 months ago
phunterlau / paper_without_code
LLM reads a paper and produces a working prototype
☆58 Updated 3 months ago
yueqis / API-Based-Agent
☆54 Updated last month
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175 Updated 5 months ago
allenai / clin
☆83 Updated last year
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆53 Updated 2 months ago
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆59 Updated 7 months ago
Technoculture / personal-graph
Simple Graph Memory for AI applications
☆89 Updated 2 months ago
catid / self-discover
Implementation of Google's SELF-DISCOVER
☆298 Updated 11 months ago
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆99 Updated 9 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆45 Updated 3 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 6 months ago
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111 Updated 3 months ago
joshuacnf / Ctrl-G
☆88 Updated 7 months ago
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆81 Updated 2 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 Updated 6 months ago
HazyResearch / eclair-agents
Automating enterprise workflows with multimodal agents
☆108 Updated 9 months ago
Ziems / arbor
A framework for optimizing DSPy programs with RL
☆96 Updated this week
normal-computing / extended-mind-transformers
☆123 Updated last year
CyrusNuevoDia / llegos
A strongly typed Python DSL for developing message passing multi agent systems
☆53 Updated last year