GoodAI / goodai-ltm-benchmark
A library for benchmarking the Long-Term Memory and Continual Learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blog post:
☆68 · Updated 4 months ago
Alternatives and similar repositories for goodai-ltm-benchmark:
Users interested in goodai-ltm-benchmark are comparing it to the libraries listed below:
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) · ☆90 · Updated 3 months ago
- Just a bunch of benchmark logs for different LLMs · ☆119 · Updated 8 months ago
- Track the progress of LLM context utilisation · ☆54 · Updated last week
- Evaluating LLMs with CommonGen-Lite · ☆89 · Updated last year
- Functional Benchmarks and the Reasoning Gap · ☆85 · Updated 6 months ago
- Mixing Language Models with Self-Verification and Meta-Verification · ☆104 · Updated 4 months ago
- ☆40 · Updated 9 months ago
- ☆66 · Updated 11 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models · ☆69 · Updated last year
- ☆80 · Updated 3 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents · ☆122 · Updated 10 months ago
- ☆48 · Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals" · ☆66 · Updated 9 months ago
- ☆48 · Updated 5 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace · ☆25 · Updated 10 months ago
- A strongly typed Python DSL for developing message-passing multi-agent systems · ☆52 · Updated last year
- LLM reads a paper and produces a working prototype · ☆52 · Updated 2 weeks ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations · ☆68 · Updated 2 months ago
- ☆73 · Updated last year
- LMQL implementation of tree of thoughts · ☆34 · Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments · ☆77 · Updated 6 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" · ☆54 · Updated 4 months ago
- ☆50 · Updated 5 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ · ☆33 · Updated last week
- Train your own SOTA deductive reasoning model · ☆88 · Updated last month
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" · ☆78 · Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language · ☆145 · Updated 2 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" · ☆60 · Updated 3 weeks ago
- ☆20 · Updated last year
- ☆81 · Updated last year