GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆65 · Updated 3 months ago
Alternatives and similar repositories for goodai-ltm-benchmark:
Users who are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 2 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆102 · Updated 3 months ago
- ☆48 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 7 months ago
- A strongly typed Python DSL for developing message-passing multi-agent systems ☆52 · Updated 11 months ago
- ☆66 · Updated 9 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- ☆87 · Updated last year
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation ☆41 · Updated last year
- Train your own SOTA deductive reasoning model ☆81 · Updated 2 weeks ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆121 · Updated 9 months ago
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆52 · Updated 3 months ago
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated last year
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- ☆81 · Updated last year
- ☆84 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 10 months ago
- ☆38 · Updated last year
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆83 · Updated this week
- LLM reads a paper and produces a working prototype ☆51 · Updated last week
- LMQL implementation of tree of thoughts ☆34 · Updated last year
- Code for our paper "PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles" ☆22 · Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆165 · Updated 2 weeks ago
- Track the progress of LLM context utilisation ☆53 · Updated 8 months ago
- Based on the tree of thoughts paper ☆46 · Updated last year
- ☆73 · Updated last year
- ☆39 · Updated 8 months ago
- look how they massacred my boy ☆63 · Updated 5 months ago
- ☆57 · Updated last year
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆142 · Updated last month