GoodAI / goodai-ltm-benchmark
A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blogpost:
☆72, updated 5 months ago
Alternatives and similar repositories for goodai-ltm-benchmark
Users who are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆90, updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆104, updated 5 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆81, updated 8 months ago
- Just a bunch of benchmark logs for different LLMs ☆119, updated 10 months ago
- The first dense retrieval model that can be prompted like an LM ☆73, updated 3 weeks ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆67, updated 11 months ago
- Accompanying material for sleep-time compute paper ☆90, updated last month
- Repository for the paper Stream of Search: Learning to Search in Language ☆147, updated 4 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆78, updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172, updated 4 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆239, updated 3 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆123, updated 11 months ago
- Evaluating LLMs with CommonGen-Lite ☆90, updated last year
- ☆50, updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173, updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆86, updated 8 months ago
- ☆41, updated 4 months ago
- ☆83, updated last month
- ☆41, updated 5 months ago
- LMQL implementation of tree of thoughts ☆34, updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139, updated 3 months ago
- A framework for optimizing DSPy programs with RL ☆58, updated this week
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆38, updated last month
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆60, updated 10 months ago
- Sphynx Hallucination Induction ☆54, updated 4 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆56, updated 5 months ago
- ☆59, updated 2 weeks ago
- ☆83, updated 5 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆53, updated 4 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆72, updated 11 months ago