A library for benchmarking the Long-Term Memory and Continual Learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blog post:
☆84 · Dec 17, 2024 · Updated last year
Alternatives and similar repositories for goodai-ltm-benchmark
Users that are interested in goodai-ltm-benchmark are comparing it to the libraries listed below.
- LMQL implementation of tree of thoughts ☆36 · Jan 31, 2024 · Updated 2 years ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆31 · Jun 27, 2024 · Updated last year
- Benchmarking LLM Inference Speeds ☆13 · Apr 7, 2026 · Updated last week
- ☆10 · Nov 6, 2024 · Updated last year
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models. ☆10 · May 16, 2024 · Updated last year
- LLM-Powered Data Discovery System for Tabular Data ☆26 · Apr 7, 2026 · Updated last week
- Measuring RAG solutions throughput and latency ☆20 · Jul 23, 2024 · Updated last year
- ☆88 · Dec 15, 2023 · Updated 2 years ago
- ☆19 · Aug 7, 2024 · Updated last year
- Track the progress of LLM context utilisation ☆55 · Apr 14, 2025 · Updated last year
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1… ☆14 · Jan 19, 2024 · Updated 2 years ago
- ☆12 · Jul 16, 2024 · Updated last year
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning" ☆12 · Feb 15, 2023 · Updated 3 years ago
- code for training and using chess embeddings models ☆13 · Jun 9, 2024 · Updated last year
- ☆14 · May 9, 2024 · Updated last year
- ☆14 · Nov 12, 2024 · Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Feb 22, 2024 · Updated 2 years ago
- A simple editing tool for cutting, compressing, and processing visuals. ☆22 · Mar 16, 2025 · Updated last year
- a deep learning framework for essential protein prediction ☆13 · Mar 24, 2023 · Updated 3 years ago
- Cortex: Advanced Memory System for AI Agents ☆96 · Feb 10, 2026 · Updated 2 months ago
- ComfyUI custom node to extend Wan videos in loops with overlap consistency, per loop prompts, and optional LoRA control. ☆26 · Nov 29, 2025 · Updated 4 months ago
- Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation ☆25 · Feb 18, 2025 · Updated last year
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs" ☆32 · Jun 25, 2025 · Updated 9 months ago
- Multiple GPT agents to have brainstorms and make decisions. ☆20 · Nov 9, 2023 · Updated 2 years ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆11 · Sep 14, 2025 · Updated 7 months ago
- ☆16 · Dec 12, 2025 · Updated 4 months ago
- ☆46 · Mar 9, 2026 · Updated last month
- ☆25 · Feb 26, 2026 · Updated last month
- ☆20 · Nov 1, 2024 · Updated last year
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling' ☆27 · May 16, 2025 · Updated 10 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 · Jan 23, 2025 · Updated last year
- [CVPR 2020] A generative model with latent factors that are independent and localized. ☆12 · Mar 27, 2025 · Updated last year
- ☆14 · Apr 25, 2025 · Updated 11 months ago
- The official repo for SocKET: Social Knowledge Evaluation Tests ☆24 · May 12, 2025 · Updated 11 months ago
- https://www.nature.com/articles/s41597-025-04725-2 ☆27 · Jul 22, 2025 · Updated 8 months ago
- KITE (Knowledge-Intensive Task Evaluation) is an end-to-end benchmark for RAG pipelines ☆23 · Aug 14, 2024 · Updated last year
- Native Salesforce app to codify compliance rules and check documents against them. DMD is the PMD for Business Documents. ☆13 · Jan 20, 2026 · Updated 2 months ago
- blablado is an extensible Assistant that listens to your voice and can execute custom Python functions you provided. It can speak as well… ☆69 · Aug 4, 2024 · Updated last year
- A trivial programmatic Llama 3 jailbreak. Sorry Zuck! ☆568 · Jan 26, 2025 · Updated last year