letta-ai / letta-leaderboard
An LLM leaderboard for stateful agents
☆19Updated last month
Alternatives and similar repositories for letta-leaderboard
Users who are interested in letta-leaderboard are comparing it to the libraries listed below.
- Accompanying material for the sleep-time compute paper☆117Updated 6 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System☆150Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆94Updated 6 months ago
- The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution☆135Updated last week
- Official Repo for InSTA: Towards Internet-Scale Training For Agents☆56Updated 4 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?☆213Updated this week
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"☆33Updated last month
- The official repo for "LLoCo: Learning Long Contexts Offline"☆118Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆189Updated 8 months ago
- All information and news with respect to Falcon-H1 series☆93Updated last month
- ☆35Updated 3 months ago
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆81Updated 3 months ago
- SWE Arena☆35Updated 4 months ago
- Replicating O1 inference-time scaling laws☆90Updated 11 months ago
- Storing long contexts in tiny caches with self-study☆216Updated last month
- Repository for the paper Stream of Search: Learning to Search in Language☆151Updated 9 months ago
- ☆125Updated 6 months ago
- [ACL25' Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.☆56Updated 3 months ago
- ☆78Updated 3 weeks ago
- ☆75Updated last year
- ☆59Updated 9 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆218Updated 2 weeks ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache☆130Updated 3 months ago
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…☆364Updated this week
- A benchmark for testing memorization abilities of LMs☆20Updated last year
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"☆67Updated last year
- RepoQA: Evaluating Long-Context Code Understanding☆123Updated last year
- A curated list of awesome Compound AI Systems☆35Updated 4 months ago
- SSRL: Self-Search Reinforcement Learning☆152Updated 3 months ago
- ☆82Updated this week