chroma-core / context-rot
This repository contains the toolkit for replicating results from our technical report.
☆152Updated last month
Alternatives and similar repositories for context-rot
Users who are interested in context-rot are comparing it to the libraries listed below.
- ☆232Updated 3 months ago
- Real-Time Detection of Hallucinated Entities in Long-Form Generation☆262Updated 2 weeks ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models☆463Updated 2 months ago
- Ranking LLMs on agentic tasks☆196Updated last month
- Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)☆176Updated 4 months ago
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents☆461Updated 2 weeks ago
- Public repository containing METR's DVC pipeline for eval data analysis☆124Updated 6 months ago
- ☆79Updated last month
- ☆172Updated last week
- Routing on Random Forest (RoRF)☆214Updated last year
- MCP-based Agent Deep Evaluation System☆135Updated last month
- An Automatic Prompt Optimization Framework for Large Language Models☆130Updated 2 months ago
- A Text-Based Environment for Interactive Debugging☆275Updated this week
- ☆199Updated last week
- Tutorial for building an LLM router☆231Updated last year
- Frozen-in-time version of our Paper Finder agent for reproducing evaluation results☆192Updated 2 months ago
- Official Repo for CRMArena and CRMArena-Pro☆119Updated 4 months ago
- ☆101Updated last year
- ☆494Updated 4 months ago
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244)☆445Updated 2 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments.☆204Updated last week
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.☆136Updated 2 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?☆202Updated this week
- Readymade evaluators for agent trajectories☆365Updated last month
- An open-source tool for LLM prompt optimization.☆687Updated this week
- Beating the GAIA benchmark with Transformers Agents. 🚀☆138Updated 8 months ago
- Workflows are an event-driven, async-first, step-based way to control the execution flow of AI applications like agents.☆231Updated this week
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.☆182Updated 5 months ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses☆609Updated last week
- Google DeepMind's PromptBreeder for automated prompt engineering implemented in LangChain Expression Language.☆151Updated last year