chroma-core / context-rot
This repository contains the toolkit for replicating results from our technical report.
☆185 Updated 3 months ago
Alternatives and similar repositories for context-rot
Users who are interested in context-rot are comparing it to the libraries listed below.
- Real-Time Detection of Hallucinated Entities in Long-Form Generation ☆273 Updated last month
- ☆236 Updated last month
- Public repository containing METR's DVC pipeline for eval data analysis ☆168 Updated 8 months ago
- Ranking LLMs on agentic tasks ☆205 Updated last month
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆249 Updated this week
- Repo for "Adaptation of Agentic AI" ☆483 Updated last week
- An alignment auditing agent capable of quickly exploring alignment hypothesis ☆776 Updated 2 weeks ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆486 Updated 4 months ago
- MCP-based Agent Deep Evaluation System ☆142 Updated 3 months ago
- Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120) ☆195 Updated 6 months ago
- A clean, modular SDK for building AI agents with OpenHands V1. ☆371 Updated this week
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents ☆531 Updated this week
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244) ☆455 Updated 4 months ago
- A Text-Based Environment for Interactive Debugging ☆287 Updated last week
- ☆79 Updated 3 months ago
- ☆103 Updated 9 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆140 Updated 4 months ago
- ☆734 Updated last week
- ☆105 Updated last year
- ☆208 Updated this week
- frozen-in-time version of our Paper Finder agent for reproducing evaluation results ☆215 Updated 4 months ago
- benchmark and evaluate generative research synthesis ☆102 Updated last month
- CodeScientist: An automated scientific discovery system for code-based experiments ☆305 Updated last month
- Source code for the collaborative reasoner research project at Meta FAIR. ☆111 Updated 8 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆290 Updated 2 months ago
- A method for steering llms to better follow instructions ☆68 Updated 4 months ago
- 🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data. ☆585 Updated last week
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆123 Updated 2 months ago
- Verbalized Sampling, a training-free prompting strategy to mitigate mode collapse in LLMs by requesting responses with probabilities. Ach… ☆624 Updated last month
- RAG evaluation without the need for "golden answers" ☆330 Updated 2 weeks ago