chroma-core / context-rot
This repository contains the toolkit for replicating results from our technical report.
☆103 Updated last month
Alternatives and similar repositories for context-rot
Users who are interested in context-rot are comparing it to the repositories listed below.
- ☆230 Updated last month
- Ranking LLMs on agentic tasks ☆180 Updated 2 weeks ago
- ☆406 Updated 2 months ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆449 Updated last month
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆410 Updated this week
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆134 Updated last week
- MCP-based Agent Deep Evaluation System ☆115 Updated 2 weeks ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 Updated 4 months ago
- ☆75 Updated 6 months ago
- Verifiers for LLM Reinforcement Learning ☆75 Updated 3 weeks ago
- CodeScientist: An automated scientific discovery system for code-based experiments ☆289 Updated 2 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆75 Updated 8 months ago
- A Text-Based Environment for Interactive Debugging ☆257 Updated this week
- Public repository containing METR's DVC pipeline for eval data analysis ☆104 Updated 4 months ago
- Frozen-in-time version of our Paper Finder agent for reproducing evaluation results ☆152 Updated last week
- Official Repo for CRMArena and CRMArena-Pro ☆109 Updated 2 months ago
- Diffbot LLM Inference Server ☆185 Updated last week
- Routing on Random Forest (RoRF) ☆200 Updated 11 months ago
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆267 Updated last month
- Agentic Web: Weaving the Next Web with AI Agents. ☆332 Updated this week
- A method for steering LLMs to better follow instructions ☆49 Updated 3 weeks ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆133 Updated 6 months ago
- ☆262 Updated 2 months ago
- Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120) ☆155 Updated 2 months ago
- Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆47 Updated 3 weeks ago
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆180 Updated 3 months ago
- ☆94 Updated 5 months ago
- ☆51 Updated 6 months ago
- ☆181 Updated 6 months ago
- Enterprise-grade memory framework for LLMs featuring GPU-optimized inference, vector storage, and automated scaling. Enables hyper-person… ☆89 Updated 3 months ago