[ACL2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
☆84Jan 23, 2026Updated 4 months ago
Alternatives and similar repositories for AgencyBench
Users who are interested in AgencyBench are comparing it to the libraries listed below.
- ☆58Jan 31, 2026Updated 3 months ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆23Mar 4, 2025Updated last year
- Benchmark dataset for the paper "Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with …☆26May 20, 2025Updated last year
- the final homework code for the class "intelligence engineering"☆12Mar 1, 2020Updated 6 years ago
- ☆14Oct 31, 2024Updated last year
- official repo for `thinking with images through-self-calling`☆26Dec 28, 2025Updated 4 months ago
- All-in-one benchmarking platform for evaluating LLM.☆15Nov 12, 2025Updated 6 months ago
- [AAAI 2025 (Oral)] SAIL: Sample-Centric In-Context Learning for Document Information Extraction☆20Dec 24, 2024Updated last year
- A scalable benchmark for state representation learning in visual reinforcement learning.☆17Jun 23, 2025Updated 11 months ago
- Author implementation of "Learning to Search in Long Documents Using Document Structure" (Mor Geva and Jonathan Berant, 2018)☆22Jul 12, 2018Updated 7 years ago
- 🧜♀️ Pi extension that renders Mermaid diagrams as ASCII in the TUI, with width-aware output and safe handling for larger diagrams.☆64Feb 23, 2026Updated 3 months ago
- Compositional question answering☆17Jun 18, 2021Updated 4 years ago
- ☆15May 27, 2019Updated 6 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)☆17Aug 22, 2025Updated 9 months ago
- Table logger using Rich☆13Aug 13, 2025Updated 9 months ago
- ☆12Sep 23, 2024Updated last year
- walterra's collections of helpers for agentic coding☆34Mar 23, 2026Updated 2 months ago
- JAX implementation of the Mistral 7b v0.1 model☆13Mar 27, 2024Updated 2 years ago
- ☆13Jul 14, 2024Updated last year
- (🔥ICML2026) Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios☆35Jan 24, 2026Updated 4 months ago
- ☆12Mar 22, 2025Updated last year
- [ACL 2023] To Copy Rather Than Memorize: A Vertical Learning Paradigm for Knowledge Graph Completion☆13Feb 3, 2023Updated 3 years ago
- Reproduction resources for the linear alignment paper (work in progress)☆18May 19, 2024Updated 2 years ago
- a simple pokerogue.net save editor☆11May 14, 2024Updated 2 years ago
- ☆14Mar 6, 2020Updated 6 years ago
- Hugo theme for documenting One-Day-Only projects☆11Jun 20, 2021Updated 4 years ago
- Code and Models for paper "AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning. Han Guo, Ramakanth Pasunuru, and Mohit Ba…☆24Apr 15, 2019Updated 7 years ago
- ☆42Nov 8, 2025Updated 6 months ago
- ☆12Oct 9, 2020Updated 5 years ago
- ROS Virtual Joystick on rqt☆26Feb 12, 2023Updated 3 years ago
- ☆34May 1, 2026Updated 3 weeks ago
- [CVPR' 25] Official repo for From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Cal…☆22Jun 6, 2025Updated 11 months ago
- Short course using RStudio for biological data analysis☆14Jul 7, 2022Updated 3 years ago
- Safety-J: Evaluating Safety with Critique☆16Jul 28, 2024Updated last year
- This repository is a research and educational tool intended to archive any and all available evidence of the decline in Russian military …☆27May 18, 2026Updated last week
- [ICML 2024] Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization☆16May 12, 2024Updated 2 years ago
- A simple pytest plugin for testing async Python code☆15Feb 12, 2026Updated 3 months ago
- A Fine-Grained Benchmark for Open Information Extraction☆21May 17, 2022Updated 4 years ago
- LangChain + llamaCPP + babyAGI implementation☆13Apr 12, 2023Updated 3 years ago