[ACL2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
☆80Jan 23, 2026Updated 3 months ago
Alternatives and similar repositories for AgencyBench
Users who are interested in AgencyBench are comparing it to the libraries listed below.
- ☆55Jan 31, 2026Updated 3 months ago
- The official implementation of Bi-Mamba☆16Oct 22, 2025Updated 6 months ago
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".☆12Dec 27, 2023Updated 2 years ago
- Benchmark dataset for the paper "Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with …☆26May 20, 2025Updated 11 months ago
- the final homework code for the class "intelligence engineering"☆12Mar 1, 2020Updated 6 years ago
- ☆13Oct 31, 2024Updated last year
- official repo for `thinking with images through-self-calling`☆26Dec 28, 2025Updated 4 months ago
- 🧜♀️ Pi extension that renders Mermaid diagrams as ASCII in the TUI, with width-aware output and safe handling for larger diagrams.☆54Feb 23, 2026Updated 2 months ago
- All-in-one benchmarking platform for evaluating LLM.☆15Nov 12, 2025Updated 5 months ago
- [AAAI 2025 (Oral)] SAIL: Sample-Centric In-Context Learning for Document Information Extraction☆20Dec 24, 2024Updated last year
- Your friendly terminal-based AI pair programmer☆41Jun 5, 2023Updated 2 years ago
- A modified version of Andrej Karpathy's build-nanogpt☆35Oct 26, 2025Updated 6 months ago
- Code generation from natural language with less prior and more monolingual data☆12Aug 24, 2021Updated 4 years ago
- A scalable benchmark for state representation learning in visual reinforcement learning.☆17Jun 23, 2025Updated 10 months ago
- Compositional question answering☆17Jun 18, 2021Updated 4 years ago
- a library of works related to Large Language Models (LLMs) based Agent Hallucination☆54Oct 30, 2025Updated 6 months ago
- Vim plugin to copy text to Windows clipboard on WSL☆12Jan 8, 2023Updated 3 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)☆17Aug 22, 2025Updated 8 months ago
- Table logger using Rich☆13Aug 13, 2025Updated 8 months ago
- walterra's collections of helpers for agentic coding☆32Mar 23, 2026Updated last month
- Transcripts of Democratic Debates as R Package☆10Jun 17, 2020Updated 5 years ago
- ☆20May 14, 2024Updated last year
- ☆13Jul 14, 2024Updated last year
- Official Repo for Paper: "Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios"☆31Jan 24, 2026Updated 3 months ago
- ☆12Mar 22, 2025Updated last year
- [ACL 2023] To Copy Rather Than Memorize: A Vertical Learning Paradigm for Knowledge Graph Completion☆13Feb 3, 2023Updated 3 years ago
- a simple pokerogue.net save editor☆11May 14, 2024Updated last year
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆21Jan 31, 2026Updated 3 months ago
- 3 experiments for Pattern Recognition course in USTC 2020fall☆10Jan 25, 2021Updated 5 years ago
- ROS Virtual Joystick on rqt☆26Feb 12, 2023Updated 3 years ago
- ☆24Aug 26, 2025Updated 8 months ago
- Short course using RStudio for biological data analysis☆14Jul 7, 2022Updated 3 years ago
- ☆28Apr 24, 2026Updated last week
- Clean and simple theme for ZSH.☆13Dec 8, 2016Updated 9 years ago
- Python port of the Flue: The Agent Harness Framework☆53Updated this week
- This repository is a research and educational tool intended to archive any and all available evidence of the decline in Russian military …☆25Updated this week
- [ICML 2024] Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization☆16May 12, 2024Updated last year
- A simple pytest plugin for testing async Python code☆15Feb 12, 2026Updated 2 months ago
- ☆17Dec 12, 2020Updated 5 years ago