[ACL2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
☆73Jan 23, 2026Updated 2 months ago
Alternatives and similar repositories for AgencyBench
Users who are interested in AgencyBench are comparing it to the libraries listed below.
- Official Pytorch Implementation of Paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models"☆20Feb 21, 2025Updated last year
- Code and data repository for "The Mirage of Model Editing: Revisiting Evaluation in the Wild"☆17Aug 27, 2025Updated 7 months ago
- Harness for deep search agent☆38Updated this week
- [ICLR 2026] The official repository for the paper "AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning".☆78Feb 27, 2026Updated last month
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".☆12Dec 27, 2023Updated 2 years ago
- Benchmark dataset for the paper "Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with …☆25May 20, 2025Updated 10 months ago
- the final homework code for the class "intelligence engineering"☆12Mar 1, 2020Updated 6 years ago
- official repo for `thinking with images through-self-calling`☆26Dec 28, 2025Updated 3 months ago
- ☆51Oct 20, 2025Updated 5 months ago
- [AAAI 2025 (Oral)] SAIL: Sample-Centric In-Context Learning for Document Information Extraction☆19Dec 24, 2024Updated last year
- A scalable benchmark for state representation learning in visual reinforcement learning.☆17Jun 23, 2025Updated 9 months ago
- Vim plugin to copy text to Windows clipboard on WSL☆12Jan 8, 2023Updated 3 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)☆17Aug 22, 2025Updated 7 months ago
- ☆19May 14, 2024Updated last year
- JAX implementation of the Mistral 7b v0.1 model☆13Mar 27, 2024Updated 2 years ago
- ☆13Jul 14, 2024Updated last year
- Official Repo for Paper: "Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios"☆31Jan 24, 2026Updated 2 months ago
- ☆12Mar 22, 2025Updated last year
- [ACL 2023] To Copy Rather Than Memorize: A Vertical Learning Paradigm for Knowledge Graph Completion☆13Feb 3, 2023Updated 3 years ago
- Reproduction resources for the linear alignment paper (work in progress)☆18May 19, 2024Updated last year
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆21Jan 31, 2026Updated 2 months ago
- 3 experiments for Pattern Recognition course in USTC 2020fall☆10Jan 25, 2021Updated 5 years ago
- ☆12Oct 9, 2020Updated 5 years ago
- ROS Virtual Joystick on rqt☆26Feb 12, 2023Updated 3 years ago
- [CVPR' 25] Official repo for From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Cal…☆22Jun 6, 2025Updated 10 months ago
- ☆24Aug 26, 2025Updated 7 months ago
- Short course using RStudio for biological data analysis☆14Jul 7, 2022Updated 3 years ago
- USTC-SGY Artificial Intelligence general education course homepage☆22Jun 19, 2025Updated 9 months ago
- Safety-J: Evaluating Safety with Critique☆16Jul 28, 2024Updated last year
- ☆17Mar 17, 2022Updated 4 years ago
- This repository is a research and educational tool intended to archive any and all available evidence of the decline in Russian military …☆22Updated this week
- [ICML 2024] Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization☆16May 12, 2024Updated last year
- A simple pytest plugin for testing async Python code☆15Feb 12, 2026Updated 2 months ago
- ☆17Dec 12, 2020Updated 5 years ago
- [ACL 2024] ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining. by Zhiyuan Liu*, Yaoru…☆28Sep 3, 2024Updated last year
- Freesurfer Port to R☆10Apr 9, 2026Updated last week
- LangChain + llamaCPP + babyAGI implementation☆13Apr 12, 2023Updated 3 years ago
- EMNLP 2022: "A Unified Positive-Unlabeled Learning Framework for Document-Level Relation Extraction with Different Levels of Labeling"☆27Feb 3, 2023Updated 3 years ago
- Download UKB bulk data☆12Jul 27, 2020Updated 5 years ago