[ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
☆152Feb 22, 2026Updated 4 months ago
Alternatives and similar repositories for vitabench
Users that are interested in vitabench are comparing it to the libraries listed below.
- Multilingual and Multiculture Benchmark and LLM☆41May 18, 2026Updated last month
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient☆67Sep 27, 2025Updated 9 months ago
- [KDD 2025] Fine-tuning Multimodal Large Language Models for Product Bundling☆15Sep 20, 2025Updated 9 months ago
- Individual learning to implement some modules☆28Aug 12, 2024Updated last year
- Internal utility libraries for Pkl☆17Updated this week
- Baidu Qianfan Deep Research☆35Jun 8, 2026Updated 3 weeks ago
- This repository contains the code for the IEEE Robotics and Automation Letters paper "Open-Set Object Detection Using Classification-Free…☆16Dec 6, 2023Updated 2 years ago
- ☆23Oct 22, 2025Updated 8 months ago
- ☆23Feb 3, 2024Updated 2 years ago
- C^3-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking☆39Mar 1, 2026Updated 3 months ago
- AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead …☆52Oct 14, 2025Updated 8 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆36Oct 3, 2025Updated 8 months ago
- Original implementation of QA4IE☆25Jul 28, 2021Updated 4 years ago
- Rethinking the Trust Region in LLM Reinforcement Learning☆61Mar 2, 2026Updated 3 months ago
- Code & data accompanying the paper ["Unveiling Implicit Deceptive Patterns in Multi-modal Fake News via Neuro-Symbolic Reasoning"].☆13Dec 21, 2023Updated 2 years ago
- ☆23Dec 3, 2025Updated 6 months ago
- ☆43Jun 9, 2026Updated 3 weeks ago
- 🍎Wende Chinese QA system (experimental)☆10Jun 1, 2021Updated 5 years ago
- ☆14May 7, 2024Updated 2 years ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆391Mar 30, 2026Updated 2 months ago
- ☆19Jun 21, 2025Updated last year
- [NeurIPS 2023] "Learning to Augment Distributions for Out-of-distribution Detection"☆11Nov 14, 2023Updated 2 years ago
- The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration" (ICML 2026)