[ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
☆117 Feb 22, 2026 Updated last month
Alternatives and similar repositories for vitabench
Users that are interested in vitabench are comparing it to the libraries listed below.
- Multilingual and Multiculture Benchmark and LLM☆32Apr 10, 2026Updated last week
- ☆20Jan 22, 2026Updated 2 months ago
- [ACL 2026] Code, benchmark and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic…☆45Nov 10, 2025Updated 5 months ago
- A framework aiming to bridge fast robot prototyping, predefined motion primitives, heterogeneous teleoperation, data collection, and flex…☆26Apr 4, 2026Updated 2 weeks ago
- Individual learning to implement some modules☆29Aug 12, 2024Updated last year
- Internal utility libraries for Pkl☆16Updated this week
- This repository contains the code for the IEEE Robotics and Automation Letters paper "Open-Set Object Detection Using Classification-Free…☆14Dec 6, 2023Updated 2 years ago
- Ensemble learning mind map☆21Apr 6, 2023Updated 3 years ago
- ☆23Feb 3, 2024Updated 2 years ago
- C^3-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking☆38Mar 1, 2026Updated last month
- AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead …☆51Oct 14, 2025Updated 6 months ago
- Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation(ICLR 2025)”☆27Oct 23, 2025Updated 5 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆37Oct 3, 2025Updated 6 months ago
- [ICCV25] Official implementation of the paper HoliTracer.☆44Apr 7, 2026Updated last week
- 🍎Wende Chinese QA system (experimental)☆10Jun 1, 2021Updated 4 years ago
- ☆35Mar 13, 2026Updated last month
- ☆19Jun 21, 2025Updated 9 months ago
- [NeurIPS 2023] "Learning to Augment Distributions for Out-of-distribution Detection"☆11Nov 14, 2023Updated 2 years ago
- ☆30Jun 12, 2023Updated 2 years ago
- The training codes of Jasper-Token-Compression-600M☆19Nov 19, 2025Updated 5 months ago
- ☆33Oct 23, 2025Updated 5 months ago
- [ACL 2023] S3HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering☆20Jun 8, 2025Updated 10 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆46Jul 17, 2025Updated 9 months ago
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation"☆48Feb 11, 2026Updated 2 months ago
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor☆34Oct 13, 2025Updated 6 months ago
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge☆108Feb 28, 2026Updated last month
- The official github repo for the open online courses: "Dive into LLMs".☆10Mar 15, 2024Updated 2 years ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"☆22Nov 9, 2025Updated 5 months ago
- ☆17Oct 10, 2023Updated 2 years ago
- MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation☆25Jul 8, 2023Updated 2 years ago
- The official repository of the first version of ACE-Brain foundation model.☆74Mar 13, 2026Updated last month
- JIT-compiled GPU kernels for quantum chemistry☆32Jan 30, 2026Updated 2 months ago
- Official implementation for the paper "Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation", publish…☆20Jun 3, 2024Updated last year
- The official repo of VideoAgentTrek☆47Oct 24, 2025Updated 5 months ago
- ☆65Updated this week
- [NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward☆36Sep 19, 2025Updated 7 months ago
- 2019 Baidu Machine Reading Comprehension Competition!☆10Jun 3, 2019Updated 6 years ago
- Evaluation kit for testing stateful agents☆68Updated this week
- LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration☆11Mar 11, 2024Updated 2 years ago