[ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
☆100 · Feb 22, 2026 · Updated last month
Alternatives and similar repositories for vitabench
Users that are interested in vitabench are comparing it to the libraries listed below.
- Official code repo for the paper "MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments" ☆33 · Mar 9, 2026 · Updated 3 weeks ago
- [KDD 2025] Fine-tuning Multimodal Large Language Models for Product Bundling ☆15 · Sep 20, 2025 · Updated 6 months ago
- A framework aiming to bridge fast robot prototyping, predefined motion primitives, heterogeneous teleoperation, data collection, and flex… ☆25 · Mar 2, 2026 · Updated 3 weeks ago
- Individual learning to implement some modules ☆29 · Aug 12, 2024 · Updated last year
- A framework for steering MoE models by detecting and controlling behavior-linked experts. ☆30 · Sep 12, 2025 · Updated 6 months ago
- Internal utility libraries for Pkl ☆16 · Updated this week
- τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment ☆881 · Mar 20, 2026 · Updated last week
- This repository contains the code for the IEEE Robotics and Automation Letters paper "Open-Set Object Detection Using Classification-Free… ☆14 · Dec 6, 2023 · Updated 2 years ago
- ☆21 · Oct 22, 2025 · Updated 5 months ago
- ☆22 · Feb 3, 2024 · Updated 2 years ago
- C^3-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking ☆38 · Mar 1, 2026 · Updated 3 weeks ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆37 · Oct 3, 2025 · Updated 5 months ago
- Official implementation of the ICCV 2025 paper HoliTracer. ☆43 · Jan 13, 2026 · Updated 2 months ago
- Rethinking the Trust Region in LLM Reinforcement Learning ☆52 · Mar 2, 2026 · Updated 3 weeks ago
- Code & data accompanying the paper ["Unveiling Implicit Deceptive Patterns in Multi-modal Fake News via Neuro-Symbolic Reasoning"]. ☆13 · Dec 21, 2023 · Updated 2 years ago
- ☆32 · Mar 13, 2026 · Updated 2 weeks ago
- 🍎Wende Chinese QA system (experimental) ☆10 · Jun 1, 2021 · Updated 4 years ago
- ☆14 · May 7, 2024 · Updated last year
- ☆19 · Jun 21, 2025 · Updated 9 months ago
- [NeurIPS 2023] "Learning to Augment Distributions for Out-of-distribution Detection" ☆11 · Nov 14, 2023 · Updated 2 years ago
- The training codes of Jasper-Token-Compression-600M ☆19 · Nov 19, 2025 · Updated 4 months ago
- The official repository of the first version of ACE-Brain foundation model. ☆65 · Mar 13, 2026 · Updated 2 weeks ago
- ☆32 · Oct 23, 2025 · Updated 5 months ago
- Paper Reading Summary (mainly NLP related papers) ☆11 · Nov 6, 2019 · Updated 6 years ago
- [ACL 2023] S3HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering ☆20 · Jun 8, 2025 · Updated 9 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆46 · Jul 17, 2025 · Updated 8 months ago
- [CVPR 2026 (Findings) 🔥🔥] Self Evolving Large Multimodal Models with Continuous Rewards ☆21 · Mar 5, 2026 · Updated 3 weeks ago
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" ☆46 · Feb 11, 2026 · Updated last month
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER ☆22 · Jul 19, 2023 · Updated 2 years ago
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor ☆35 · Oct 13, 2025 · Updated 5 months ago
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge ☆104 · Feb 28, 2026 · Updated last month
- The official github repo for the open online courses: "Dive into LLMs". ☆10 · Mar 15, 2024 · Updated 2 years ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆22 · Nov 9, 2025 · Updated 4 months ago
- ☆17 · Oct 10, 2023 · Updated 2 years ago
- Code for our EMNLP 2023 paper - Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Mode… ☆15 · May 5, 2024 · Updated last year
- MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation ☆25 · Jul 8, 2023 · Updated 2 years ago
- [IEEE TPAMI 2025] REST: Holistic Learning for End-to-End Semantic Segmentation of Whole-Scene Remote Sensing Imagery ☆36 · Mar 18, 2026 · Updated last week
- bootstrap my zsh shell ☆17 · Updated this week
- JIT-compiled GPU kernels for quantum chemistry ☆31 · Jan 30, 2026 · Updated 2 months ago