V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
☆37Feb 4, 2026Updated last month
Alternatives and similar repositories for V2P-Bench
Users that are interested in V2P-Bench are comparing it to the libraries listed below.
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 8 months ago
- [AAAI 2024] Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-Supervised 3D Object Detection☆10Jan 24, 2025Updated last year
- The reinforcement learning codes for dataset SPA-VL☆47Jun 24, 2024Updated last year
- Multi-agent AI research system — finds academic papers via semantic search & citation snowballing, then answers questions over them using…☆86Feb 28, 2026Updated last month
- [NeurIPS 2025] I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions☆68Dec 30, 2025Updated 3 months ago
- ☆47Apr 9, 2025Updated 11 months ago
- [ICLR 2025] PseDet: Revisiting the Power of Pseudo Label in Incremental Object Detection☆22Sep 16, 2025Updated 6 months ago
- OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder☆48Updated this week
- A research intelligence agent pipeline for daily paper and blog triage to your email inbox.☆50Updated this week
- AI Agent Security Middleware — 8-layer defense, DLP data flow, prompt injection detection, zero dependencies. SDK + OpenClaw plugin.☆47Mar 21, 2026Updated last week
- [ICLR 2024] ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation☆76Apr 25, 2024Updated last year
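- Xiaohongshu watermark remover: supports watermark-free saving of images, videos, and Live Photos from notes and comment sections, sticker saving, and batch downloading☆32Feb 4, 2026Updated last month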
- DataCompare is a Java-based tool designed to verify the consistency of data after replication or migration operations are completed betwe…☆199Mar 2, 2026Updated 3 weeks ago
- ☆10Jun 30, 2025Updated 9 months ago
- A modern console UI/UX toolkit for Node.js☆39Nov 19, 2025Updated 4 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆206Sep 26, 2024Updated last year
- MUA-RL: MULTI-TURN USER-INTERACTING AGENT REINFORCEMENT LEARNING FOR AGENTIC TOOL USE☆58Nov 5, 2025Updated 4 months ago
- [ACL 2024] Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models. Detect and mitigate object hallucinatio…☆25Jan 31, 2025Updated last year
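- This project provides a learning roadmap for newcomers to the multimodal field, covering the field's classic papers, projects, and courses. It aims to help learners reach a reasonably deep understanding of the field within a limited time and become able to carry out independent research.☆47Mar 26, 2024Updated 2 years ago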
- ☆21Mar 5, 2026Updated 3 weeks ago
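- hiksdk is a Go wrapper around Hikvision's official C SDK. It calls the underlying SDK through CGO and provides a simple, easy-to-use Go API. It supports the full range of Hikvision devices, including network cameras (IPC), network video recorders (NVR), and digital video recorders (DVR).☆88Jan 8, 2026Updated 2 months ago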
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆20Jul 20, 2024Updated last year
- OpenManus-Max: A fully refactored OpenManus with Manus-level capabilities. DAG Scheduler, Hierarchical Memory, 20+ Tools, Multi-Level Per…☆178Mar 17, 2026Updated last week
- [ToMM2023] - AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval☆20Aug 30, 2024Updated last year
- Everything you need to know to get the job.☆80May 12, 2025Updated 10 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding☆50Sep 21, 2025Updated 6 months ago
- ☆133Mar 22, 2025Updated last year
- [KDD 2025] AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation☆33Nov 18, 2025Updated 4 months ago
- HuggingChat Python API, make the 'stream' params work☆21Dec 26, 2023Updated 2 years ago
- ☆32Jul 29, 2024Updated last year
- Official Implementation of 'OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model'☆394Updated this week
- Skill Compose is an open-source agent builder and runtime platform for skill-powered agents. No workflow graphs. No CLI.☆1,114Mar 4, 2026Updated 3 weeks ago
- ☆57Feb 14, 2026Updated last month
- ☆62Feb 27, 2026Updated last month
- This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning"☆65Dec 29, 2025Updated 3 months ago
- Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turn…☆584Mar 13, 2026Updated 2 weeks ago
- [ICCV2023] DETRDistill: A Universal Knowledge Distillation Framework for DETR-families☆66Nov 3, 2023Updated 2 years ago
- ☆55Apr 1, 2024Updated last year
- ☆157Oct 31, 2024Updated last year