VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
☆18 · Jun 2, 2025 · Updated 10 months ago
Alternatives and similar repositories for VCapsBench
Users who are interested in VCapsBench are comparing it to the repositories listed below.
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]" ☆17 · Aug 27, 2025 · Updated 8 months ago
- Official Implementation for "SiLVR: A Simple Language-based Video Reasoning Framework" ☆19 · Jan 18, 2026 · Updated 3 months ago
- Video-Language Alignment via Spatio–Temporal Graph Transformer; ArXiv: https://arxiv.org/abs/2407.11677 ☆14 · Jul 24, 2024 · Updated last year
- [ICLR 2026] "VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?", Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, L… ☆38 · Jan 30, 2026 · Updated 2 months ago
- https://avocado-captioner.github.io/ ☆33 · Oct 16, 2025 · Updated 6 months ago
- [ICLR 2026] MotionSight's official code implementation. ☆47 · Updated this week
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆52 · Feb 22, 2026 · Updated 2 months ago
- Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset ☆107 · Feb 25, 2026 · Updated 2 months ago
- Official implementation of MGA-CLAP (ACM MM 2024) ☆31 · Oct 25, 2024 · Updated last year
- Automatic Metric for Evaluating Generated Videos ☆45 · Dec 8, 2025 · Updated 4 months ago
- Graph Neural Networks Paper List of 2019 Conferences ☆20 · Jul 25, 2019 · Updated 6 years ago
- TensorFlow implementation of focal loss from "Focal Loss for Dense Object Detection". https://arxiv.org/abs/1708.02002 ☆20 · Jun 13, 2019 · Updated 6 years ago
- Developer project for getting basic API integrations working in under 5 minutes ☆11 · Jan 30, 2026 · Updated 3 months ago
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond ☆110 · Updated this week
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆25 · Mar 18, 2025 · Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆34 · Apr 20, 2026 · Updated last week
- Music Language Model Generation, Optimization, and Practice ☆55 · Apr 20, 2026 · Updated last week
- ECCV 2026 paper template ☆41 · Jan 23, 2026 · Updated 3 months ago
- [CVPR 2026] FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning ☆52 · Mar 26, 2026 · Updated last month
- Explaining audio differences using language ☆16 · Feb 11, 2025 · Updated last year
- ☆12 · Mar 23, 2026 · Updated last month
- Repository for "Training Audio Captioning Models without Audio" ☆10 · Sep 26, 2023 · Updated 2 years ago
- Official repo for ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models ☆28 · Mar 24, 2025 · Updated last year
- [IJCAI 2024] CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment ☆18 · Jul 16, 2024 · Updated last year
- ☆10 · Sep 25, 2024 · Updated last year
- DreamGaussian with 2D-GS ☆12 · Oct 10, 2024 · Updated last year
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆41 · Dec 27, 2023 · Updated 2 years ago
- [CVPR 2026] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation ☆214 · Jan 29, 2026 · Updated 3 months ago
- [ECCV 2024] Official code repository of paper titled "Efficient 3D-Aware Facial Image Editing Via Attribute-Specific Prompt Learning" ☆10 · Aug 2, 2024 · Updated last year
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆19 · Nov 4, 2025 · Updated 5 months ago
- ☆11 · Dec 28, 2023 · Updated 2 years ago
- The official repository of TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module… ☆28 · Nov 18, 2025 · Updated 5 months ago
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning ☆15 · Jun 23, 2024 · Updated last year
- [ICLR 2026] Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks ☆30 · Feb 5, 2026 · Updated 2 months ago
- This is a Uyghur language translator that supports speech-to-text in Uyghur language, machine translation to Uyghur language text, and te… ☆14 · Nov 28, 2024 · Updated last year
- Different feature matching algorithms implemented in PyTorch! ☆15 · Sep 11, 2019 · Updated 6 years ago
- Audio Entailment: Deductive Reasoning for Audio Understanding ☆17 · Dec 10, 2024 · Updated last year
- TensorFlow implementation of OHEM loss, supporting sigmoid or softmax entropy loss ☆30 · Aug 23, 2019 · Updated 6 years ago
- [ACM MM 2023] PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation ☆12 · Aug 28, 2023 · Updated 2 years ago