VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
☆20Jun 2, 2025Updated last year
Alternatives and similar repositories for VCapsBench
Users that are interested in VCapsBench are comparing it to the libraries listed below.
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"☆18Aug 27, 2025Updated 9 months ago
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"☆19Jan 18, 2026Updated 4 months ago
- Video-Language Alignment via Spatio–Temporal Graph Transformer; ArXiv: https://arxiv.org/abs/2407.11677☆15Jul 24, 2024Updated last year
- [ICLR 2026] "VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?", Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, L…☆39Jan 30, 2026Updated 4 months ago
- [ICLR 2026] MotionSight's official code implementation.☆48Apr 24, 2026Updated last month
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos☆52Feb 22, 2026Updated 3 months ago
- https://avocado-captioner.github.io/☆36Oct 16, 2025Updated 7 months ago
- official implementation of MGA-CLAP (ACM MM 2024)☆30Oct 25, 2024Updated last year
- Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset☆110Feb 25, 2026Updated 3 months ago
- Automatic Metric for Evaluating Generated Videos☆47Dec 8, 2025Updated 6 months ago
- Graph Neural Networks Paper List of 2019 Conferences☆20Jul 25, 2019Updated 6 years ago
- TensorFlow implementation of focal loss from "Focal Loss for Dense Object Detection". https://arxiv.org/abs/1708.02002☆20Jun 13, 2019Updated 6 years ago
- Developer project for getting basic API integrations working in under 5 minutes☆11May 22, 2026Updated 2 weeks ago
- [CVPR 2025] GPS as a Control Signal for Image Generation☆25Mar 18, 2025Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark☆34Apr 20, 2026Updated last month
- ECCV 2026 paper template☆40Jan 23, 2026Updated 4 months ago
- [CVPR 2026] FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning☆56Mar 26, 2026Updated 2 months ago
- Explaining audio differences using language☆16Feb 11, 2025Updated last year
- Music Language Model Generation, Optimization, and Practice☆59Apr 20, 2026Updated last month
- ☆12Mar 23, 2026Updated 2 months ago
- Repository for "Training Audio Captioning Models without Audio"☆10Sep 26, 2023Updated 2 years ago
- Official repo for ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models☆28Mar 24, 2025Updated last year
- [IJCAI 2024] CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment☆18Jul 16, 2024Updated last year
- ☆27Aug 9, 2025Updated 10 months ago
- ☆10Sep 25, 2024Updated last year
- DreamGaussian with 2D-GS☆12Oct 10, 2024Updated last year
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment☆41Dec 27, 2023Updated 2 years ago
- [CVPR 2026] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation☆217Jan 29, 2026Updated 4 months ago
- [ECCV 2024] Official code repository of paper titled "Efficient 3D-Aware Facial Image Editing Via Attribute-Specific Prompt Learning"☆10Aug 2, 2024Updated last year
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆20Nov 4, 2025Updated 7 months ago
- ☆11Dec 28, 2023Updated 2 years ago
- The official repository for TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…☆29Nov 18, 2025Updated 6 months ago
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning☆15Jun 23, 2024Updated last year
- [ICLR 2026] Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks☆31Feb 5, 2026Updated 4 months ago
- This is a Uyghur language translator that supports speech-to-text in Uyghur language, machine translation to Uyghur language text, and te…☆14Nov 28, 2024Updated last year
- Different feature matching algorithms implemented in PyTorch!☆15Sep 11, 2019Updated 6 years ago
- Audio Entailment: Deductive Reasoning for Audio Understanding☆17Dec 10, 2024Updated last year
- TensorFlow implementation of OHEM loss, with support for sigmoid or softmax entropy loss☆30Aug 23, 2019Updated 6 years ago
- [ACM MM 2023] PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation☆12Aug 28, 2023Updated 2 years ago