What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
☆26 · last updated May 16, 2025
Alternatives and similar repositories for CAPability
Users who are interested in CAPability are comparing it to the repositories listed below.
- [IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition · ☆10 · last updated Aug 10, 2025
- [CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models · ☆19 · last updated Apr 30, 2025
- An Arena-style Automated Evaluation Benchmark for Detailed Captioning · ☆57 · last updated Jun 1, 2025
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023) · ☆27 · last updated Sep 3, 2023
- Official repository of "FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring" · ☆77 · last updated Dec 5, 2025
- ☆18 · last updated Jun 10, 2025
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) · ☆32 · last updated Mar 29, 2024
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World" · ☆123 · last updated Oct 2, 2025
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" · ☆35 · last updated Feb 2, 2024
- ☆48 · last updated Feb 7, 2025
- ☆20 · last updated Jun 13, 2025
- A simple script to create a virtual camera and route deepfakelive's output stream to it using Python and OpenCV · ☆16 · last updated Jan 2, 2023
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) · ☆12 · last updated Mar 6, 2025
- ☆12 · last updated Mar 5, 2025
- Code release for "Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models" · ☆14 · last updated Feb 21, 2024
- ☆11 · last updated Nov 12, 2018
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows · ☆19 · last updated Nov 4, 2025
- ☆18 · last updated Aug 7, 2025
- CaDiCaL + neural glue variable predictions · ☆10 · last updated Oct 21, 2020
- Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (ICCV 2023 Oral) · ☆91 · last updated Nov 2, 2023
- Official code of *Towards Event-oriented Long Video Understanding* · ☆12 · last updated Jul 26, 2024
- LLaVA-Next for STVG · ☆18 · last updated Dec 5, 2025
- ☆24 · last updated Jul 16, 2025
- Something show & tensor show nodes, plus image-processing nodes (PS transfer, greyscale, ...). Use these nodes to make some real money! · ☆12 · last updated Oct 24, 2025
- FakeReasoning: Towards Generalizable Forgery Detection and Reasoning · ☆14 · last updated Aug 28, 2025
- [ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc… · ☆16 · last updated Feb 22, 2025
- Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach · ☆14 · last updated Apr 2, 2025
- Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries · ☆34 · last updated Nov 19, 2025
- [ICASSP 2025 Oral] The official implementation of paper "TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfe… · ☆16 · last updated Mar 13, 2025
- CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction · ☆18 · last updated Oct 20, 2025
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability · ☆16 · last updated May 8, 2025
- stable-diffusion-webui extension that bypasses to lsmith · ☆12 · last updated Apr 28, 2023
- A toolkit for computing Video Fréchet Inception Distance (VFID) metrics · ☆11 · last updated May 28, 2024
- ☆13 · last updated Jun 26, 2023
- Source code of the paper: Video Inpainting Localization with Contrastive Learning, IEEE SPL 2025 · ☆12 · last updated Aug 9, 2025
- ☆18 · last updated Sep 8, 2021
- Luogu (洛谷) API documentation · ☆14 · last updated Nov 15, 2025
- Official code for: PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation · ☆16 · last updated Dec 10, 2024
- These are ComfyUI nodes to assist in converting images to paintings and to assist the Inspyrenet Rembg node to totally remove, or replace… · ☆13 · last updated Oct 2, 2025