yunlong10/CAT-V

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yunlong10/CAT-V)

yunlong10 / CAT-V

[AAAI 26 Demo] Offical repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting

☆68

Alternatives and similar repositories for CAT-V

Users that are interested in CAT-V are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jing-bi / awesome-M.LLM-reasoning
View on GitHub
☆20May 11, 2025Updated last year
yunlong10 / AVicuna
View on GitHub
[AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
☆34Mar 21, 2025Updated last year
yunlong10 / Video-R4
View on GitHub
Reinforcing Text-Rich Video Reasoning with Visual Rumination
☆28Jun 5, 2026Updated last month
hanghuacs / FineCaption
View on GitHub
☆39Jun 20, 2025Updated last year
yunlong10 / VidComposition
View on GitHub
[CVPR 2025] VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
☆30May 10, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
hanghuacs / V2Xum-LLM
View on GitHub
☆27Jan 4, 2025Updated last year
hanghuacs / MMComposition
View on GitHub
☆17Jun 20, 2025Updated last year
gyxxyg / TRACE
View on GitHub
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆156Aug 22, 2025Updated 11 months ago
ZehuaKcrissLi / GTR-Voice
View on GitHub
☆16Nov 11, 2024Updated last year
yunlong10 / Awesome-Video-LMM-Post-Training
View on GitHub
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
☆296Mar 3, 2026Updated 4 months ago
Andy-Cheng / TEMPURA
View on GitHub
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆27Jun 4, 2025Updated last year
mlvlab / VidChain
View on GitHub
Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…
☆25Jan 26, 2025Updated last year
WikiChao / FreSca
View on GitHub
[CVPR 2025 GMCV] Test-Time Frequency Scaling: Instant Frequency Control for Any Diffusion Model
☆55May 31, 2025Updated last year
WikiChao / DAVIS
View on GitHub
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …
☆33Mar 30, 2026Updated 3 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
yunlong10 / Awesome-AI4Animation
View on GitHub
[ICCVW 2025] This repository includes latest papers, projects and datasets on GenAI for Cel-Animation. Accepted by ICCV 2025 AISTORY Wor…
☆206Jan 13, 2026Updated 6 months ago
Share14 / ShareGemini
View on GitHub
☆32Jul 29, 2024Updated last year
HowieHwong / Agentic-Guardian
View on GitHub
[ICLR'26] Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
☆48Oct 26, 2025Updated 9 months ago
TencentARC / Video-Holmes
View on GitHub
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95Jul 13, 2025Updated last year
TencentARC / ARC-Chapter
View on GitHub
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44Nov 19, 2025Updated 8 months ago
Ziyang412 / Video-RTS
View on GitHub
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24Feb 18, 2026Updated 5 months ago
wenhaochai / aurora
View on GitHub
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆147Jun 4, 2025Updated last year
marinero4972 / CyberV
View on GitHub
☆20Jun 10, 2025Updated last year
zjr2000 / REVERIE
View on GitHub
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20Jul 17, 2024Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
zihuixue / ProgCaptioner
View on GitHub
Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)
☆26Jul 16, 2025Updated last year
NIneeeeeem / LangDC
View on GitHub
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18Sep 7, 2025Updated 10 months ago
WHB139426 / Grounded-Video-LLM
View on GitHub
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆149Aug 21, 2025Updated 11 months ago
SCZwangxiao / video-ReTaKe
View on GitHub
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40Mar 16, 2025Updated last year
yeates / Aurora
View on GitHub
Aurora: Unified Video Editing with a Tool-Using Agent
☆59Jun 16, 2026Updated last month
heliossun / LaCoT
View on GitHub
[NeurIPS 2025] Official code for paper: Latent Chain-of-Thought for Visual Reasoning
☆36Oct 16, 2025Updated 9 months ago
liangsusan-git / AV-NeRF
View on GitHub
[NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
☆36Feb 15, 2024Updated 2 years ago
Hon-Wong / Elysium
View on GitHub
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆89Oct 25, 2024Updated last year
WikiChao / ScalingConcept
View on GitHub
☆24Nov 1, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
MAC-AutoML / WFS-SB
View on GitHub
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆32Apr 12, 2026Updated 3 months ago
minghangz / OnVTG
View on GitHub
Online video temporal grounding
☆16Oct 20, 2025Updated 9 months ago
qiulu66 / Anime-Shooter
View on GitHub
☆56Jun 4, 2025Updated last year
OpenGVLab / VideoChat-R1
View on GitHub
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268Oct 18, 2025Updated 9 months ago
mlvlab / DeepVideoR1
View on GitHub
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆36Feb 22, 2026Updated 5 months ago
scofield7419 / Video-of-Thought
View on GitHub
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆182Feb 25, 2025Updated last year
ChocoWu / Any2Caption
View on GitHub
This is the project for 'Any2Caption', Interpreting Any Condition to Caption for Controllable Video Generation
☆49Apr 3, 2025Updated last year