dragonlzm / PAVE
This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR 2025)
☆19 · Updated 3 months ago
Alternatives and similar repositories for PAVE
Users interested in PAVE are comparing it to the repositories listed below
- The benchmark for "Video Object Segmentation in Panoptic Wild Scenes". ☆12 · Updated last year
- ☆26 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives". ☆41 · Updated 7 months ago
- ☆33 · Updated last week
- ☆22 · Updated 3 months ago
- ☆62 · Updated last year
- Official code for the paper "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter". ☆16 · Updated 2 years ago
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing". ☆48 · Updated last week
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model. ☆31 · Updated 7 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…". ☆37 · Updated last year
- [TCSVT 2024] Temporally Consistent Referring Video Object Segmentation with Hybrid Memory. ☆17 · Updated 3 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models". ☆29 · Updated 3 months ago
- Code for our work "DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries" [ECCV 2024]. ☆14 · Updated last year
- SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality. ☆32 · Updated 7 months ago
- Unifying Specialized Visual Encoders for Video Language Models. ☆21 · Updated 3 weeks ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models. ☆43 · Updated last month
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025). ☆38 · Updated 2 months ago
- ☆58 · Updated last year
- ☆12 · Updated 11 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows. ☆15 · Updated last month
- [ECCV 2024] ControlCap: Controllable Region-level Captioning. ☆77 · Updated 8 months ago
- Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning. ☆30 · Updated last year
- ☆32 · Updated last year
- Official repo for CAT-V (Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting). ☆41 · Updated last week
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception. ☆66 · Updated last month
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆19 · Updated last month
- ☆37 · Updated last month
- [ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video. ☆22 · Updated 11 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)". ☆41 · Updated last year
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning. ☆22 · Updated this week