This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR 2025)
☆27 · Sep 6, 2025 · Updated 9 months ago
Alternatives and similar repositories for PAVE
Users that are interested in PAVE are comparing it to the libraries listed below.
- The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation. ☆12 · Oct 15, 2021 · Updated 4 years ago
- ☆17 · Dec 23, 2022 · Updated 3 years ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆63 · Oct 9, 2025 · Updated 8 months ago
- ☆17 · Mar 14, 2024 · Updated 2 years ago
- ☆13 · Jun 26, 2022 · Updated 3 years ago
- Video feature extraction pipeline that supports diverse models including I3D, SlowFast, EgoVLP, and CLIP. ☆13 · Apr 20, 2024 · Updated 2 years ago
- Code release for DeepEDM (ICML 2025) ☆29 · Jan 20, 2026 · Updated 4 months ago
- MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering… ☆13 · Feb 18, 2023 · Updated 3 years ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆21 · Dec 14, 2025 · Updated 6 months ago
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations" ☆55 · Aug 8, 2023 · Updated 2 years ago
- ☆37 · Feb 17, 2026 · Updated 3 months ago
- Code release for RICA^2: Rubric-Informed, Calibrated Assessment of Actions (ECCV 2024) ☆15 · Nov 9, 2025 · Updated 7 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- Official Implementation of Video-MA2MBA ☆12 · Dec 3, 2024 · Updated last year
- ☆17 · Nov 8, 2023 · Updated 2 years ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆56 · Jul 1, 2025 · Updated 11 months ago
- Official implementation of POODLE: Improving Few-shot Learning via Penalizing Out-of-Distribution Samples (NeurIPS 2021) ☆14 · Aug 6, 2022 · Updated 3 years ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆85 · Feb 27, 2025 · Updated last year
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆26 · Jun 4, 2025 · Updated last year
- Official code repository of Shuffle-R1 ☆26 · Feb 23, 2026 · Updated 3 months ago
- LaTeX template for the undergraduate thesis of Central South University ☆14 · Dec 4, 2019 · Updated 6 years ago
- A Fast PyTorch implementation for ICCV 19 paper "BMN: Boundary-Matching Network for Temporal Action Proposal Generation" ☆10 · Jul 29, 2019 · Updated 6 years ago
- [ICML 2024 Oral] LSH-Based Efficient Point Transformer (HEPT) ☆26 · Jan 24, 2025 · Updated last year
- Official Implementation of SnAG (CVPR 2024) ☆60 · Apr 26, 2025 · Updated last year
- [ICCV 2021] Official code for "Learning to Generate Scene Graph from Natural Language Supervision" ☆100 · Apr 4, 2023 · Updated 3 years ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆35 · Feb 22, 2026 · Updated 3 months ago
- ☆29 · Apr 8, 2025 · Updated last year
- [ECCV 2020] Official code for "Comprehensive Image Captioning via Scene Graph Decomposition" ☆99 · Aug 20, 2024 · Updated last year
- Streaming Video Instruction Tuning ☆75 · Feb 25, 2026 · Updated 3 months ago
- ☆31 · Jan 18, 2026 · Updated 4 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Aug 5, 2024 · Updated last year
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- Code for our paper "Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers" ☆39 · Jan 27, 2026 · Updated 4 months ago
- Code of Deno-IF: Unsupervised Noisy Visible and Infrared Image Fusion Method (NeurIPS 2025) ☆27 · Dec 27, 2025 · Updated 5 months ago
- A program that uses MediaPipe FaceMesh detection to display the Sharingan (©NARUTO) over the iris ☆11 · Apr 16, 2022 · Updated 4 years ago
- ☆10 · Apr 7, 2025 · Updated last year
- Adapting LLaMA Decoder to Vision Transformer ☆30 · May 20, 2024 · Updated 2 years ago
- A tiny package supporting distributed computation of COCO metrics for PyTorch models. ☆15 · Feb 28, 2023 · Updated 3 years ago
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆37 · Sep 10, 2025 · Updated 9 months ago