Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding"
☆27Mar 13, 2026Updated 2 weeks ago
Alternatives and similar repositories for MC-ViT
Users that are interested in MC-ViT are comparing it to the libraries listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding"☆36Jun 17, 2024Updated last year
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics☆38Sep 10, 2025Updated 6 months ago
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".☆55Oct 21, 2025Updated 5 months ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- An experiment with movie scenes and contrastive learning☆11Feb 1, 2025Updated last year
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta☆16Nov 11, 2024Updated last year
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆66Jun 7, 2024Updated last year
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"☆19Jan 18, 2026Updated 2 months ago
- Multi-Scale Spatio-Temporal Attention based Video Instance Segmentation☆41Sep 2, 2022Updated 3 years ago
- The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents"☆29Oct 23, 2025Updated 5 months ago
- Implementation of the report: on the domain robustness of prefix and prompt tuning☆20Mar 10, 2022Updated 4 years ago
- Implementation of the proposed LVMAE from the paper "Extending Video Masked Autoencoders to 128 frames", in PyTorch☆55Nov 25, 2024Updated last year
- ☆20May 11, 2025Updated 10 months ago
- [2022 WACV] FastAno: Fast Anomaly Detection via Spatio-temporal Patch Transformation☆14Aug 7, 2023Updated 2 years ago
- The implementation of SSTAN in SUN-SEG dataset. (Semi-supervised Spatial Temporal Attention Network for Video Polyp Segmentation, MICCAI …☆13Jul 25, 2024Updated last year
- A Pytorch implementation of Diffusion-Based Probabilistic Uncertainty Estimation for Active Domain Adaptation☆15Nov 28, 2023Updated 2 years ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆115Dec 24, 2025Updated 3 months ago
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT …☆35Aug 10, 2023Updated 2 years ago
- [ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners"☆13Dec 1, 2024Updated last year
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Mar 23, 2026Updated last week
- Official Code for "Large-scale Self-supervised Video Foundation Model for Intelligent Surgery"☆37Jun 4, 2025Updated 9 months ago
- ☆12Dec 15, 2023Updated 2 years ago
- Paper list of Video LLM hallucination. Welcome to Star and Contribute!☆23Mar 6, 2026Updated 3 weeks ago
- ☆16Jul 28, 2024Updated last year
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21Updated this week
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆158Dec 9, 2024Updated last year
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…☆12Jan 29, 2024Updated 2 years ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆23Feb 23, 2025Updated last year
- Learning Debiased and Disentangled Representations for Semantic Segmentation (NeurIPS 2021)☆13Jan 23, 2022Updated 4 years ago
- Official implementation of BPA (CVPR 2022)☆13Jun 17, 2022Updated 3 years ago
- ☆23Sep 19, 2024Updated last year
- Code for the experiments and websites of the paper "Same Task, Different Circuits"☆31Oct 21, 2025Updated 5 months ago
- ☆37Oct 21, 2022Updated 3 years ago
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…☆54Apr 21, 2023Updated 2 years ago
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning☆24Jan 19, 2025Updated last year
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"☆274Oct 15, 2025Updated 5 months ago
- NeurIPS 24 ProMISe: Promptable Medical Image Segmentation using SAM Official Implementation☆17Feb 10, 2025Updated last year
- [TCSVT] Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection☆17Jul 22, 2023Updated 2 years ago