google-deepmind/videoprism

Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)

☆386

Alternatives and similar repositories for videoprism

Users who are interested in videoprism often compare it to the libraries listed below.

facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆2,324 · Apr 13, 2026 · Updated 3 months ago
Yaofang-Liu / Pusa-VidGen
Pusa: Thousands Timesteps Video Diffusion Model
☆685 · Feb 13, 2026 · Updated 5 months ago
lumalabs / tvm
Terminal Velocity Matching
☆90 · Feb 14, 2026 · Updated 5 months ago
bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆646 · Mar 4, 2026 · Updated 4 months ago
facebookresearch / vjepa2
PyTorch code and models for VJEPA2 self-supervised learning from video.
☆4,372 · Mar 23, 2026 · Updated 3 months ago
bytedance / video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…
☆204 · Feb 23, 2026 · Updated 4 months ago
yunlong10 / CAT-V
[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆67 · Jan 27, 2026 · Updated 5 months ago
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
kyutai-labs / kaudio
Rust crate for some audio utilities
☆32 · Jun 17, 2026 · Updated last month
yeliudev / VideoMind
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆346 · Feb 8, 2026 · Updated 5 months ago
Eyeline-Labs / CineScale
Code for CineScale, higher-resolution video generation based on Wan
☆185 · Aug 25, 2025 · Updated 10 months ago
meta-pytorch / torchcodec
PyTorch media decoding and encoding
☆1,143 · Updated this week
playht / PlayDiffusion
☆538 · Jun 11, 2026 · Updated last month
DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆1,172 · Aug 14, 2025 · Updated 11 months ago
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆669 · Jul 28, 2024 · Updated last year
LifuWang-66 / DistillT5
(CVPR 2025) Scaling Down Text Encoders of Text-to-Image Diffusion Models
☆53 · Sep 10, 2025 · Updated 10 months ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
☆29 · Apr 8, 2025 · Updated last year
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆6,103 · May 4, 2026 · Updated 2 months ago
jylins / hourllava
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19 · Jun 24, 2025 · Updated last year
facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆10,973 · Updated this week
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆429 · May 8, 2025 · Updated last year
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,630 · Jan 30, 2026 · Updated 5 months ago
google-deepmind / vocap
☆17 · Sep 5, 2025 · Updated 10 months ago
huggingface / finetrainers
Scalable and memory-optimized training of diffusion models
☆1,353 · May 26, 2026 · Updated last month
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆2,339 · Jul 2, 2026 · Updated 2 weeks ago
nv-tlabs / ChronoEdit
[ICLR 2026] ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
☆697 · Nov 20, 2025 · Updated 8 months ago
G-U-N / Awesome-Pixel-Flow
☆38 · Dec 25, 2025 · Updated 6 months ago
FeiElysia / Tempo
Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding (ECCV 2026)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
facebookresearch / EdgeTAM
[CVPR 2025] Official PyTorch implementation of "EdgeTAM: On-Device Track Anything Model"
☆946 · Jan 27, 2026 · Updated 5 months ago
GaParmar / group-inference
Scalable group inference for generating high quality and diverse images with diffusion models.
☆43 · Aug 31, 2025 · Updated 10 months ago
FunAudioLLM / ThinkSound
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Tho…
☆1,373 · Apr 3, 2026 · Updated 3 months ago
FoundationVision / FlashVideo
[AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
☆462 · Mar 5, 2025 · Updated last year
Phantom-video / HuMo
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
☆1,273 · Jan 25, 2026 · Updated 5 months ago
PKU-YuanGroup / UniWorld
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆883 · Dec 23, 2025 · Updated 6 months ago
bytedance / XVerse
[NeurIPS 2025] Official implementation of "XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulatio…
☆626 · Oct 22, 2025 · Updated 8 months ago
Nojahhh / cogvideox-loras
CogVideoX-LoRAs is a centralized repository for all LoRA models created for CogVideoX, filling the gap for a unified sharing space. With …
☆81 · Dec 4, 2024 · Updated last year
tianweiy / CausVid
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
☆1,397 · Aug 7, 2025 · Updated 11 months ago
rhymes-ai / Aria
Codebase for Aria - an Open Multimodal Native MoE
☆1,086 · Jan 22, 2025 · Updated last year
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130 · Sep 15, 2025 · Updated 10 months ago