DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆664 Updated this week
Alternatives and similar repositories for VideoLLaMA3:
Users who are interested in VideoLLaMA3 are comparing it to the repositories listed below.
- ☆365 Updated 3 weeks ago
- Official repository for the paper PLLaVA ☆643 Updated 7 months ago
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆327 Updated this week
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆206 Updated 6 months ago
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ☆1,117 Updated 2 months ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆838 Updated last month
- 🔥🔥First-ever hour-scale video understanding models ☆259 Updated this week
- Long Context Transfer from Language to Vision ☆368 Updated last week
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆464 Updated last week
- VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling ☆373 Updated this week
- ☆743 Updated this week
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆173 Updated 3 months ago