OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆1,402 · Updated last month
Related projects
Alternatives and complementary repositories for InternVideo
- [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding ☆834 · Updated 4 months ago
- [ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆717 · Updated 7 months ago
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆524 · Updated last week
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want ☆689 · Updated 3 months ago
- VisionLLM Series ☆900 · Updated 3 weeks ago
- [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. ☆1,206 · Updated 2 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses ☆777 · Updated 5 months ago
- An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval" (a minimal retrieval sketch follows this list) ☆876 · Updated 6 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆728 · Updated 3 months ago
- [NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training ☆1,368 · Updated 11 months ago
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models ☆294 · Updated 5 months ago
- EVA Series: Visual Representation Fantasies from BAAI ☆2,293 · Updated 3 months ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆283 · Updated 5 months ago
- Grounded Language-Image Pre-training ☆2,217 · Updated 9 months ago
- Recent LLM-based CV and related works. Welcome to comment/contribute! ☆837 · Updated 5 months ago
- Official repository for the paper PLLaVA ☆581 · Updated 3 months ago
- A collection of papers on the topic of "Computer Vision in the Wild (CVinW)" ☆1,186 · Updated 7 months ago
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). ☆1,134 · Updated 4 months ago
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ☆847 · Updated this week
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆660 · Updated 2 months ago
- Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding ☆552 · Updated last month
- Multi-modality pre-training ☆470 · Updated 6 months ago
- This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. ☆689 · Updated last year
- [CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language ☆1,288 · Updated last year
- A suite for modeling video with Mamba ☆231 · Updated 5 months ago
- 🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs ☆1,502 · Updated last month
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆548 · Updated 11 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆522 · Updated 2 weeks ago
- Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset ☆238 · Updated 7 months ago
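Several entries above (CLIP4Clip, Long-CLIP, the CLIP awesome list) share the same contrastive recipe: embed video frames and text into one space and rank candidates by cosine similarity. The snippet below is a minimal sketch of CLIP4Clip's parameter-free "meanP" scoring, assuming per-frame CLIP embeddings have already been computed; it is illustrative only and does not use the actual API of any repository listed here.

```python
import torch
import torch.nn.functional as F

def video_text_scores(frame_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
    """CLIP4Clip-style mean-pooled ("meanP") retrieval scoring sketch.

    frame_feats: (num_videos, num_frames, dim) per-frame CLIP image embeddings.
    text_feats:  (num_queries, dim) CLIP text embeddings.
    Returns:     (num_queries, num_videos) cosine-similarity matrix.
    """
    # Mean-pool frame embeddings into a single vector per video, then
    # L2-normalize both sides so the dot product is cosine similarity.
    video_feats = F.normalize(frame_feats.mean(dim=1), dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    return text_feats @ video_feats.T

# Toy usage with random tensors standing in for real CLIP features.
scores = video_text_scores(torch.randn(5, 12, 512), torch.randn(3, 512))
print(scores.argmax(dim=1))  # index of the best-matching video per query
```

The parameter-free mean pooling shown here is only the simplest variant studied in the CLIP4Clip paper; the repository also evaluates sequence-aware aggregators (e.g. transformer-based) over the frame embeddings.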