yunlong10 / Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
⭐2,071 · Updated 2 months ago
Alternatives and similar repositories for Awesome-LLMs-for-Video-Understanding:
Users interested in Awesome-LLMs-for-Video-Understanding are comparing it to the libraries listed below.
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ⭐1,124 · Updated 2 months ago
- Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks ⭐2,063 · Updated this week
- ⭐3,591 · Updated last month
- Famous Vision Language Models and Their Architectures ⭐726 · Updated last month
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ⭐602 · Updated last month
- [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the cap… ⭐1,320 · Updated 7 months ago
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ⭐795 · Updated last year
- A family of lightweight multimodal models. ⭐1,006 · Updated 4 months ago
- Mixture-of-Experts for Large Vision-Language Models ⭐2,127 · Updated 3 months ago
- [ECCV2024] Video Foundation Models & Data for Multimodal Understanding ⭐1,757 · Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ⭐1,877 · Updated 4 months ago
- A fork to add multimodal model training to open-r1 ⭐1,108 · Updated last month
- A collection of papers on the topic of "Computer Vision in the Wild (CVinW)" ⭐1,266 · Updated last year
- Next-Token Prediction is All You Need ⭐2,042 · Updated last week
- Paper list about multimodal and large language models, used only to record papers I read from the daily arXiv for personal needs ⭐611 · Updated this week
- Emu Series: Generative Multimodal Models from BAAI ⭐1,695 · Updated 6 months ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ⭐838 · Updated last month
- Collection of AWESOME vision-language models for vision tasks ⭐2,599 · Updated this week
- VisionLLM Series ⭐1,031 · Updated last month
- Recent LLM-based CV and related works. Welcome to comment/contribute! ⭐859 · Updated 2 weeks ago
- 【EMNLP 2024🔥】 Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ⭐3,205 · Updated 3 months ago
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ⭐1,910 · Updated 2 months ago
- Awesome papers & datasets specifically focused on long-term videos. ⭐262 · Updated 4 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ⭐783 · Updated 7 months ago
- A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ⭐614 · Updated this week
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐855 · Updated 4 months ago
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ⭐1,375 · Updated 2 weeks ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ⭐351 · Updated 4 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ⭐493 · Updated last week
- ⭐1,799 · Updated 9 months ago