bytedance / Portrait-Mode-Video
Video dataset dedicated to portrait-mode video recognition.
☆36 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for Portrait-Mode-Video
- LMM which strictly superset LLM embedded ☆30 · Updated 2 weeks ago
- Official repo for StableLLAVA ☆91 · Updated 10 months ago
- Official repository of MMDU dataset ☆75 · Updated last month
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆50 · Updated last month
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆82 · Updated 4 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆77 · Updated 7 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆64 · Updated last month
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆115 · Updated last month
- Official implementation of MIA-DPO ☆39 · Updated 2 weeks ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆89 · Updated 2 months ago
- Precision Search through Multi-Style Inputs ☆54 · Updated 3 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆60 · Updated 2 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆118 · Updated last month
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 · Updated this week
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆52 · Updated 2 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆82 · Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆89 · Updated last week
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆66 · Updated 5 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 · Updated 3 months ago