PKU-YuanGroup / LanguageBind
【ICLR 2024 🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
⭐ 822 · Updated last year
Alternatives and similar repositories for LanguageBind
Users interested in LanguageBind are comparing it to the repositories listed below.
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) · ⭐ 832 · Updated last year
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · ⭐ 1,211 · Updated 6 months ago
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding · ⭐ 642 · Updated 6 months ago
- ⭐ 790 · Updated last year
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language · ⭐ 652 · Updated 10 months ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding · ⭐ 389 · Updated 3 months ago
- A Framework of Small-scale Large Multimodal Models · ⭐ 874 · Updated 3 months ago
- [ECCV 2024] Video Foundation Models & Data for Multimodal Understanding