zehanwang01 / OmniBind
☆25Updated last week
Related projects ⓘ
Alternatives and complementary repositories for OmniBind
- Official implement of MIA-DPO☆41Updated 3 weeks ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆17Updated 3 months ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆70Updated 9 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 5 months ago
- ☆21Updated 3 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆42Updated 3 weeks ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆56Updated last year
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆47Updated last week
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 3 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆51Updated last month
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆41Updated 5 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆83Updated 4 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆52Updated 2 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression☆42Updated 3 months ago
- Video dataset dedicated to portrait-mode video recognition.☆38Updated 7 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆41Updated 3 weeks ago
- Data-Efficient Multimodal Fusion on a Single GPU☆47Updated 6 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆66Updated 5 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆52Updated last year
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th…☆33Updated 2 weeks ago
- Officail Repo of γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆19Updated 3 weeks ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆24Updated last month
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆55Updated 2 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 3 months ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆27Updated 5 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆43Updated 11 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆140Updated 2 weeks ago
- ☆105Updated 3 months ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆31Updated last year