zehanwang01 / OmniBind
☆21 Updated last month
Related projects:
- ☆20 Updated 9 months ago
- ☆31 Updated 3 months ago
- Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆34 Updated last month
- Video dataset dedicated to portrait-mode video recognition. ☆35 Updated 5 months ago
- ☆19 Updated last month
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆103 Updated 3 weeks ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆21 Updated 2 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆54 Updated last year
- ☆83 Updated 9 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆73 Updated 2 months ago
- ☆46 Updated 10 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆12 Updated last month
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆58 Updated 3 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆51 Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆40 Updated 3 months ago
- ☆70 Updated 4 months ago
- ☆32 Updated 3 months ago
- ☆28 Updated 2 weeks ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 Updated 3 weeks ago
- Official repo for StableLLAVA ☆90 Updated 8 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆40 Updated 2 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆18 Updated last month
- The codebase for our paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model ☆37 Updated last month
- Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆19 Updated this week
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated last week
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆68 Updated 7 months ago
- ☆100 Updated last month
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆36 Updated last year