mlpc-ucsd / BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
☆260 · Updated 5 months ago
Related projects:
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆537 · Updated 3 months ago
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ☆133 · Updated 2 weeks ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆92 · Updated last week
- An open-source implementation for training LLaVA-NeXT. ☆240 · Updated 3 months ago
- This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV 2024 Oral ☆445 · Updated last month
- Mathematical Visual Instruction Tuning for Multi-modal Large Language Models ☆86 · Updated last month
- [ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists" ☆517 · Updated 4 months ago
- An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,220 · Updated last month
- Matryoshka Query Transformer for Large Vision-Language Models ☆88 · Updated 2 months ago
- [NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering ☆132 · Updated last year
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model ☆135 · Updated 2 months ago
- FACTUAL benchmark dataset, the pre-trained textual scene graph parser trained on FACTUAL. ☆96 · Updated last month
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models ☆81 · Updated 5 months ago
- 【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? ☆226 · Updated last week
- [ICCV 2023] Spectrum-guided Multi-granularity Referring Video Object Segmentation. ☆78 · Updated 11 months ago
- [ECCV 2022] Official implementation of the paper: Audio-Visual Segmentation ☆441 · Updated last week
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆76 · Updated 3 months ago
- Accelerating the development of large multimodal models (LMMs) with lmms-eval ☆1,334 · Updated this week
- GMoE could be the next backbone model for many kinds of generalization task. ☆290 · Updated last year
- WorldGPT: Empowering LLM as Multimodal World Model ☆116 · Updated last month
- [CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description ☆74 · Updated 9 months ago
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? ☆202 · Updated 3 months ago
- Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning ☆208 · Updated last week
- [MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval ☆135 · Updated 3 weeks ago
- [CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions" ☆260 · Updated last month
- Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, C… ☆133 · Updated last week
- Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization". ☆192 · Updated 5 months ago