TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
⭐467 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for prismatic-vlms
- Compose multimodal datasets 🔹 (⭐204, updated this week)
- Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference (⭐255, updated 2 months ago)
- OpenEQA: Embodied Question Answering in the Era of Foundation Models (⭐233, updated last month)
- When do we not need larger vision models? (⭐333, updated 2 months ago)
- [COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs (⭐123, updated 2 months ago)
- VLM Evaluation: Benchmark for VLMs, spanning text-generation tasks from VQA to captioning (⭐85, updated last month)
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents (⭐300, updated 6 months ago)
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … (⭐457, updated 3 months ago)
- LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images (⭐318, updated last month)
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning (⭐199, updated last month)
- Harnessing 1.4M GPT4V-synthesized Data for a Lite Vision-Language Model (⭐244, updated 4 months ago)
- A Framework of Small-scale Large Multimodal Models (⭐635, updated 3 weeks ago)
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts (⭐294, updated 3 months ago)
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" (⭐523, updated 10 months ago)
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions (⭐317, updated 3 months ago)
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… (⭐777, updated 5 months ago)
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) (⭐264, updated this week)
- Aligning LMMs with Factually Augmented RLHF (⭐318, updated last year)
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness (⭐230, updated this week)
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" (⭐271, updated 9 months ago)
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models (⭐227, updated last month)
- Famous Vision Language Models and Their Architectures (⭐401, updated 2 months ago)
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… (⭐242, updated this week)
- LLaRA: Large Language and Robotics Assistant (⭐153, updated last month)
- Long Context Transfer from Language to Vision (⭐328, updated 2 weeks ago)
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning (⭐160, updated 3 weeks ago)
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." (⭐158, updated 3 weeks ago)