TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
⭐543 · Updated 6 months ago
Alternatives and similar repositories for prismatic-vlms:
Users interested in prismatic-vlms are comparing it to the libraries listed below.
- Compose multimodal datasets ⭐261 · Updated last month
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ⭐97 · Updated 4 months ago
- When do we not need larger vision models? ⭐354 · Updated last month
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐814 · Updated last month
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐306 · Updated 9 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ⭐250 · Updated 3 months ago
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ⭐266 · Updated last week
- ⭐588 · Updated 11 months ago
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" ⭐280 · Updated 11 months ago
- A Framework of Small-scale Large Multimodal Models ⭐709 · Updated last month
- Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner ⭐445 · Updated last month
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐472 · Updated 5 months ago
- ⭐304 · Updated 11 months ago
- Code for the Molmo Vision-Language Model ⭐236 · Updated last month
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions ⭐324 · Updated this week
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ⭐348 · Updated 6 months ago
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ⭐249 · Updated last month
- [NeurIPS '24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ⭐202 · Updated 3 weeks ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ⭐308 · Updated 6 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer ⭐348 · Updated this week
- ⭐235 · Updated this week
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… ⭐441 · Updated 2 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ⭐278 · Updated 2 months ago
- Contextual Object Detection with Multimodal Large Language Models ⭐212 · Updated 3 months ago
- Aligning LMMs with Factually Augmented RLHF ⭐339 · Updated last year
- PyTorch implementation of the models RT-1-X and RT-2-X from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models" ⭐186 · Updated 2 weeks ago
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ⭐556 · Updated last year
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ⭐273 · Updated last month
- Official repo and evaluation implementation of VSI-Bench ⭐326 · Updated this week