TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
⭐568 · Updated 7 months ago
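For context on what the codebase provides: Prismatic models are loaded through a single `load` entry point and queried with an image plus a chat-style prompt. The sketch below follows the usage pattern shown in the repo's README; the model ID, `get_prompt_builder`, and `generate` call are recalled from that README and may differ across versions, so treat this as an assumption-laden outline rather than the canonical API.

```python
import requests
import torch
from PIL import Image
from prismatic import load  # assumed entry point, per the repo's README

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a pretrained VLM by model ID (downloaded from the HF Hub);
# "prism-dinosiglip+7b" is one of the released Prismatic checkpoints.
vlm = load("prism-dinosiglip+7b")
vlm.to(device, dtype=torch.bfloat16)

# Fetch an image and build a chat-style prompt with the model's prompt builder
# (method names here follow the README and are an assumption).
image = Image.open(
    requests.get("https://picsum.photos/512", stream=True).raw
).convert("RGB")
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What is going on in this image?")
prompt_text = prompt_builder.get_prompt()

# Autoregressive generation conditioned on the image.
generated_text = vlm.generate(
    image,
    prompt_text,
    do_sample=True,
    temperature=0.4,
    max_new_tokens=512,
)
print(generated_text)
```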
Alternatives and similar repositories for prismatic-vlms:
Users interested in prismatic-vlms are comparing it to the libraries listed below.
- Compose multimodal datasets · ⭐279 · Updated last week
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning · ⭐101 · Updated 4 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models · ⭐254 · Updated 4 months ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning · ⭐276 · Updated 2 months ago
- Heterogeneous Pre-trained Transformer (HPT) as a Scalable Policy Learner · ⭐455 · Updated 2 months ago
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" · ⭐285 · Updated last year
- Code for the Molmo Vision-Language Model · ⭐282 · Updated 2 months ago
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference · ⭐266 · Updated last month
- When do we not need larger vision models? · ⭐364 · Updated last week
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … · ⭐227 · Updated last month
- Official repo and evaluation implementation of VSI-Bench · ⭐373 · Updated 3 weeks ago
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral) · ⭐169 · Updated last month
- 🔥 [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy · ⭐182 · Updated last week
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… · ⭐478 · Updated 3 months ago
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning · ⭐203 · Updated 4 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents · ⭐307 · Updated 10 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … · ⭐474 · Updated 6 months ago
- World modeling challenge for humanoid robots · ⭐432 · Updated 3 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI · ⭐290 · Updated this week
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ⭐823 · Updated 2 months ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs · ⭐134 · Updated 5 months ago
- PyTorch implementation of the models RT-1-X and RT-2-X from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models" · ⭐189 · Updated 2 weeks ago
- [ICLR 2025] VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation · ⭐222 · Updated 3 weeks ago
- Embodied Chain of Thought: A robotic policy that reasons to solve the task · ⭐134 · Updated 5 months ago
- LLaVA-UHD v2: An MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer · ⭐364 · Updated last month
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model · ⭐351 · Updated 7 months ago
- Voltron: Language-Driven Representation Learning for Robotics · ⭐217 · Updated last year