TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
⭐ 610 · Updated 8 months ago
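For orientation, here is a minimal inference sketch in the style of the prismatic-vlms README. The model ID `prism-dinosiglip+7b` and the exact `load`/`get_prompt_builder`/`generate` signatures are recalled from the repo's documentation and may differ across versions; treat this as an assumed sketch, not the definitive API:

```python
import torch
from PIL import Image

from prismatic import load  # assumed top-level API of prismatic-vlms

# Load a pretrained VLM by ID (model ID is an assumption; check the repo's model zoo)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vlm = load("prism-dinosiglip+7b")
vlm.to(device, dtype=torch.bfloat16)

# Build a chat-style prompt around a single image
image = Image.open("example.jpg").convert("RGB")
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What is happening in this image?")
prompt_text = prompt_builder.get_prompt()

# Generate a response conditioned on the image
generated_text = vlm.generate(
    image,
    prompt_text,
    do_sample=True,
    temperature=0.4,
    max_new_tokens=512,
)
print(generated_text)
```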
Alternatives and similar repositories for prismatic-vlms:
Users interested in prismatic-vlms are comparing it to the libraries listed below.
- Compose multimodal datasets ⭐ 309 · Updated this week
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ⭐ 323 · Updated 3 months ago
- Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner ⭐ 473 · Updated 3 months ago
- ⭐ 299 · Updated last month
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" ⭐ 286 · Updated last year
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… ⭐ 544 · Updated 3 weeks ago
- Embodied Chain of Thought: a robotic policy that reasons to solve the task ⭐ 182 · Updated last week
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ⭐ 268 · Updated 2 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ⭐ 103 · Updated 6 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ⭐ 262 · Updated 6 months ago
- 🔥 [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ⭐ 195 · Updated this week
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model ⭐ 355 · Updated 8 months ago
- ⭐ 601 · Updated last year
- Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence ⭐ 371 · Updated last month
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ⭐ 271 · Updated 2 months ago
- PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models" ⭐ 199 · Updated 2 weeks ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐ 852 · Updated 3 months ago
- Code for the Molmo Vision-Language Model ⭐ 327 · Updated 3 months ago
- When do we not need larger vision models? ⭐ 378 · Updated last month
- A Framework of Small-scale Large Multimodal Models ⭐ 769 · Updated last month
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐ 477 · Updated 7 months ago
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning ⭐ 214 · Updated 5 months ago
- Recent LLM-based CV and related works. Welcome to comment/contribute! ⭐ 858 · Updated last week
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ⭐ 219 · Updated last month
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos ⭐ 186 · Updated last month
- Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence