TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
★693 · Updated 11 months ago
Alternatives and similar repositories for prismatic-vlms
Users interested in prismatic-vlms are comparing it to the libraries listed below.
- Compose multimodal datasets ★393 · Updated this week
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ★359 · Updated 5 months ago
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" ★305 · Updated last year
- Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner ★499 · Updated 6 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to captioning ★112 · Updated 8 months ago
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success ★415 · Updated last month
- Official repo and evaluation implementation of VSI-Bench ★492 · Updated 3 months ago
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… ★640 · Updated 2 months ago
- Embodied Chain of Thought: A robotic policy that reasons to solve the task ★254 · Updated 2 months ago
- Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long c… ★435 · Updated last week
- Code for the Molmo Vision-Language Model ★431 · Updated 5 months ago
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ★619 · Updated last year
- A Framework of Small-scale Large Multimodal Models ★825 · Updated last month
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos ★293 · Updated 4 months ago
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video] ★546 · Updated last week
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ★276 · Updated 4 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★330 · Updated last month
- ★354 · Updated 4 months ago
- A curated list of awesome papers on Embodied AI and related research/industry-driven resources ★440 · Updated last month
- Embodied Reasoning Question Answer (ERQA) Benchmark ★159 · Updated 2 months ago
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ★283 · Updated 8 months ago
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ★212 · Updated 2 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ★323 · Updated 5 months ago
- PyTorch implementation of the models RT-1-X and RT-2-X from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models" ★208 · Updated 2 weeks ago
- When do we not need larger vision models? ★396 · Updated 3 months ago
- Democratization of RT-2: "RT-2: New model translates vision and language into action" ★458 · Updated 10 months ago
- ★334 · Updated last year
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ★438 · Updated 2 weeks ago
- ★613 · Updated last year
- CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks ★573 · Updated 3 months ago