TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
⭐ 652 · Updated 9 months ago
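For context, the prismatic-vlms README documents a `prismatic.load` entry point for pretrained checkpoints, plus a prompt-builder and `generate` API. The sketch below follows that pattern; treat the exact signatures, the model ID `prism-dinosiglip+7b`, and the image URL as assumptions that may differ across versions of the repo.

```python
# Minimal inference sketch following the pattern shown in the prismatic-vlms
# README. `load`, `get_prompt_builder`, and `generate` signatures are
# assumptions and may differ across versions; gated LM backbones may also
# require passing an `hf_token` to `load`.
import requests
import torch
from PIL import Image
from prismatic import load

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a pretrained VLM by model ID (auto-downloaded from the HF Hub).
vlm = load("prism-dinosiglip+7b")
vlm.to(device, dtype=torch.bfloat16)

# Fetch an example image and build a single-turn prompt.
url = "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What is going on in this image?")

# Generate a response conditioned on the image and the formatted prompt.
generated_text = vlm.generate(
    image,
    prompt_builder.get_prompt(),
    do_sample=True,
    temperature=0.4,
    max_new_tokens=256,
)
print(generated_text)
```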
Alternatives and similar repositories for prismatic-vlms:
Users interested in prismatic-vlms are comparing it to the libraries listed below.
- Compose multimodal datasets (⭐ 351 · Updated this week)
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning (⭐ 341 · Updated 4 months ago)
- Implementation of "PaLM-E: An Embodied Multimodal Language Model" (⭐ 299 · Updated last year)
- (⭐ 332 · Updated 3 months ago)
- Official repo and evaluation implementation of VSI-Bench (⭐ 463 · Updated last month)
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success (⭐ 343 · Updated 3 weeks ago)
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… (⭐ 595 · Updated 3 weeks ago)
- (⭐ 610 · Updated last year)
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning (⭐ 108 · Updated 7 months ago)
- Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner (⭐ 485 · Updated 4 months ago)
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference (⭐ 272 · Updated 3 months ago)
- OpenEQA: Embodied Question Answering in the Era of Foundation Models (⭐ 272 · Updated 7 months ago)
- Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long c… (⭐ 295 · Updated 3 weeks ago)
- Embodied Chain of Thought: a robotic policy that reasons to solve the task (⭐ 225 · Updated 2 weeks ago)
- Embodied Reasoning Question Answer (ERQA) Benchmark (⭐ 139 · Updated last month)
- Recent LLM-based CV and related works. Comments and contributions welcome! (⭐ 862 · Updated last month)
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." (⭐ 243 · Updated 2 months ago)
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… (⭐ 866 · Updated 5 months ago)
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos (⭐ 235 · Updated 3 months ago)
- CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks (⭐ 539 · Updated 2 months ago)
- Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence (⭐ 397 · Updated this week)
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … (⭐ 298 · Updated 4 months ago)
- A Framework of Small-scale Large Multimodal Models (⭐ 800 · Updated 3 weeks ago)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model (⭐ 359 · Updated 10 months ago)
- Suite of human-collected datasets and a multi-task continuous control benchmark for open-vocabulary visuolinguomotor learning (⭐ 313 · Updated last week)
- PyTorch implementation of the models RT-1-X and RT-2-X from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models" (⭐ 203 · Updated 2 weeks ago)
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents (⭐ 311 · Updated last year)
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning (⭐ 226 · Updated 3 weeks ago)
- [ECCV 2024 Oral] Code for the paper "An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… (⭐ 413 · Updated 3 months ago)
- When do we not need larger vision models? (⭐ 388 · Updated 2 months ago)