facebookresearch / jepa-intuitive-physics
This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos"
☆111 Updated last month
Alternatives and similar repositories for jepa-intuitive-physics:
Users who are interested in jepa-intuitive-physics are comparing it to the repositories listed below.
- Benchmarking physical understanding in generative video models ☆140 Updated this week
- Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long c… ☆218 Updated this week
- PyTorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024). ☆143 Updated 7 months ago
- ☆164 Updated this week
- Implementation of a framework for Genie2 in PyTorch ☆144 Updated 2 months ago
- ☆122 Updated 2 months ago
- An ML research template with good documentation by Boyuan Chen, an MIT PhD student ☆64 Updated 3 weeks ago
- The PyTorch implementation of paper: "AdaWorld: Learning Adaptable World Models with Latent Actions". ☆36 Updated this week
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training ☆252 Updated last month
- Official PyTorch Implementation of "History-Guided Video Diffusion" ☆241 Updated 3 weeks ago
- Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environment… ☆240 Updated last week
- A Video Tokenizer Evaluation Dataset ☆109 Updated 2 months ago
- 🔥[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ☆197 Updated 2 weeks ago
- This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long … ☆84 Updated 10 months ago
- Scaling Vision Pre-Training to 4K Resolution ☆60 Updated this week
- Clarity: A Minimalist Website Template for AI Research ☆110 Updated 2 months ago
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos ☆199 Updated 2 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆64 Updated 4 months ago
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆121 Updated 3 weeks ago
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers". ☆203 Updated 4 months ago
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆145 Updated last week
- Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers" ☆108 Updated last month
- Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion" ☆783 Updated 2 weeks ago
- Official implementation of "Self-Improving Video Generation" ☆62 Updated 3 weeks ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆170 Updated 9 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆195 Updated last week
- DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control ☆100 Updated 5 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆62 Updated last week
- Code and weights for the paper "Cluster and Predict Latent Patches for Improved Masked Image Modeling" ☆79 Updated 2 weeks ago
- An open source implementation of CLIP (With TULIP Support) ☆97 Updated last week