facebookresearch / VLaMP
Code for “Pretrained Language Models as Visual Planners for Human Assistance”
☆57Updated last year
Related projects ⓘ
Alternatives and complementary repositories for VLaMP
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"☆51Updated last year
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆75Updated last year
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings☆44Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"☆59Updated 10 months ago
- Language Repository for Long Video Understanding☆28Updated 4 months ago
- ☆83Updated last year
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight)☆31Updated last year
- Language Quantized AutoEncoders☆94Updated last year
- https://arxiv.org/abs/2209.15162☆48Updated last year
- ☆47Updated last year
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch☆97Updated last year
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)☆32Updated last year
- code release of research paper "Exploring Long-Sequence Masked Autoencoders"☆99Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆110Updated 2 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆40Updated last week
- Official repo for StableLLAVA☆90Updated 10 months ago
- ☆64Updated 4 months ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆76Updated last year
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"☆46Updated last year
- ☆72Updated 5 months ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆102Updated last year
- A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.☆29Updated 4 months ago
- M4 experiment logbook☆56Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆107Updated 4 months ago
- Code for the paper titled "CiT Curation in Training for Effective Vision-Language Data".☆78Updated last year
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning☆64Updated 2 years ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 4 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆52Updated last year
- multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo. contains the traini…☆39Updated last year