facebookresearch / VLaMP
Code for “Pretrained Language Models as Visual Planners for Human Assistance”
☆60 Updated last year
Alternatives and similar repositories for VLaMP:
Users who are interested in VLaMP are comparing it to the libraries listed below.
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings ☆45 Updated last year
- [ICCV 2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆52 Updated last year
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations" ☆52 Updated last year
- A Video Tokenizer Evaluation Dataset ☆107 Updated 2 months ago
- ☆73 Updated 2 years ago
- https://arxiv.org/abs/2209.15162 ☆49 Updated 2 years ago
- Language Repository for Long Video Understanding ☆31 Updated 9 months ago
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight) ☆32 Updated last year
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding" ☆48 Updated 2 months ago
- Official repo for StableLLAVA ☆94 Updated last year
- Recursive Visual Programming (ECCV 2024) ☆17 Updated 4 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆61 Updated 4 months ago
- ☆83 Updated last year
- ☆71 Updated 8 months ago
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment ☆21 Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 7 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆24 Updated 2 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 Updated 6 months ago
- Language Quantized AutoEncoders ☆103 Updated 2 years ago
- ☆72 Updated 10 months ago
- Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action ☆36 Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 Updated last year
- [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆104 Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆115 Updated 8 months ago
- ☆48 Updated last year
- An official codebase for paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos (ICCV 23)" ☆52 Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆97 Updated 3 months ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa… ☆78 Updated 2 years ago
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra… ☆53 Updated last year