facebookresearch / VLaMPLinks

Code for “Pretrained Language Models as Visual Planners for Human Assistance”

☆61

Alternatives and similar repositories for VLaMP

Users that are interested in VLaMP are comparing it to the libraries listed below

Sorting:

allenai / grit_official
Official repository for the General Robust Image Task (GRIT) Benchmark
☆54Updated 2 years ago
facebookresearch / EgoObjects
[ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
☆76Updated last year
jmerullo / limber
https://arxiv.org/abs/2209.15162
☆50Updated 2 years ago
facebookresearch / HierVL
[CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings
☆46Updated last year
facebookresearch / EgoT2
Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight)
☆33Updated 2 years ago
mhh0318 / UniD3
☆54Updated 2 years ago
agrimgupta92 / maskvit
☆73Updated 3 years ago
wade3han / champagne
An official codebase for paper " CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos (ICCV 23)"
☆52Updated last year
facebookresearch / CiT
Code for the paper titled "CiT Curation in Training for Effective Vision-Language Data".
☆78Updated 2 years ago
lucidrains / MaMMUT-pytorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
☆103Updated last year
mlfoundations / VisIT-Bench
☆50Updated last year
facebookresearch / PartDistillation
Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"
☆58Updated last year
jeykigung / HiCLIP
☆29Updated 2 years ago
NVlabs / RelViT
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
☆63Updated 2 years ago
LAION-AI / General-GPT
☆65Updated last year
sanjayss34 / codevqa
☆85Updated 2 years ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56Updated 2 years ago
pinterest / atg-research
☆62Updated last week
google-research-datasets / videoCC-data
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…
☆78Updated 2 years ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆105Updated last year
zinengtang / Perceiver_VL
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
☆33Updated 2 years ago
facebookresearch / long_seq_mae
code release of research paper "Exploring Long-Sequence Masked Autoencoders"
☆100Updated 2 years ago
LAION-AI / laion50BU
Un-*** 50 billions multimodality dataset
☆23Updated 2 years ago
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆125Updated 11 months ago
haoliuhl / language-quantized-autoencoders
Language Quantized AutoEncoders
☆108Updated 2 years ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95Updated last year
TXH-mercury / COSA
[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43Updated 7 months ago
facebookresearch / maws
Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496
☆91Updated 3 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆133Updated last year
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆45Updated last month