mugen-org / MUGEN_coinrun
A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset. This repo contains scripts to train RL agents to navigate the closed world and collect video data.
☆13 Updated 2 years ago
Related projects:
- Multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo contains the traini… ☆38 Updated last year
- This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes … ☆82 Updated last year
- ☆36 Updated 2 years ago
- ☆39 Updated 7 months ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023) ☆32 Updated last year
- ☆31 Updated this week
- Command-line tool for downloading and extending the RedCaps dataset. ☆45 Updated 9 months ago
- Official PyTorch implementation of "Improving Generative Imagination in Object-Centric World Models" ☆34 Updated last year
- [NeurIPS 2021] Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language ☆45 Updated last year
- We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances… ☆44 Updated 3 years ago
- [NeurIPS 2021 Spotlight] Learning to Compose Visual Relations ☆100 Updated last year
- Official Code for Neural Systematic Binder ☆28 Updated last year
- Official code for Slot-Transformer for Videos (STEVE) ☆41 Updated last year
- Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in PyTorch ☆64 Updated 2 years ago
- Code for Look for the Change paper published at CVPR 2022 ☆35 Updated last year
- PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021] ☆54 Updated 2 years ago
- https://arxiv.org/abs/2209.15162 ☆48 Updated last year
- ☆50 Updated 2 years ago
- ☆27 Updated last year
- Implementation for the paper "Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly" (ECCV 2022: https://arxiv.org/abs… ☆32 Updated last year
- ☆33 Updated last year
- ☆29 Updated 8 months ago
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning ☆63 Updated 2 years ago
- Official code for NeurIPS 2020 paper "Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D" ☆26 Updated last year
- [Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant ☆23 Updated last year
- Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Foll… ☆37 Updated 2 months ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 Updated last year
- ☆74 Updated 2 years ago
- Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020) ☆33 Updated 2 years ago
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022 ☆29 Updated last year