mugen-org / MUGEN_coinrun
A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset. This repo contains scripts to train RL agents to navigate the closed world and collect video data.
☆13 Updated 2 years ago
Related projects
Alternatives and complementary repositories for MUGEN_coinrun
- Multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo contains the traini… ☆39 Updated last year
- ☆39 Updated 10 months ago
- This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes … ☆82 Updated last year
- ElasticTok: Adaptive Tokenization for Image and Video ☆32 Updated 2 weeks ago
- ☆37 Updated 2 years ago
- We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances… ☆45 Updated 3 years ago
- [NeurIPS 2021 Spotlight] Learning to Compose Visual Relations ☆101 Updated last year
- ☆33 Updated 10 months ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023) ☆32 Updated last year
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023) ☆137 Updated 11 months ago
- Command-line tool for downloading and extending the RedCaps dataset. ☆45 Updated 11 months ago
- ☆34 Updated last year
- ☆29 Updated last year
- RareAct: A video dataset of unusual interactions ☆32 Updated 4 years ago
- Official PyTorch implementation of "Improving Generative Imagination in Object-Centric World Models" ☆34 Updated last year
- ☆74 Updated 2 years ago
- Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in PyTorch ☆64 Updated 2 years ago
- Official code for Neural Systematic Binder ☆29 Updated last year
- ☆50 Updated 2 years ago
- Code for the "Look for the Change" paper published at CVPR 2022 ☆35 Updated 2 years ago
- PyTorch implementation of the paper "Object-Centric Learning with Slot Attention" ☆82 Updated last year
- Release of ImageNet-Captions ☆45 Updated last year
- Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020) ☆33 Updated 2 years ago
- A new play-and-plug method of controlling an existing generative model with conditioning attributes and their compositions. ☆71 Updated 2 years ago
- ☆45 Updated 6 months ago
- [Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant ☆23 Updated last year
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa… ☆76 Updated last year
- Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Foll… ☆37 Updated 5 months ago
- JAX implementation of ViT-VQGAN ☆77 Updated 2 years ago