[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
☆33Feb 6, 2025Updated last year
Alternatives and similar repositories for MMTrail
Users that are interested in MMTrail are comparing it to the libraries listed below
- ☆24May 23, 2025Updated 9 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench☆18Jul 11, 2025Updated 7 months ago
- ☆16Dec 12, 2023Updated 2 years ago
- [Arxiv2022] Interpreting Class Conditional GANs with Channel Awareness☆17Apr 4, 2022Updated 3 years ago
- 🕹️ Explore cutting-edge techniques in game generation☆59Aug 22, 2025Updated 6 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆32Mar 26, 2025Updated 11 months ago
- Implementation of MathReader, Text-to-Speech for Mathematical Documents☆27Sep 23, 2025Updated 5 months ago
- A dataset for Audio-Visual Sound Event Detection in Movies☆26Jan 23, 2023Updated 3 years ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆45Jul 2, 2025Updated 8 months ago
- [ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE☆392Jan 19, 2025Updated last year
- This is a boilerplate project for building mobile applications using Expo, React, and Redux. It provides a solid foundation for creating …☆12Apr 6, 2025Updated 10 months ago
- official code for CVPR'24 paper Diff-BGM☆71Oct 12, 2024Updated last year
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆73Mar 18, 2025Updated 11 months ago
- A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022☆35Oct 9, 2023Updated 2 years ago
- [ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation☆78Mar 29, 2024Updated last year
- A collection of visual instruction tuning datasets.☆76Mar 14, 2024Updated last year
- [IROS 2021] Official code for "Stereo Waterdrop Removal with Row-wise Dilated Attention"☆35Aug 21, 2021Updated 4 years ago
- We explore the fascinating domain of text-to-image generation using the powerful capabilities of the Flux API. The objective is to trans…☆12Aug 14, 2024Updated last year
- Repo for "Centaur: Robust Multimodal Fusion for Human Activity Recognition"☆10Jan 9, 2024Updated 2 years ago
- ☆85Dec 4, 2022Updated 3 years ago
- Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint. CVPR 2023☆42Jun 10, 2023Updated 2 years ago
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap".☆126Jan 24, 2026Updated last month
- Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control☆98Feb 18, 2026Updated last week
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆16Nov 19, 2025Updated 3 months ago
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues☆13Feb 7, 2024Updated 2 years ago
- Code for the AAAI 2024 paper: "AGS: Affordable and Generalizable Substitute Training for Transferable Adversarial Attack" (accepted).☆12Mar 28, 2024Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆76Jan 25, 2026Updated last month
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆82Oct 15, 2025Updated 4 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆176Sep 26, 2024Updated last year
- This sample shows how to build vector similarity search on Azure Cosmos DB for PostgreSQL using the pgvector extension and the multi-moda…☆11Jul 13, 2024Updated last year
- ☆14Jun 2, 2025Updated 9 months ago
- Simple SSH workspace to connect to your running job.☆10Jul 16, 2018Updated 7 years ago
- A Python tool to help interact with ChatGPT.☆10Dec 11, 2022Updated 3 years ago
- EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation☆41Oct 19, 2022Updated 3 years ago
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training"☆11Oct 27, 2025Updated 4 months ago
- An example starting point monorepo for data science teams☆14Jun 27, 2023Updated 2 years ago
- A Framework for Symbolic MUsic Graph Explanations☆10Jul 30, 2025Updated 7 months ago
- init☆10May 25, 2025Updated 9 months ago
- Sound Separation, Omni modal☆28Sep 15, 2025Updated 5 months ago