JoVA: Unified Multimodal Learning for Joint Video-Audio Generation
☆30Dec 22, 2025Updated 2 months ago
Alternatives and similar repositories for JoVA
Users that are interested in JoVA are comparing it to the libraries listed below
- RLHF for Video Diffusion Models☆26Jul 30, 2025Updated 7 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Oct 21, 2024Updated last year
- ☆41Oct 29, 2025Updated 4 months ago
- Adapting Self-Supervised Representations as a Latent Space for Efficient Generation☆40Oct 17, 2025Updated 5 months ago
- The official implementation of StereoPilot☆104Dec 19, 2025Updated 3 months ago
- [ICLR 2026] IVEBench - Benchmark for Instruction-Guided Video Editing☆71Jan 28, 2026Updated last month
- [CVPR 2026] Official implementation of the paper "HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Pre…☆62Mar 3, 2026Updated 2 weeks ago
- [CVPR 2026] VideoCoF: Unified Video Editing with Temporal Reasoner☆158Feb 22, 2026Updated 3 weeks ago
- Official repo for paper "IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning"☆41Jan 29, 2026Updated last month
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆34Sep 25, 2025Updated 5 months ago
- ☆70Dec 5, 2025Updated 3 months ago
- 📝The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3"☆21Dec 2, 2025Updated 3 months ago
- ☆85Oct 10, 2025Updated 5 months ago
- Reinforcing Action Policies by Prophesying☆40Nov 26, 2025Updated 3 months ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.…☆44Jan 27, 2026Updated last month
- [ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts☆29Nov 10, 2025Updated 4 months ago
- An Empirical Study of GPT-4o Image Generation Capabilities☆29Apr 16, 2025Updated 11 months ago
- [AAAI 2026] Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing☆24Nov 20, 2025Updated 4 months ago
- AI Voice Cloning Desktop Application that runs locally on your computer and doesn't cost anything to run☆54Nov 26, 2025Updated 3 months ago
- [NeurIPS 2025] Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency☆79Sep 19, 2025Updated 6 months ago
- ☆18Apr 10, 2025Updated 11 months ago
- Official code for "DiffX: Guide Your Layout to Cross-Modal Generative Modeling"☆23Feb 20, 2025Updated last year
- Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation☆23Sep 24, 2025Updated 5 months ago
- 🍑 relsim: Relational Visual Similarity | pip install relsim 🌍 (CVPR 2026)☆70Feb 21, 2026Updated 3 weeks ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆173Mar 6, 2026Updated 2 weeks ago
- ☆18Oct 22, 2024Updated last year
- The official pytorch implementation of “Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization”.☆19May 22, 2025Updated 9 months ago
- https://github.com/xie-lab-ml/Golden-Noise-for-Diffusion-Models for ComfyUI☆18Dec 10, 2024Updated last year
- ☆25Jul 4, 2023Updated 2 years ago
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer☆16Nov 21, 2024Updated last year
- ComfyUI Workflow Collection | ComfyUI 工作流合集☆21Dec 6, 2024Updated last year
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation☆107Feb 6, 2026Updated last month
- UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios☆121Dec 17, 2025Updated 3 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation☆32Jun 12, 2025Updated 9 months ago
- LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation☆38Mar 3, 2025Updated last year
- ☆30Mar 14, 2025Updated last year
- Repository to go along with the paper "Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines"☆10Mar 31, 2022Updated 3 years ago
- ☆13Jun 22, 2022Updated 3 years ago
- ☆10Jan 15, 2023Updated 3 years ago