Visual-AI/JoVA

JoVA: Unified Multimodal Learning for Joint Video-Audio Generation

☆33

Alternatives and similar repositories for JoVA

Users who are interested in JoVA are comparing it to the repositories listed below.

Visual-AI / Dissect-OOD-OSR
[IJCV 2024] Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks
☆15 · Aug 30, 2024 · Updated last year
Visual-AI / SCD
[CVPRW2024] What’s in a Name? Beyond Class Indices for Image Recognition
☆17 · Aug 30, 2024 · Updated last year
Visual-AI / HiLo
[ICLR2025] HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
☆22 · Aug 1, 2025 · Updated 11 months ago
Visual-AI / PromptCCD
[ECCV2024] PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
☆31 · Apr 3, 2025 · Updated last year
Visual-AI / Semantic-Correspondence
[TPAMI 2025] Semantic Correspondence: Unified Benchmarking and a Strong Baseline
☆20 · Dec 11, 2025 · Updated 7 months ago
Visual-AI / HypCD
[CVPR 2025] Hyperbolic Category Discovery
☆32 · Nov 7, 2025 · Updated 8 months ago
Visual-AI / SPTNet
[ICLR2024] SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
☆36 · Apr 9, 2025 · Updated last year
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆71 · May 15, 2025 · Updated last year
CompVis / RepTok
[ICLR 2026] Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
☆59 · Apr 24, 2026 · Updated 2 months ago
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆158 · Dec 9, 2025 · Updated 7 months ago
Phantom-video / LibraGen
☆17 · Mar 19, 2026 · Updated 4 months ago
Visual-AI / Fin3R
[NeurIPS 2025] Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation
☆64 · Dec 18, 2025 · Updated 7 months ago
Visual-AI / GAMEBoT
[ACL 2025] GAMEBoT: Transparent Assessment of LLM Reasoning in Games
☆33 · May 15, 2026 · Updated 2 months ago
Visual-AI / Awesome-Semantic-Correspondence
A collection of papers on semantic correspondence, organized by year.
☆32 · Dec 10, 2025 · Updated 7 months ago
Visual-AI / FROSTER
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
☆101 · Jan 14, 2025 · Updated last year
Visual-AI / RegionDrag
[ECCV2024] RegionDrag: Fast Region-Based Image Editing with Diffusion Models
☆67 · Oct 9, 2024 · Updated last year
j0seo / lookahead-anchoring
☆15 · Oct 27, 2025 · Updated 8 months ago
tanABCC / VABench
☆16 · Jul 8, 2026 · Updated last week
fudan-generative-vision / MixFlow
[CVPR 2026] MixFlow Training: Alleviating Exposure Bias with Slowed Interpolation Mixture
☆21 · Dec 23, 2025 · Updated 6 months ago
sjtuplayer / Harmony
Audio-video joint generation
☆58 · Nov 27, 2025 · Updated 7 months ago
Visual-AI / SPoT
Official code for paper "Surgical Post-Training: Cutting Errors, Keeping Knowledge"
☆19 · Jun 16, 2026 · Updated last month
OpenVE-Team / OpenVE-3M
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
☆51 · Apr 15, 2026 · Updated 3 months ago
arielshaulov / TokenTrim
Official implementation of the paper "TOKENTRIM: INFERENCE-TIME TOKEN PRUNING FOR AUTOREGRESSIVE LONG VIDEO GENERATION"
☆15 · Feb 8, 2026 · Updated 5 months ago
Visual-AI / Inpaint4Drag
[ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
☆94 · Nov 30, 2025 · Updated 7 months ago
Fantasy-AMAP / fantasy-talking2
[AAAI 2026] FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation
☆66 · Aug 20, 2025 · Updated 11 months ago
Visual-AI / DebGCD
[ICLR 2025] DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery
☆16 · Sep 27, 2025 · Updated 9 months ago
jinhong-ni / UniPano
[ICCV 2025] Official implementation of "What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?"
☆21 · Aug 7, 2025 · Updated 11 months ago
UnicomAI / CoTj
CoTj (Chain-of-Trajectories) upgrades diffusion models from fixed System-1 denoising schedules to System-2 style, condition-adaptive traj…
☆23 · Mar 24, 2026 · Updated 3 months ago
sgvaze / SSB
Python package to download and use the SSB datasets
☆11 · Aug 3, 2023 · Updated 2 years ago
SAIS-FUXI / IPO
☆58 · May 6, 2025 · Updated last year
GeekGuru123 / ProfilingDiT
☆20 · Jan 1, 2026 · Updated 6 months ago
xie-lab-ml / Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention
[ICML 2026] The official code for our work "LIVEditor-14B: Lightning Unified Video Editor via In-Context Sparse Attention".
☆38 · May 15, 2026 · Updated 2 months ago
bcmi / Granular-GRPO
[CVPR 2026] Fine-Grained GRPO for Precise Preference Alignment in Flow Models
☆64 · Jun 1, 2026 · Updated last month
Tonniia / Zero2Hero
[AAAI 2026] Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing
☆24 · Nov 20, 2025 · Updated 8 months ago
Owen718 / LongPrompt-LLamaGen
This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…
☆30 · Oct 21, 2024 · Updated last year
wangf3014 / VTok
Official implementation of VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents
☆15 · Feb 5, 2026 · Updated 5 months ago
thunderbolt215 / UniPercept
[ICML2026 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
☆155 · Jul 13, 2026 · Updated last week
Jia1018 / GoHD
The official code for paper: GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expressions…
☆25 · May 28, 2026 · Updated last month
baaivision / URSA
[ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation
☆123 · May 20, 2026 · Updated 2 months ago