[NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation"
☆69 · Feb 26, 2026 · Updated last week
Alternatives and similar repositories for JavisGPT
Users that are interested in JavisGPT are comparing it to the libraries listed below
- Animate Any Character in Any World ☆90 · Jan 9, 2026 · Updated last month
- [AAAI 2026] UltraGen ☆77 · Feb 1, 2026 · Updated last month
- DreamStyle: A Unified Framework for Video Stylization ☆109 · Jan 7, 2026 · Updated last month
- Resilient multi-LLM orchestration with built-in failure handling, rate limits, retries, and circuit breaker. ☆29 · Updated this week
- ☆86 · Feb 4, 2026 · Updated last month
- Official repo for paper "IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning" ☆40 · Jan 29, 2026 · Updated last month
- ☆36 · Dec 16, 2025 · Updated 2 months ago
- SpotEdit: Selective Region Editing in Diffusion Transformers ☆173 · Jan 5, 2026 · Updated 2 months ago
- A Unified Visual Generator with Interleaved OmniModal Context ☆192 · Feb 10, 2026 · Updated 3 weeks ago
- Any-to-Bokeh is a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh e… ☆124 · Feb 4, 2026 · Updated last month
- [CVPR 2026] SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time ☆99 · Jan 1, 2026 · Updated 2 months ago
- End2End Virtual Try-on with Visual Reference, CVPR2026 ☆58 · Nov 19, 2025 · Updated 3 months ago
- Official repository for the paper "MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars" ☆41 · Nov 20, 2025 · Updated 3 months ago
- DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer ☆546 · Jan 13, 2026 · Updated last month
- Audio-video joint generation ☆56 · Nov 27, 2025 · Updated 3 months ago
- D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI [ICLR 2026] ☆72 · Jan 15, 2026 · Updated last month
- [CVPR 2026] OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer ☆224 · Feb 21, 2026 · Updated last week
- ☆29 · May 7, 2025 · Updated 9 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing ☆58 · Dec 26, 2025 · Updated 2 months ago
- Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder". ☆134 · Dec 18, 2025 · Updated 2 months ago
- Official codes for the paper "GARDO: Reinforcing Diffusion Models without Reward Hacking" ☆56 · Feb 2, 2026 · Updated last month
- ☆321 · Jan 24, 2026 · Updated last month
- [SIGGRAPH Asia 2025] Official Implementation of "ConsistEdit: Highly Consistent and Precise Training-free Visual Editing" ☆69 · Dec 2, 2025 · Updated 3 months ago
- ComfyUI workflows ☆87 · Dec 19, 2025 · Updated 2 months ago
- [ICLR 2026] NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks ☆136 · Oct 20, 2025 · Updated 4 months ago
- Scaling Zero-Shot Reference-to-Video Generation ☆63 · Dec 11, 2025 · Updated 2 months ago
- android_device_moto_wingray ☆11 · May 11, 2016 · Updated 9 years ago
- a guide to grapheme-to-phoneme conversion and phoneme list for ace singing voice synthesis engine ☆42 · Jan 17, 2025 · Updated last year
- ☆21 · Dec 14, 2025 · Updated 2 months ago
- Awesome curated collection of images and prompts generated by z-image-turbo state-of-the-art open-source image generation and editing mod… ☆181 · Dec 25, 2025 · Updated 2 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆214 · Oct 12, 2025 · Updated 4 months ago
- Official Pytorch Implementation for "Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising" ☆340 · Feb 8, 2026 · Updated 3 weeks ago
- Inference server for MioTTS, a lightweight and fast LLM-based TTS model. ☆103 · Feb 14, 2026 · Updated 2 weeks ago
- Official Implementation of DRA-Ctrl (Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis) ☆118 · Aug 15, 2025 · Updated 6 months ago
- A free and open-source focus stacking software that supports multi-focus image alignment and fusion. ☆20 · Feb 5, 2026 · Updated last month
- [ICLR 2026] LongLive: Real-time Interactive Long Video Generation ☆1,077 · Feb 26, 2026 · Updated last week
- [MM'22 Oral] AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation ☆11 · Apr 3, 2023 · Updated 2 years ago
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands ☆97 · Feb 6, 2026 · Updated last month
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models ☆74 · Jan 29, 2026 · Updated last month