Code for D-DiT
☆61Apr 1, 2025Updated 11 months ago
Alternatives and similar repositories for Dual-Diffusion
Users who are interested in Dual-Diffusion are comparing it to the libraries listed below
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"☆178Feb 24, 2026Updated 2 weeks ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆187May 21, 2025Updated 9 months ago
- Baidu Maps coordinate picker tool☆12Jan 27, 2018Updated 8 years ago
- PostgreSQL SKILLs for AI Agent☆26Feb 5, 2026Updated last month
- PyTorch implementation for our paper "Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation"☆13Apr 19, 2023Updated 2 years ago
- Unsupervised multi-metric fusion for Full-Reference (FR) Image Quality Assessment (IQA)☆11Jul 11, 2014Updated 11 years ago
- rsbuild svg loader☆13Nov 11, 2024Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Mar 1, 2025Updated last year
- VST that combines the classic mdaPiano and EPiano in a new plug-in☆21Oct 10, 2025Updated 4 months ago
- Code for the paper "FastAdaSP: An Efficient Multitask Inference Framework for Large Speech Language Models". @ EMNLP'24(Oral)☆17Nov 14, 2024Updated last year
- This repo has scripts to compare various powerful RL methods☆39Feb 23, 2026Updated 2 weeks ago
- The IP-Adapter training scripts and inference for Flux Model, which is implemented based on X-Lab☆17Oct 1, 2024Updated last year
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆198Jan 7, 2026Updated 2 months ago
- [ICLR 2026] SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs☆44Oct 14, 2025Updated 4 months ago
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]☆14Jul 11, 2024Updated last year
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding☆513Nov 14, 2025Updated 3 months ago
- This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehe…☆126Jan 29, 2026Updated last month
- ☆45Dec 6, 2025Updated 3 months ago
- UniVid: The Open-Source Unified Video Model☆30Oct 13, 2025Updated 4 months ago
- ☆190Dec 17, 2024Updated last year
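- 🍪 Qinglong Assistant: a Chrome extension that automatically syncs website cookies to the Qinglong panel, with multi-site configuration and full environment variable management. [by qnloft]☆52Dec 31, 2025Updated 2 months ago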
- RL-based Legged-Wheeled Robot locomotion sim-to-real based on NVIDIA Isaac Lab☆55Updated this week
- EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory☆61Jan 13, 2026Updated last month
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning☆234Jan 22, 2026Updated last month
- Official PyTorch implementation for "Effective and Efficient Masked Image Generation Models"☆31Apr 8, 2025Updated 11 months ago
- To pioneer training long-context multi-modal transformer models☆70Aug 8, 2025Updated 7 months ago
- Efficient Multi-Vehicle Trajectory Planning via Centralized Searching Decentralized Optimization☆26Jan 16, 2025Updated last year
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆133Dec 18, 2025Updated 2 months ago
- Benchmarking and Analyzing Generative Data for Visual Recognition☆26Jul 25, 2023Updated 2 years ago
- WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction☆62Sep 3, 2025Updated 6 months ago
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis☆131May 16, 2025Updated 9 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations☆201Sep 18, 2025Updated 5 months ago
- [CVPR 2026] "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation"☆59Dec 17, 2025Updated 2 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, a…☆134Apr 2, 2025Updated 11 months ago
- An official implementation of EvoSearch: Scaling Image and Video Generation via Test-Time Evolutionary Search☆100Oct 3, 2025Updated 5 months ago
- Official repository for “PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss”☆210Feb 3, 2026Updated last month
- SwooleWebRTC☆27Apr 3, 2020Updated 5 years ago
- [CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models"☆36Apr 28, 2025Updated 10 months ago