haoningwu3639 / SimpleSDM-Video
A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.
☆18 Updated last year
Alternatives and similar repositories for SimpleSDM-Video
Users that are interested in SimpleSDM-Video are comparing it to the libraries listed below
- A simple and flexible PyTorch implementation of StableDiffusion based on diffusers. ☆23 Updated 9 months ago
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning. ☆20 Updated last month
- Training code for CLIP-FlanT5 ☆26 Updated 11 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 Updated 8 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆102 Updated 3 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆62 Updated 8 months ago
- VisualQuality-R1 is the first open-sourced NR-IQA model that can accurately describe and rate image quality. ☆50 Updated 3 weeks ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆96 Updated last month
- [NeurIPS 2024] SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow ☆31 Updated 6 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆24 Updated last month
- Official implementation of "STAR: Scale-wise Text-to-image generation via Auto-Regressive representations" ☆34 Updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆85 Updated 3 weeks ago
- MRGen: Segmentation Data Engine for Underrepresented MRI Modalities ☆19 Updated last month
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆31 Updated 7 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆85 Updated 11 months ago
- ☆18 Updated 6 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆36 Updated 4 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆67 Updated 2 months ago
- official training and inference code of bitwise tokenizer ☆30 Updated last month
- ☆32 Updated 5 months ago
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆126 Updated last month
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ☆16 Updated last month
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation ☆32 Updated 2 months ago
- Unified layout planning and image generation, ICCV2025 ☆24 Updated 2 months ago
- FQGAN: Factorized Visual Tokenization and Generation ☆49 Updated 3 months ago
- ☆15 Updated 7 months ago
- CAR: Controllable AutoRegressive Modeling for Visual Generation ☆120 Updated 7 months ago
- Official GitHub repository for the Text-Guided Video Editing (TGVE) competition of LOVEU Workshop @ CVPR'23. ☆76 Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆60 Updated 4 months ago
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection ☆32 Updated this week