Fr0zenCrane / Cockatiel
The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"
☆37 · Updated 3 months ago
Alternatives and similar repositories for Cockatiel
Users who are interested in Cockatiel are comparing it to the libraries listed below.
- ☆126 · Updated 3 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆52 · Updated 9 months ago
- ☆115 · Updated this week
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis. ☆70 · Updated 5 months ago
- Official model implementation and benchmark evaluation repository of <AnyEdit: Unified High-Quality Image Edit with Any Idea> ☆28 · Updated 2 months ago
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆80 · Updated last year
- [ICML 2025] EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM ☆65 · Updated 2 months ago
- ☆50 · Updated 9 months ago
- A lightweight and highly efficient training framework for accelerating diffusion tasks. ☆48 · Updated last year
- An Efficient Text-to-Image Generation Pretrain Pipeline ☆117 · Updated 5 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆83 · Updated 4 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆85 · Updated last year
- [CVPR 2025 AI4CC Workshop] Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editin… ☆33 · Updated 4 months ago
- ☆78 · Updated 6 months ago
- ☆129 · Updated 2 months ago
- Chinese-native image generation while remaining compatible with the SD ecosystem, 1st-gen, AAAI2025 ☆13 · Updated last year
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆47 · Updated 2 months ago
- ☆47 · Updated 4 months ago
- An official implementation of EvoSearch: Scaling Image and Video Generation via Test-Time Evolutionary Search ☆91 · Updated last month
- ☆31 · Updated 2 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation. ☆37 · Updated last year
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆79 · Updated 4 months ago
- ☆156 · Updated 8 months ago
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆64 · Updated 7 months ago
- The HD-VG-130M Dataset ☆120 · Updated last year
- [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation ☆40 · Updated 2 months ago
- (ICCV2025) EEdit⚡: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing ☆52 · Updated this week
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆163 · Updated 11 months ago
- ☆106 · Updated last year
- ☆54 · Updated 4 months ago