lyogavin / train_your_own_sora
☆177 Updated 8 months ago
Related projects
Alternatives and complementary repositories for train_your_own_sora
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024 ☆261 Updated 7 months ago
- LLaVA-Interactive-Demo ☆352 Updated 3 months ago
- [CVPR2024] Make Your Dream A Vlog ☆416 Updated 8 months ago
- Multimodal Models in Real World ☆404 Updated 3 weeks ago
- ☆282 Updated 2 weeks ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ☆503 Updated 3 months ago
- Official repository for the paper PLLaVA ☆594 Updated 3 months ago
- Official implementation of the ECCV paper "SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing" ☆232 Updated last month
- ☆141 Updated 4 months ago
- Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope… ☆213 Updated 3 months ago
- We're back! Implementations of Meissonic developed by Community~If you feel it is helpful, plz consider giving a star❤️ ☆252 Updated this week
- Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch ☆250 Updated 3 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆126 Updated 9 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆371 Updated 2 months ago
- Memory optimized finetuning scripts for CogVideoX using TorchAO and DeepSpeed ☆415 Updated this week
- ☆254 Updated 3 months ago
- Code repository for T2V-Turbo and T2V-Turbo-v2 ☆251 Updated last month
- ☆145 Updated 3 months ago
- [NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing". ☆311 Updated 5 months ago
- Official PyTorch implementation for the paper "AnimateZero: Video Diffusion Models are Zero-Shot Image Animators" ☆351 Updated 11 months ago
- [ICLR 2024] Code for FreeNoise based on VideoCrafter ☆387 Updated 4 months ago
- ☆146 Updated last month
- Code for instruction-tuning Stable Diffusion. ☆212 Updated 9 months ago
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆435 Updated 2 months ago
- Data release for the ImageInWords (IIW) paper. ☆201 Updated this week
- ☆197 Updated 10 months ago
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation ☆194 Updated 4 months ago
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models ☆302 Updated 10 months ago
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ☆362 Updated 11 months ago
- [ECCV 2024] FreeInit: Bridging Initialization Gap in Video Diffusion Models ☆492 Updated 10 months ago