AILab-CVC / SEED-X
Multimodal Models in Real World
☆493Updated 2 months ago
Alternatives and similar repositories for SEED-X:
Users who are interested in SEED-X are comparing it to the repositories listed below
- Official implementation of SEED-LLaMA (ICLR 2024).☆610Updated 7 months ago
- ☆225Updated 9 months ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini…☆588Updated 3 weeks ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"☆433Updated 7 months ago
- ☆358Updated 6 months ago
- GenEval: An object-focused framework for evaluating text-to-image alignment☆246Updated last month
- Official repository for the paper PLLaVA☆647Updated 8 months ago
- ☆480Updated 4 months ago
- Code repository for T2V-Turbo and T2V-Turbo-v2☆298Updated 2 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content☆577Updated 6 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers☆595Updated 6 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation☆434Updated 4 months ago
- [ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance☆256Updated last week
- ☆369Updated last month
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models☆278Updated last year
- [ICLR'25] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences☆296Updated 8 months ago
- [ICLR 2024] Code for FreeNoise based on VideoCrafter☆405Updated 9 months ago
- ☆175Updated 9 months ago
- [NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".☆348Updated 2 months ago
- SCEPTER is an open-source framework used for training, fine-tuning, and inference with generative models.☆509Updated 3 weeks ago
- Long Context Transfer from Language to Vision☆373Updated last month
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization☆477Updated this week
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆135Updated 3 months ago
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models☆246Updated 4 months ago
- Evaluating text-to-image/video/3D models with VQAScore☆284Updated last month
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆223Updated last year
- [NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation☆218Updated 5 months ago
- A Unified Tokenizer for Visual Generation and Understanding☆262Updated last week
- Pandora: Towards General World Model with Natural Language Actions and Video States☆502Updated 7 months ago
- ☆227Updated last year