lichao-sun / Mora
Mora: More like Sora for Generalist Video Generation
☆1,474 Updated 2 months ago
Related projects:
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆2,007 Updated 4 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,020 Updated last month
- [ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG) ☆1,651 Updated last week
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,175 Updated 4 months ago
- Latte: Latent Diffusion Transformer for Video Generation. ☆1,637 Updated last week
- ☆2,395 Updated this week
- Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models ☆2,889 Updated 2 months ago
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆1,976 Updated 2 weeks ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,182 Updated 2 months ago
- [ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors ☆2,401 Updated last week
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text ☆1,342 Updated 2 weeks ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ☆1,786 Updated last week
- MiniSora: A community aims to explore the implementation path and future development direction of Sora. ☆1,159 Updated last week
- Character Animation (AnimateAnyone, Face Reenactment) ☆3,078 Updated 3 months ago
- Mixture-of-Experts for Large Vision-Language Models ☆1,911 Updated 4 months ago
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥 ☆1,597 Updated 2 months ago
- VideoSys: An easy and efficient system for video generation ☆1,633 Updated this week
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ☆2,846 Updated last month
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ☆2,688 Updated last month
- MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation ☆2,108 Updated last month
- Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment ☆1,619 Updated this week
- A general fine-tuning kit geared toward diffusion models. ☆1,534 Updated this week
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,683 Updated last week
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,317 Updated 2 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,756 Updated last month
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising ☆2,318 Updated 2 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆955 Updated last month
- Official implementation of DreaMoving ☆1,790 Updated 8 months ago
- Create Magic Story! ☆5,787 Updated last month
- 📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion ☆1,155 Updated 3 weeks ago