Yuan-ManX / ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
★35 Updated 2 months ago
Alternatives and similar repositories for ai-multimodal-timeline:
Users who are interested in ai-multimodal-timeline are comparing it to the libraries listed below:
- The codes of Siggraph Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation" ★53 Updated 2 months ago
- Implementation of the premier Text to Video model from OpenAI ★57 Updated 5 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ★129 Updated last year
- ★83 Updated 8 months ago
- Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ★61 Updated last week
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis. ★62 Updated 4 months ago
- ★62 Updated this week
- Synthetic data generator for image, video and 3D models ★29 Updated 8 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ★66 Updated 6 months ago
- ★22 Updated 4 months ago
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat… ★63 Updated 6 months ago
- ★69 Updated 6 months ago
- Community ComfyUI workflows running on fal.ai ★57 Updated 7 months ago
- Fine-tune of Florence-2 for shot categorization. ★24 Updated last month
- ★9 Updated last year
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ★60 Updated last week
- Official PyTorch implementation of TokenSet. ★114 Updated last month
- ★32 Updated 3 months ago
- [arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices ★115 Updated 2 months ago
- Fashion-VDM: Video Diffusion Model for Virtual Try-On ★20 Updated 5 months ago
- DiT for VAE (and Video Generation) ★32 Updated 7 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ★48 Updated 2 months ago
- ★25 Updated last year
- Inference-time scaling of diffusion-based image and video generation models. ★138 Updated last month
- [arXiv 2024] Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models"… ★85 Updated 5 months ago
- Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ★42 Updated last month
- A minimalistic, hackable code base to finetune Wan video generation model ★39 Updated last week
- ★36 Updated 7 months ago
- ★30 Updated last year
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ★58 Updated 2 months ago