Yuan-ManX / ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
☆33 · Updated last month
Alternatives and similar repositories for ai-multimodal-timeline:
Users who are interested in ai-multimodal-timeline are comparing it to the repositories listed below.
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs · ☆70 · Updated 2 months ago
- Implementation of the premier Text to Video model from OpenAI · ☆57 · Updated 2 months ago
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … · ☆58 · Updated this week
- ☆28 · Updated last month
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat… · ☆63 · Updated 3 months ago
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation" · ☆40 · Updated last week
- Implementation of the text to video model LUMIERE from the paper: "A Space-Time Diffusion Model for Video Generation" by Google Research · ☆51 · Updated 2 months ago
- Synthetic data generator for image, video and 3D models · ☆30 · Updated 5 months ago
- sd3 dreambooth lora training book, adapted from the diffusers doc · ☆42 · Updated 7 months ago
- ☆80 · Updated 4 months ago
- faster parallel inference of mochi-1 video generation model · ☆107 · Updated 3 weeks ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" · ☆118 · Updated last month
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching. · ☆36 · Updated last month
- Video-Infinity generates long videos quickly using multiple GPUs without extra training. · ☆169 · Updated 5 months ago
- OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 · ☆45 · Updated last month
- Enhancement in Multimodal Representation Learning. · ☆39 · Updated 10 months ago
- Implementation of "SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing" · ☆84 · Updated last year
- ☆35 · Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks · ☆15 · Updated 2 months ago
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis. · ☆58 · Updated 3 weeks ago
- PyTorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group · ☆129 · Updated 3 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. · ☆47 · Updated this week
- (AAAI'25) Training-and-prompt Free General Painterly Image Harmonization Using image-wise attention sharing · ☆54 · Updated last month
- [arXiv 2024] Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models"… · ☆77 · Updated 2 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions · ☆127 · Updated 11 months ago
- ☆66 · Updated 3 months ago
- Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4 · ☆25 · Updated last year
- ☆18 · Updated 4 months ago
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems". · ☆135 · Updated this week