Yuan-ManX / ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
★31 · Updated this week
Related projects
Alternatives and complementary repositories for ai-multimodal-timeline
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ★62 · Updated 3 weeks ago
- ★21 · Updated this week
- ★120 · Updated 2 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ★94 · Updated this week
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ★55 · Updated this week
- The codes of Siggraph Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation" ★33 · Updated 2 months ago
- Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models". ★68 · Updated 2 weeks ago
- Live2Diff: A Pipeline that processes Live video streams by a uni-directional video Diffusion model. ★167 · Updated 3 months ago
- faster parallel inference of mochi-1 video generation model ★73 · Updated last week
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ★18 · Updated this week
- A Training-free Iterative Framework for Long Story Visualization ★62 · Updated this week
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ★43 · Updated 2 weeks ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ★38 · Updated last month
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat… ★63 · Updated last month
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation ★138 · Updated 3 weeks ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ★38 · Updated 7 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ★38 · Updated 4 months ago
- Implementation of the premier Text to Video model from OpenAI ★57 · Updated last week
- Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ★60 · Updated last month
- Recaption large (Web)Datasets with vllm and save the artifacts. ★30 · Updated last month
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ★99 · Updated this week
- Video-Infinity generates long videos quickly using multiple GPUs without extra training. ★164 · Updated 3 months ago
- Training-and-prompt Free General Painterly Image Harmonization Using image-wise attention sharing ★52 · Updated 5 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ★25 · Updated last month
- sd3 dreambooth lora training book, adapted from the diffusers doc ★42 · Updated 5 months ago
- ★78 · Updated 3 months ago
- ★145 · Updated 2 months ago
- FLUX.1-dev LoRA Outfit Generator can create an outfit by detailing the color, pattern, fit, style, material, and type. ★42 · Updated 2 weeks ago
- Pytorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group ★127 · Updated last month
- Video-LLaVA fine-tune for CinePile evaluation ★38 · Updated 3 months ago