Yuan-ManX / ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
☆35Updated 3 months ago
Alternatives and similar repositories for ai-multimodal-timeline
Users who are interested in ai-multimodal-timeline are comparing it to the repositories listed below
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat…☆63Updated 7 months ago
- Implementation of the premier Text to Video model from OpenAI☆57Updated 6 months ago
- ☆22Updated 4 months ago
- Live2Diff: A Pipeline that processes Live video streams by a uni-directional video Diffusion model.☆184Updated 9 months ago
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community …☆60Updated last week
- ☆32Updated 3 months ago
- Community ComfyUI workflows running on fal.ai☆57Updated 8 months ago
- Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆63Updated last month
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆48Updated 3 months ago
- Scripts to teach Flux the task of image editing from language with the Flux Control framework.☆74Updated last month
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆129Updated last year
- sd3 dreambooth lora training book, adapted from the diffusers doc☆45Updated 11 months ago
- An official implementation of SwapAnyone.☆60Updated 2 months ago
- ☆83Updated 8 months ago
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis.☆62Updated 4 months ago
- faster parallel inference of mochi-1 video generation model☆119Updated 2 months ago
- Inference-time scaling of diffusion-based image and video generation models.☆143Updated 2 months ago
- [IJCAI 2025] Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models"…☆85Updated this week
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation"☆53Updated 3 weeks ago
- Pusa: Thousands Timesteps Video Diffusion Model☆166Updated 3 weeks ago
- ☆70Updated 7 months ago
- Fashion-VDM: Video Diffusion Model for Virtual Try-On☆20Updated 6 months ago
- Official PyTorch implementation of TokenSet.☆118Updated last month
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆109Updated 2 months ago
- ☆19Updated 8 months ago
- LVAS-Agent Code Base☆15Updated last month
- Recaption large (Web)Datasets with vllm and save the artifacts.☆52Updated 5 months ago
- The official implementation of "RepVideo: Rethinking Cross-Layer Representation for Video Generation"☆117Updated 3 months ago
- Collection of scripts to build small-scale datasets for fine-tuning video generation models.☆55Updated last month
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".☆162Updated 2 months ago