Yuan-ManX / ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
★37 · Updated 10 months ago
Alternatives and similar repositories for ai-multimodal-timeline
Users who are interested in ai-multimodal-timeline are comparing it to the libraries listed below.
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ★110 · Updated 4 months ago
- Live2Diff: A Pipeline that processes Live video streams by a uni-directional video Diffusion model. ★199 · Updated last year
- Video-Infinity generates long videos quickly using multiple GPUs without extra training. ★188 · Updated last year
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ★132 · Updated last year
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems". ★195 · Updated 9 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ★50 · Updated 9 months ago
- ★195 · Updated last year
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat… ★65 · Updated last year
- Controllable Animation Video Generation with Large Models-based Multimodal Agents ★217 · Updated last month
- Implementation of the premier Text to Video model from OpenAI ★56 · Updated last year
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ★58 · Updated this week
- ★208 · Updated last year
- Enhancement in Multimodal Representation Learning. ★40 · Updated last year
- [arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices ★128 · Updated 2 weeks ago
- Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance ★75 · Updated 5 months ago
- Official PyTorch implementation of TokenSet. ★127 · Updated 8 months ago
- ★35 · Updated 10 months ago
- Faster parallel inference of the mochi-1 video generation model ★125 · Updated 9 months ago
- ★93 · Updated 9 months ago
- ★86 · Updated last year
- ★69 · Updated last year
- ★56 · Updated last year
- Community ComfyUI workflows running on fal.ai ★57 · Updated last year
- ★188 · Updated 6 months ago
- Implementation of the text to video model LUMIERE from the paper: "A Space-Time Diffusion Model for Video Generation" by Google Research ★52 · Updated 10 months ago
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis. ★65 · Updated 11 months ago
- Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024) ★176 · Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ★130 · Updated last year
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ★69 · Updated last year
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ★144 · Updated last year