EvolvingLMMs-Lab / Aero-1
☆72Updated last month
Alternatives and similar repositories for Aero-1
Users that are interested in Aero-1 are comparing it to the libraries listed below
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆46Updated 9 months ago
- [AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization☆50Updated 6 months ago
- An official implementation of SwapAnyone.☆62Updated 3 months ago
- ☆102Updated this week
- ☆21Updated 3 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)☆75Updated 2 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows☆74Updated last week
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆68Updated 2 weeks ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆113Updated 3 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer☆34Updated 5 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"☆69Updated 8 months ago
- LLIA - Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models☆50Updated 2 weeks ago
- ☆76Updated 3 months ago
- Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization"☆66Updated 2 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆37Updated last year
- Official PyTorch implementation of TokenSet.☆121Updated 3 months ago
- Music production for silent film clips.☆25Updated last month
- ☆61Updated last week
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆112Updated last week
- Collection of scripts to build small-scale datasets for fine-tuning video generation models.☆62Updated 3 months ago
- Blending Custom Photos with Video Diffusion Transformers☆47Updated 5 months ago
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project☆164Updated 3 months ago
- ☆88Updated this week
- Pusa: Thousands Timesteps Video Diffusion Model☆199Updated this week
- TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes☆66Updated 2 months ago
- ☆45Updated 2 weeks ago
- LVAS-Agent Code Base☆20Updated 2 months ago
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis.☆63Updated 6 months ago
- [NeurIPS 2024 Spotlight] The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"☆133Updated 8 months ago
- Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆162Updated 2 months ago