EvolvingLMMs-Lab / Aero-1
☆41Updated this week
Alternatives and similar repositories for Aero-1:
Users who are interested in Aero-1 are comparing it to the repositories listed below.
- Official PyTorch implementation of TokenSet.☆116Updated last month
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)☆71Updated last week
- ☆74Updated last month
- An official implementation of SwapAnyone.☆59Updated last month
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows☆57Updated last month
- ☆40Updated 3 weeks ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆75Updated 4 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆50Updated 4 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆58Updated 2 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"☆68Updated 7 months ago
- ☆91Updated 3 weeks ago
- ☆74Updated 7 months ago
- ☆22Updated 4 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆38Updated 10 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆68Updated 6 months ago
- Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆139Updated 2 weeks ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆109Updated 2 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆51Updated last week
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆99Updated 2 weeks ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆34Updated last month
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆93Updated last week
- Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆61Updated 3 weeks ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 9 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers"☆64Updated last month
- ☆32Updated 3 months ago
- Collection of scripts to build small-scale datasets for fine-tuning video generation models.☆53Updated last month
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆57Updated 2 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video…☆33Updated 10 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆48Updated 2 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆65Updated 2 weeks ago