lucasjinreal / LLaVA-Magvit2
LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.
☆35Updated 9 months ago
Alternatives and similar repositories for LLaVA-Magvit2:
Users who are interested in LLaVA-Magvit2 are comparing it to the libraries listed below:
- LMM solved catastrophic forgetting, AAAI2025☆39Updated 4 months ago
- Video dataset dedicated to portrait-mode video recognition.☆44Updated 3 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆80Updated last month
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆40Updated 11 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)☆66Updated 5 months ago
- Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing☆23Updated 3 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer☆26Updated 2 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows☆57Updated last week
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video…☆30Updated 8 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 8 months ago
- Codebase for the paper-Elucidating the design space of language models for image generation☆45Updated 4 months ago
- ☆70Updated last week
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆64Updated 4 months ago
- Towards training VQ-VAE models robustly!☆57Updated 2 months ago
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NIPS 2024]☆77Updated 4 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.☆34Updated 8 months ago
- Minimal Differentiable Image Reward Functions☆51Updated 2 weeks ago
- A lightweight, highly efficient training framework for accelerating diffusion tasks.☆46Updated 6 months ago
- Keras implementation of Finite Scalar Quantization☆71Updated last year
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"☆48Updated 5 months ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆29Updated 3 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation☆79Updated 11 months ago
- Code Release for the paper "Make-A-Story: Visual Memory Conditioned Consistent Story Generation" in CVPR 2023☆39Updated last year
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis☆83Updated 8 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆16Updated last month