lucasjinreal / LLaVA-Magvit2
☆35 Updated 6 months ago
Alternatives and similar repositories for LLaVA-Magvit2:
Users that are interested in LLaVA-Magvit2 are comparing it to the libraries listed below
- LMM which strictly superset LLM embedded ☆37 Updated 2 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆41 Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 Updated 6 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆38 Updated 9 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆47 Updated last week
- ☆27 Updated last year
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆62 Updated 2 months ago
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation ☆95 Updated last week
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆22 Updated 2 weeks ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation ☆40 Updated last month
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NIPS 2024] ☆75 Updated 2 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆59 Updated 3 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video… ☆30 Updated 6 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆20 Updated 5 months ago
- Keras implementation of Finite Scalar Quantization ☆68 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆28 Updated 3 months ago
- [AAAI 2025] Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models. RCDMs improve story generation… ☆22 Updated 3 weeks ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆50 Updated 5 months ago
- ☆128 Updated last month
- Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing ☆21 Updated last month
- Codebase for rectified flow ☆30 Updated 2 weeks ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆33 Updated 6 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆57 Updated 2 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆46 Updated 5 months ago
- ☆64 Updated last month
- ☆112 Updated 6 months ago
- The official repo of continuous speculative decoding ☆20 Updated last month
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆56 Updated 2 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆45 Updated 3 months ago