Combines LLaVA with the Magvit image tokenizer to train an MLLM without a vision encoder, unifying image understanding and generation.
☆39Jun 20, 2024Updated last year
Alternatives and similar repositories for LLaVA-Magvit2
Users that are interested in LLaVA-Magvit2 are comparing it to the libraries listed below.
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video…☆40Jun 22, 2024Updated last year
- A simple MLLM that surpasses QwenVL-Max using only open-source data and a 14B LLM.☆38Sep 9, 2024Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers☆1,002Nov 25, 2025Updated 4 months ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 6 months ago
- [NeurIPS 2024]OmniTokenizer: one model and one weight for image-video joint tokenization.☆323Jul 9, 2024Updated last year
- ☆33Jun 29, 2023Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023☆27Apr 27, 2023Updated 2 years ago
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 7 months ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"☆182Jun 20, 2024Updated last year
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- Official repo for Discriminator Guidance for ImageNet256.☆13Apr 27, 2023Updated 2 years ago
- Code release for the paper "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control"☆17Apr 9, 2024Updated 2 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆82Jun 7, 2024Updated last year
- Real-time speech enhancement☆17Jan 23, 2024Updated 2 years ago
- Keypoint dataset for airplane☆10Dec 28, 2019Updated 6 years ago
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models☆30Feb 4, 2026Updated 2 months ago
- Using llama.cpp and onnxruntime to accelerate inference of GOT-OCR2.0☆15Mar 6, 2025Updated last year
- [AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.☆21Jan 22, 2026Updated 2 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model☆57Oct 31, 2023Updated 2 years ago
- Separately maintained Chinese TTS☆34Oct 28, 2022Updated 3 years ago
- The official code release for Unsupervised Out-of-distribution Detection with Diffusion Inpainting (ICML 2023)☆28Aug 16, 2023Updated 2 years ago
- Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset☆27May 30, 2025Updated 10 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆119May 19, 2025Updated 11 months ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆90Dec 20, 2024Updated last year
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 7 months ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization☆194Jul 12, 2024Updated last year
- Ultrafast GAN based Vocoder for Text to Speech☆50Jul 16, 2022Updated 3 years ago
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》☆77Jun 9, 2023Updated 2 years ago
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations☆150Feb 19, 2025Updated last year
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆23Oct 15, 2024Updated last year
- Implementation of Unsupervised Pixel–Level Domain Adaptation with Generative Adversarial Networks by Google☆16Jan 10, 2017Updated 9 years ago
- A collection of audio codecs with a standardized API☆39Updated this week
- [AAAI 2025] Does VLM Classification Benefit from LLM Description Semantics?☆25Aug 5, 2025Updated 8 months ago
- Simple MoE - Day 17 of 365 Days of Repos☆18Jan 17, 2025Updated last year
- ☆19Mar 22, 2024Updated 2 years ago
- ☆112Jan 8, 2025Updated last year
- Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model☆13Dec 29, 2024Updated last year