LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.
☆39Jun 20, 2024Updated last year
Alternatives and similar repositories for LLaVA-Magvit2
Users who are interested in LLaVA-Magvit2 are comparing it to the libraries listed below.
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆41Jun 22, 2024Updated last year
- A simple MLLM that surpasses QwenVL-Max using only open-source data, with a 14B LLM.☆38Sep 9, 2024Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models☆29Feb 4, 2026Updated last month
- SEED-Voken: A Series of Powerful Visual Tokenizers☆997Nov 25, 2025Updated 3 months ago
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023☆27Apr 27, 2023Updated 2 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆81Jun 7, 2024Updated last year
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 6 months ago
- ☆33Jun 29, 2023Updated 2 years ago
- Real-time speech enhancement☆17Jan 23, 2024Updated 2 years ago
- ☆15Apr 2, 2025Updated 11 months ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM☆17Nov 7, 2024Updated last year
- [AAAI 2025] Does VLM Classification Benefit from LLM Description Semantics?☆25Aug 5, 2025Updated 7 months ago
- Separately maintained Chinese TTS☆34Oct 28, 2022Updated 3 years ago
- [IEEE PCS 2022 best paper finalist] "FloLPIPS: A Bespoke Video Quality Metric for Frame Interpolation", Duolikun Danier, Fan Zhang, David …☆22Mar 9, 2024Updated last year
- [AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.☆20Jan 22, 2026Updated last month
- ☆32Oct 23, 2025Updated 4 months ago
- ☆30Aug 21, 2025Updated 6 months ago
- The source code for the paper CrossSinger (ASRU 2023)☆18Oct 12, 2023Updated 2 years ago
- ☆15Aug 22, 2025Updated 6 months ago
- Official repo for Discriminator Guidance for ImageNet256.☆13Apr 27, 2023Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆119May 19, 2025Updated 9 months ago
- [SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images☆44Nov 19, 2025Updated 3 months ago
- Code release for the paper "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control"☆17Apr 9, 2024Updated last year
- Official code for Deep Bayesian Video Frame Interpolation (ECCV2022)☆18May 29, 2023Updated 2 years ago
- Official implementation for "CONVIQT: Contrastive Video Quality Estimator"☆24Jun 14, 2022Updated 3 years ago
- Web application for real-time object detection 🔎 using Flask 🌶, OpenCV, and YoloV3 weights. It uses the COCO Dataset 🖼.☆16Apr 19, 2021Updated 4 years ago
- Ultrafast GAN based Vocoder for Text to Speech☆50Jul 16, 2022Updated 3 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆87Dec 20, 2024Updated last year
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization☆194Jul 12, 2024Updated last year
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 6 months ago
- ☆19Mar 22, 2024Updated last year
- A dead-simple, modularized multi-modal training and fine-tuning framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series …☆19Apr 24, 2024Updated last year
- [ACM MM 2025] MLLMs for Aesthetics Reasoning☆23Jan 5, 2026Updated 2 months ago
- Code implementation of Baichuan's Dynamic NTK-ALiBi: inference on longer texts without fine-tuning☆49Aug 27, 2023Updated 2 years ago
- ☆111Jan 8, 2025Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS☆50Jul 14, 2024Updated last year