lucasjinreal / LLaVA-Magvit2View external linksLinks
LLaVA combines with Magvit Image tokenizer, training MLLM without an Vision Encoder. Unifying image understanding and generation.
☆39Jun 20, 2024Updated last year
Alternatives and similar repositories for LLaVA-Magvit2
Users that are interested in LLaVA-Magvit2 are comparing it to the libraries listed below
Sorting:
- imagetokenizer is a python package, helps you encoder visuals and generate visuals token ids from codebook, supports both image and video…☆40Jun 22, 2024Updated last year
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆38Sep 9, 2024Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 4 months ago
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models☆26Feb 4, 2026Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers☆993Nov 25, 2025Updated 2 months ago
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023☆27Apr 27, 2023Updated 2 years ago
- [NeurIPS 2024]OmniTokenizer: one model and one weight for image-video joint tokenization.☆322Jul 9, 2024Updated last year
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆81Jun 7, 2024Updated last year
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆42Sep 5, 2025Updated 5 months ago
- ☆33Jun 29, 2023Updated 2 years ago
- [AAAI 2025] Does VLM Classification Benefit from LLM Description Semantics?☆25Aug 5, 2025Updated 6 months ago
- real-time speech enhance☆17Jan 23, 2024Updated 2 years ago
- ☆15Apr 2, 2025Updated 10 months ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM☆17Nov 7, 2024Updated last year
- 单独维护的中文TTS☆34Oct 28, 2022Updated 3 years ago
- ☆32Oct 23, 2025Updated 3 months ago
- [IEEE PCS 2022 best paper finalist] "FloLPIPS: A Bespoke Video Quality Metric for Frame Interpoation", Duolikun Danier, Fan Zhang, David …☆22Mar 9, 2024Updated last year
- [AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.☆20Jan 22, 2026Updated 3 weeks ago
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- Official repo for Discriminator Guidance for ImageNet256.☆13Apr 27, 2023Updated 2 years ago
- ☆30Aug 21, 2025Updated 5 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆118May 19, 2025Updated 8 months ago
- [SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images☆44Nov 19, 2025Updated 2 months ago
- Code release for the paper "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control"☆17Apr 9, 2024Updated last year
- Official code for Deep Bayesian Video Frame Interpolation (ECCV2022)☆18May 29, 2023Updated 2 years ago
- Official implementation for "CONVIQT: Contrastive Video Quality Estimator"☆23Jun 14, 2022Updated 3 years ago
- Web application for real-time object detection 🔎 using Flask 🌶, OpenCV, and YoloV3 weights. It uses the COCO Dataset 🖼.☆16Apr 19, 2021Updated 4 years ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"☆182Jun 20, 2024Updated last year
- Ultrafast GAN based Vocoder for Text to Speech☆50Jul 16, 2022Updated 3 years ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization☆191Jul 12, 2024Updated last year
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible to any LLaVA/Flamingo/QwenVL/MiniGemini etc series …☆19Apr 24, 2024Updated last year
- [ACM MM 2025] MLLMs for Aesthetics Reasoning☆23Jan 5, 2026Updated last month
- 百川Dynamic NTK-ALiBi的代码实现:无需微调即可推理更长文本☆49Aug 27, 2023Updated 2 years ago
- ☆111Jan 8, 2025Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS☆50Jul 14, 2024Updated last year
- ☆20Jan 6, 2023Updated 3 years ago