The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". The paper is currently under review.
☆77 · Feb 18, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for LLM-Discrete-Tokenization-Survey
Users that are interested in LLM-Discrete-Tokenization-Survey are comparing it to the libraries listed below
- ☆30 · Sep 15, 2025 · Updated 5 months ago
- Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection" ☆23 · Dec 23, 2024 · Updated last year
- The demo page for ALMTokenizer ☆59 · Apr 14, 2025 · Updated 10 months ago
- End-to-End Speech Processing Toolkit ☆15 · Jan 20, 2025 · Updated last year
- Repository containing codebase for "FaceOff: A Video-to-Video Face Swapping Network" accepted at WACV 2023 ☆31 · Jan 22, 2023 · Updated 3 years ago
- ☆14 · Mar 12, 2023 · Updated 2 years ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 · May 30, 2025 · Updated 9 months ago
- Training, inference, and testing of the SAC speech codec model. ☆99 · Nov 1, 2025 · Updated 4 months ago
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu… ☆105 · Apr 23, 2025 · Updated 10 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆46 · Nov 17, 2024 · Updated last year
- Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning ☆25 · Jan 5, 2026 · Updated 2 months ago
- The code for AAAI 2025 “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation” ☆15 · Jan 3, 2025 · Updated last year
- Official implementation of Spectro-Riemannian Graph Neural Networks (ICLR 2025) ☆17 · May 30, 2025 · Updated 9 months ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models ☆27 · Feb 13, 2026 · Updated 3 weeks ago
- Thesis Template ☆10 · Mar 2, 2026 · Updated last week
- CoMA: Compositional Human Motion Generation with Multi-modal Agents ☆14 · Jul 31, 2025 · Updated 7 months ago
- Anki add-on that adds Pinyin and Zhuyin readings above Chinese characters in any field. ☆12 · Sep 23, 2025 · Updated 5 months ago
- ☆15 · Feb 12, 2026 · Updated 3 weeks ago
- Feature extraction from audio signal (explained in Persian) ☆12 · May 7, 2022 · Updated 3 years ago
- Code for the paper "DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints" ☆26 · Feb 4, 2025 · Updated last year
- ☆306 · May 29, 2025 · Updated 9 months ago
- [CVPR 2024] Dual Prototype Attention for Unsupervised Video Object Segmentation ☆39 · Apr 21, 2024 · Updated last year
- Awesome video instance segmentation papers ☆51 · Dec 17, 2025 · Updated 2 months ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec ☆46 · May 16, 2025 · Updated 9 months ago
- Agentic Keyframe Search for Video Question Answering ☆16 · Apr 7, 2025 · Updated 11 months ago
- This project is a demonstration of a content-based recommendation system for Spotify that leverages user's preferences and audio features… ☆17 · Apr 4, 2023 · Updated 2 years ago
- ChatGPT/Claude/Midjourney/API/Grok sharing platform with Linux DO support (frontend project) ☆13 · Apr 30, 2025 · Updated 10 months ago
- ☆10 · Dec 16, 2022 · Updated 3 years ago
- CVPR 2021 Oral Paper PatchGenCN ☆11 · Oct 28, 2021 · Updated 4 years ago
- ☆11 · Jul 2, 2022 · Updated 3 years ago
- Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024) ☆11 · Jun 20, 2025 · Updated 8 months ago
- ☆13 · Jan 22, 2025 · Updated last year
- ☆11 · Sep 27, 2023 · Updated 2 years ago
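- Getting-started handbook for new members of the mathematical modeling team, maintained long-term > < (GPL license, non-commercial use only; please credit the source if you use any of its content) ☆11 · Oct 11, 2019 · Updated 6 years ago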
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference) ☆12 · Oct 11, 2023 · Updated 2 years ago
- SoundMatchAnalyser (SMA) is a powerful tool designed to analyze and compare audio quality by evaluating the differences between two audio… ☆14 · Jan 1, 2025 · Updated last year
- Starter template for building a JS HTML chatbot ☆10 · Mar 21, 2024 · Updated last year
- Phoneme and duration labeling based on Whisper small ☆11 · Jul 7, 2024 · Updated last year
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆65 · Jul 8, 2025 · Updated 8 months ago