[TPAMI 2026] Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey.
☆80Mar 25, 2026Updated last month
Alternatives and similar repositories for LLM-Discrete-Tokenization-Survey
Users that are interested in LLM-Discrete-Tokenization-Survey are comparing it to the libraries listed below.
- ☆34Sep 15, 2025Updated 8 months ago
- The demo page for ALMTokenizer☆59Apr 14, 2025Updated last year
- [TPAMI 2026] CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey.☆76Mar 25, 2026Updated last month
- Accepted by Neural Networks☆13May 11, 2026Updated last week
- Repository containing codebase for "FaceOff: A Video-to-Video Face Swapping Network" accepted at WACV 2023☆32Jan 22, 2023Updated 3 years ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆154May 30, 2025Updated 11 months ago
- ☆51Mar 5, 2026Updated 2 months ago
- End-to-End Speech Processing Toolkit☆16Jan 20, 2025Updated last year
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec☆48May 16, 2025Updated last year
- Testing sets for semanticVAD☆20Feb 18, 2025Updated last year
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders☆42Jun 10, 2025Updated 11 months ago
- Explore how to get VQ-VAE models efficiently!☆70Jul 24, 2025Updated 9 months ago
- now-defunct fork of three20 -- please see facebook/three20 for most/all purposes☆17Aug 20, 2010Updated 15 years ago
- A simple implementation for improving CosyVoice2 by GRPO method☆38May 5, 2026Updated 2 weeks ago
- A curated list of full-duplex spoken dialogue models & benchmarks☆68May 5, 2026Updated 2 weeks ago
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”☆57Jun 1, 2025Updated 11 months ago
- ☆19Jul 31, 2025Updated 9 months ago
- FoF Upload, but with TencentCloud COS☆14Nov 10, 2024Updated last year
- Cut2Next: Generating Next Shot via In-Context Tuning☆32Aug 21, 2025Updated 9 months ago
- Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning☆27Mar 30, 2026Updated last month
- Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection"☆24Dec 23, 2024Updated last year
- Spatially Embedded Video Codec☆15Jun 7, 2025Updated 11 months ago
- ☆16Apr 8, 2026Updated last month
- A collection of papers and libraries for performing multi-agent optimization☆18Feb 7, 2026Updated 3 months ago
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆76Dec 3, 2025Updated 5 months ago
- [CVPR'24] Neural Clustering based Visual Representation Learning☆44Oct 6, 2025Updated 7 months ago
- [CVPR 2026] Official repo for "EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation"☆58Mar 13, 2026Updated 2 months ago
- This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …☆15Aug 31, 2021Updated 4 years ago
- ☆13Apr 1, 2022Updated 4 years ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Oct 11, 2023Updated 2 years ago
- The code for AAAI 2025 “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation”☆15Jan 3, 2025Updated last year
- Implementation of "PaLM2-VAdapter" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- Codebase for the paper "Elucidating the design space of language models for image generation"☆46Nov 17, 2024Updated last year
- Neural audio codecs that use end-to-end approaches have gained popularity due to their ability to learn efficient audio representations t…☆16Apr 29, 2023Updated 3 years ago
- The project is about predicting sets (of classes) from images.☆23Aug 31, 2021Updated 4 years ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- [AAAI 2026 Oral] HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment☆30Dec 17, 2025Updated 5 months ago
- Leveraging the Spatial Hierarchy: Coarse-to-fine Trajectory Generation via Cascaded Hybrid Diffusion☆15Apr 29, 2026Updated 3 weeks ago
- Code for "A Bilingual Generative Transformer for Semantic Sentence Embedding" published at EMNLP 2020.☆10Nov 20, 2020Updated 5 years ago