[TPAMI 2026] Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey.
☆82Mar 25, 2026Updated 2 months ago
Alternatives and similar repositories for LLM-Discrete-Tokenization-Survey
Users that are interested in LLM-Discrete-Tokenization-Survey are comparing it to the libraries listed below.
- ☆34Sep 15, 2025Updated 8 months ago
- The demo page for ALMTokenizer☆59Apr 14, 2025Updated last year
- Multi-scale Attention Fusion Graph Network for Remote Sensing Building Change Detection☆18Jan 7, 2024Updated 2 years ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆154May 30, 2025Updated last year
- ☆25Nov 26, 2023Updated 2 years ago
- [ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.☆106Nov 1, 2025Updated 7 months ago
- ☆51Mar 5, 2026Updated 3 months ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec☆48May 16, 2025Updated last year
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders☆42Jun 10, 2025Updated last year
- Explore how to get VQ-VAE models efficiently!☆70Jul 24, 2025Updated 10 months ago
- Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs (ECCV 2024)☆19Jul 15, 2024Updated last year
- DeepEarth: AI Foundation Model for Planetary Science & Sustainability☆32Jun 1, 2026Updated last week
- A simple implementation for improving CosyVoice2 by GRPO method☆38May 5, 2026Updated last month
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”☆57Jun 1, 2025Updated last year
- ☆19Jul 31, 2025Updated 10 months ago
- FoF Upload, but with TencentCloud COS☆14Nov 10, 2024Updated last year
- Agentic Keyframe Search for Video Question Answering☆18Apr 7, 2025Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆47Sep 19, 2025Updated 8 months ago
- ☆47Mar 7, 2025Updated last year
- ☆20Oct 16, 2023Updated 2 years ago
- A curated list of full-duplex spoken dialogue models & benchmarks☆91Updated this week
- Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection"☆23Dec 23, 2024Updated last year
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆32Feb 13, 2026Updated 3 months ago
- Dewey Data Inc. Python API☆15Jul 2, 2025Updated 11 months ago
- Spatially Embedded Video Codec☆15Jun 7, 2025Updated last year
- ☆16Apr 8, 2026Updated 2 months ago
- A collection of papers and libraries for performing multi-agent optimization☆19Updated this week
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆76Dec 3, 2025Updated 6 months ago
- [CVPR 2026] Official repo for "EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation"☆60Mar 13, 2026Updated 2 months ago
- The official repository for the NLP-KG web application [ACL 2024 Demo].☆14Oct 16, 2025Updated 7 months ago
- ☆13Apr 1, 2022Updated 4 years ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Oct 11, 2023Updated 2 years ago
- A lightweight audio codec based on a single quantizer☆71Aug 15, 2025Updated 9 months ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- Codebase for the paper "Elucidating the design space of language models for image generation"☆46Nov 17, 2024Updated last year
- Official code for UniConvNet (ICCV 2025)☆42Nov 21, 2025Updated 6 months ago
- The project is about predicting sets (of classes) from images.☆23Aug 31, 2021Updated 4 years ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- [AAAI 2026 Oral] HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment☆29Dec 17, 2025Updated 5 months ago