The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". The paper is under review.
☆77 · Feb 18, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for LLM-Discrete-Tokenization-Survey
Users who are interested in LLM-Discrete-Tokenization-Survey are comparing it to the libraries listed below.
- ☆30 · Sep 15, 2025 · Updated 5 months ago
- ☆54 · Feb 3, 2026 · Updated last month
- End-to-End Speech Processing Toolkit ☆15 · Jan 20, 2025 · Updated last year
- Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs (ECCV 2024) ☆19 · Jul 15, 2024 · Updated last year
- Explore how to get VQ-VAE models efficiently! ☆68 · Jul 24, 2025 · Updated 7 months ago
- Repository containing codebase for "FaceOff: A Video-to-Video Face Swapping Network" accepted at WACV 2023 ☆31 · Jan 22, 2023 · Updated 3 years ago
- ☆14 · Mar 12, 2023 · Updated 2 years ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 · May 30, 2025 · Updated 9 months ago
- Facial-Expression Recognition with Deep Neural Networks ☆10 · Mar 6, 2016 · Updated 10 years ago
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu… ☆105 · Apr 23, 2025 · Updated 10 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆46 · Nov 17, 2024 · Updated last year
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 · Jun 17, 2024 · Updated last year
- ☆10 · Apr 13, 2022 · Updated 3 years ago
- The code for AAAI 2025 “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation” ☆15 · Jan 3, 2025 · Updated last year
- Code for the paper "DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints" ☆26 · Feb 4, 2025 · Updated last year
- Dewey Data Inc. Python API ☆14 · Jul 2, 2025 · Updated 8 months ago
- Thesis Template ☆10 · Mar 2, 2026 · Updated last week
- Anki add-on that adds Pinyin and Zhuyin readings above Chinese characters in any field. ☆12 · Sep 23, 2025 · Updated 5 months ago
- Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning ☆25 · Jan 5, 2026 · Updated 2 months ago
- ☆14 · Aug 28, 2024 · Updated last year
- Feature extraction from audio signal (explained in Persian) ☆12 · May 7, 2022 · Updated 3 years ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models ☆27 · Feb 13, 2026 · Updated 3 weeks ago
- ☆306 · May 29, 2025 · Updated 9 months ago
- [CVPR 2024] Dual Prototype Attention for Unsupervised Video Object Segmentation ☆39 · Apr 21, 2024 · Updated last year
- Official implementation of "Physics-Informed Long-Sequence Forecasting From Multi-Resolution Spatiotemporal Data". ☆11 · Dec 12, 2022 · Updated 3 years ago
- LongCTR: A Long Sequence Modeling Benchmark for CTR Prediction ☆17 · Jun 21, 2025 · Updated 8 months ago
- Reversi AI based on Monte Carlo search algorithm ☆10 · Apr 2, 2025 · Updated 11 months ago
- Agentic Keyframe Search for Video Question Answering ☆16 · Apr 7, 2025 · Updated 11 months ago
- Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers" ☆14 · Feb 24, 2025 · Updated last year
- The Koudai48 VOD Manager ☆10 · May 2, 2019 · Updated 6 years ago
- 🏆🏅 Repository for the GEB team's winning solutions in the IEEE Hybrid Energy Forecasting and Trading Competition (HEFTCom). ☆28 · Oct 4, 2025 · Updated 5 months ago
- [CVPR 2025] Official implementation of SSP: High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Se… ☆15 · Jun 26, 2025 · Updated 8 months ago
- Tool for automatically booking appointment slots at Civil Affairs Bureau marriage registries (Guangdong Province Civil Affairs Bureau only) ☆10 · Jun 25, 2023 · Updated 2 years ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding. ☆11 · Jul 16, 2024 · Updated last year
- This project is a demonstration of a content-based recommendation system for Spotify that leverages user's preferences and audio features… ☆17 · Apr 4, 2023 · Updated 2 years ago
- CIKM 23 Oral - HoLe: Homophily-enhanced Structure Learning for Graph Clustering ☆10 · Feb 29, 2024 · Updated 2 years ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆65 · Jul 8, 2025 · Updated 8 months ago
- A collection of papers and libraries for performing multi-agent optimization ☆17 · Feb 7, 2026 · Updated last month
- CVPR 2021 Oral Paper PatchGenCN ☆11 · Oct 28, 2021 · Updated 4 years ago