Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆79Apr 19, 2025Updated 10 months ago
Alternatives and similar repositories for SeTok
Users that are interested in SeTok are comparing it to the libraries listed below
Sorting:
- ☆12Dec 20, 2024Updated last year
- FQGAN: Factorized Visual Tokenization and Generation☆59Mar 29, 2025Updated 11 months ago
- ☆13Jul 10, 2024Updated last year
- ☆42Jul 9, 2025Updated 7 months ago
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs☆14Mar 19, 2024Updated last year
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]☆14Jul 11, 2024Updated last year
- [TCSVT] state-of-the-art open vocabulary detector on COCO/LVIS/V3Det☆32Jun 3, 2025Updated 8 months ago
- ☆21Aug 8, 2024Updated last year
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio…☆99Feb 4, 2026Updated 3 weeks ago
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).☆159Sep 27, 2024Updated last year
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…☆40Feb 20, 2025Updated last year
- mask2former psg☆22Dec 12, 2022Updated 3 years ago
- Disentangled Pre-training for Human-Object Interaction Detection☆27Sep 17, 2025Updated 5 months ago
- ☆30Jan 18, 2026Updated last month
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆16Sep 7, 2024Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆131Aug 21, 2024Updated last year
- Extend BoxDiff to SDXL (SDXL-based layout-to-image generation)☆26May 23, 2024Updated last year
- ☆11Nov 30, 2025Updated 3 months ago
- A PyTorch implementation of the paper "MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis".☆12Jan 16, 2023Updated 3 years ago
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement"☆52Dec 5, 2024Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)☆51Feb 4, 2026Updated 3 weeks ago
- Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection".☆11Nov 13, 2024Updated last year
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding☆512Nov 14, 2025Updated 3 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs☆181Oct 14, 2024Updated last year
- ☆34Aug 18, 2025Updated 6 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"☆149Jun 13, 2024Updated last year
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆42Oct 19, 2025Updated 4 months ago
- ☆24Jun 12, 2025Updated 8 months ago
- Code of the Grounded MUIE model, REAMO☆11Dec 3, 2024Updated last year
- Official Implementation of Video-MA2MBA☆12Dec 3, 2024Updated last year
- ☆11Jun 12, 2024Updated last year
- ☆12Mar 28, 2022Updated 3 years ago
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision☆12Sep 17, 2023Updated 2 years ago
- [NeurIPS2023] Neural-Logic Human-Object Interaction Detection☆14Aug 24, 2024Updated last year
- ☆11Jan 16, 2024Updated 2 years ago
- ☆11Mar 11, 2025Updated 11 months ago
- [ICLR 2025] Adaptive prompt tailored pruning of T2I diffusion models.☆15Feb 1, 2025Updated last year