ChocoWu/SeTok

Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM

☆81

Alternatives and similar repositories for SeTok

Users who are interested in SeTok are comparing it to the repositories listed below.

scofield7419 / Dysen
CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
☆14 • Mar 19, 2024 • Updated 2 years ago
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆182 • Feb 25, 2025 • Updated last year
THUNLP-MT / ActiView
☆11 • Dec 20, 2024 • Updated last year
scofield7419 / THOR-ISA
Codes for ACL 2023 paper: Reasoning Implicit Sentiment with Chain-of-Thought Prompting
☆108 • Aug 28, 2023 • Updated 2 years ago
MIV-XJTU / FLAME
[CVPR 2025] PyTorch implementation of paper "FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training"
☆33 • Jul 8, 2025 • Updated last year
SkyworkAI / DAQ-VS
Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]
☆15 • Jul 11, 2024 • Updated 2 years ago
SkyworkAI / Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆576 • Oct 20, 2024 • Updated last year
snap-research / VIMI
☆13 • Jul 10, 2024 • Updated 2 years ago
scofield7419 / EmpathyEar
Multimodal Empathetic Chatbot
☆54 • Jul 16, 2024 • Updated 2 years ago
ChocoWu / Awesome-Scene-Graph-Generation
This is a repository for listing papers on scene graph generation and application.
☆703 • Jul 10, 2026 • Updated last week
xushilin1 / dst-det
[TCSVT] state-of-the-art open vocabulary detector on COCO/LVIS/V3Det
☆34 • Jun 3, 2025 • Updated last year
zhouyiks / CoLVA
☆44 • Jul 9, 2025 • Updated last year
Cominclip / BoxDiff-XL
Extend BoxDiff to SDXL (SDXL-based layout-to-image generation)
☆28 • May 23, 2024 • Updated 2 years ago
s-vco / s-vco
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
☆19 • Jun 4, 2025 • Updated last year
hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
☆26 • Feb 2, 2025 • Updated last year
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆132 • Aug 21, 2024 • Updated last year
alipay / POA
☆22 • Aug 8, 2024 • Updated last year
HJYao00 / DenseConnector
[NeurIPS 2024] Dense Connector for MLLMs
☆183 • Oct 14, 2024 • Updated last year
wangqixun / mfpsg
mask2former psg
☆22 • Dec 12, 2022 • Updated 3 years ago
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462 • Dec 2, 2024 • Updated last year
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47 • Oct 19, 2025 • Updated 9 months ago
vkhoi / cora_cvpr24
☆28 • Sep 3, 2024 • Updated last year
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529 • Nov 14, 2025 • Updated 8 months ago
Layjins / Spider
Code for paper "Spider: Any-to-Many Multimodal LLM"
☆16 • Apr 26, 2025 • Updated last year
yunyikristy / skipNet
☆12 • Oct 21, 2019 • Updated 6 years ago
wusize / Harmon
[ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆191 • May 21, 2025 • Updated last year
jylins / hourllava
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19 • Jun 24, 2025 • Updated last year
tliby / UniFork
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
☆48 • Aug 26, 2025 • Updated 10 months ago
xyzhang17 / SharpContour
☆12 • Mar 28, 2022 • Updated 4 years ago
Pepper-lll / LMforImageGeneration
Codebase for the paper "Elucidating the design space of language models for image generation"
☆45 • Nov 17, 2024 • Updated last year
hanghuacs / FineCaption
☆39 • Jun 20, 2025 • Updated last year
xzc-zju / AdaVideoRAG
[NeurIPS 2025] AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
☆15 • Jun 16, 2025 • Updated last year
VIM-Bench / VIM_TOOL
☆12 • Jun 12, 2024 • Updated 2 years ago
GeoX-Lab / RS-GPT4V
☆37 • Jul 1, 2024 • Updated 2 years ago
M-E-AGI-Lab / Muddit
[ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio…
☆119 • Apr 13, 2026 • Updated 3 months ago
inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆40 • Feb 20, 2025 • Updated last year
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆152 • Jun 13, 2024 • Updated 2 years ago
scofield7419 / UMMT-VSH
Code for the ACL 2023 paper Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Sc…
☆12 • May 19, 2023 • Updated 3 years ago
pengts / VW-LMM
☆25 • May 13, 2024 • Updated 2 years ago