wdrink/OpenTokenizer


☆21

Alternatives and similar repositories for OpenTokenizer

Users that are interested in OpenTokenizer are comparing it to the libraries listed below.

MengLcool / SliMM
☆25 · Dec 26, 2024 · Updated last year
inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆40 · Feb 20, 2025 · Updated last year
Row11n / Prova
[AAAI-25] Official repository of "Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object De…
☆20 · Dec 27, 2024 · Updated last year
wdrink / ARM
ARM: An AutoRegressive Large Multimodal Model with Discrete Representations
☆50 · Jun 10, 2026 · Updated last month
JPShi12 / VideoLoom
[ICML 2026] VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
☆27 · Jul 3, 2026 · Updated 2 weeks ago
FoundationVision / OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
☆325 · Jul 9, 2024 · Updated 2 years ago
kay-ck / GCMA
[ACM MM 2023] Code release of GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos
☆12 · Mar 29, 2024 · Updated 2 years ago
ShareLab-SII / FluxMem
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆73 · Mar 16, 2026 · Updated 4 months ago
xinwong / AdvDetect
Adversarial Examples Detection Benchmark
☆16 · Dec 6, 2024 · Updated last year
Gao-zy26 / ReToMe-VA
[ACM MM 2024] ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack
☆14 · Dec 20, 2024 · Updated last year
wdrink / RepWAM
Code for RepWAM: World Action Modeling with Representation Visual-Action Tokenizers
☆57 · Jun 14, 2026 · Updated last month
wdrink / OmniVid
☆57 · Jun 4, 2024 · Updated 2 years ago
ShareLab-SII / CoMP-MM
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆48 · Apr 3, 2025 · Updated last year
wdrink / SimpleAR
PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆431 · Jun 20, 2025 · Updated last year
kay-ck / BSC-Attack
[AAAI 2022] Code release of "Attacking Video Recognition Models with Bullet-Screen Comments"
☆25 · Mar 30, 2024 · Updated 2 years ago
ShareLab-SII / CaTok
[CVPR-26] Official repository of "CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization"
☆19 · Mar 9, 2026 · Updated 4 months ago
SxJyJay / UniToken
[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…
☆106 · Apr 23, 2025 · Updated last year
Iriya99 / OVRE
☆23 · Oct 28, 2024 · Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93 · Jun 17, 2024 · Updated 2 years ago
Stevetich / EventHallusion
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
☆34 · Aug 5, 2025 · Updated 11 months ago
HuiZhang0812 / WeEdit
A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
☆20 · Mar 13, 2026 · Updated 4 months ago
X2FD / LVIS-INSTRUCT4V
☆134 · Dec 22, 2023 · Updated 2 years ago
HuiZhang0812 / CreatiLayout
[ICCV 2025] CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
☆135 · Aug 6, 2025 · Updated 11 months ago
RU-System-Software-and-Security / NIC
☆12 · Mar 24, 2023 · Updated 3 years ago
JingyuanZhou / Task_Adaptive_Network
☆11 · Nov 8, 2022 · Updated 3 years ago
ali-vilab / Unison
☆17 · Dec 11, 2025 · Updated 7 months ago
LINs-lab / GMem
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆43 · Mar 11, 2025 · Updated last year
forwchen / LLaVA-MoLE
☆10 · Mar 4, 2024 · Updated 2 years ago
ytaek-oh / vl_compo
☆10 · Jul 5, 2024 · Updated 2 years ago
geshang777 / pix2cap
Official Implementation of "Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning"
☆28 · Dec 16, 2025 · Updated 7 months ago
vita-epfl / rock-pytorch
A PyTorch implementation of "Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection"
☆14 · Jun 29, 2020 · Updated 6 years ago
TyroneLi / ESOL_WSSS
☆14 · Jan 4, 2023 · Updated 3 years ago
MengLcool / Magic-Pencil
Implementation of "Combining Sketch and Tone for Pencil Drawing Production"
☆16 · May 16, 2019 · Updated 7 years ago
nailwatts / FNIN
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients
☆13 · Jan 22, 2025 · Updated last year
MengLcool / AdaViT
[CVPR-22] This is the official implementation of the paper "Adavit: Adaptive vision transformers for efficient image recognition".
☆56 · Aug 18, 2022 · Updated 3 years ago
elis2496 / maxup_implementation
☆12 · Nov 16, 2020 · Updated 5 years ago
r-cui / ViGA
"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022
☆68 · Jun 27, 2022 · Updated 4 years ago
sail-sg / imperceptible-jailbreaks
[ArXiv 2025] Imperceptible Jailbreaking against Large Language Models
☆25 · Oct 7, 2025 · Updated 9 months ago
KlingAIResearch / diffusing-right-space
Metric implementation and raw data of "Diffusing in the Right Space: A Systematic Study of Latent Diffusability"
☆34 · Jun 16, 2026 · Updated last month