A Video Tokenizer Evaluation Dataset
☆151 · updated Jan 13, 2025
Alternatives and similar repositories for TokenBench
Users who are interested in TokenBench are comparing it to the repositories listed below.
- A suite of image and video neural tokenizers · ☆1,711 · updated Feb 11, 2025
- SEED-Voken: A Series of Powerful Visual Tokenizers · ☆996 · updated Nov 25, 2025
- ☆52 · updated Dec 13, 2024
- A family of versatile and state-of-the-art video tokenizers · ☆437 · updated Sep 1, 2025
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization · ☆322 · updated Jul 9, 2024
- ElasticTok: Adaptive Tokenization for Image and Video · ☆88 · updated Nov 4, 2024
- ☆190 · updated Dec 17, 2024
- Evaluation code and data for GenEval2 · ☆57 · updated Jan 8, 2026
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" · ☆147 · updated Nov 14, 2024
- ☆141 · updated Jun 28, 2024
- This repo contains the code for a 1D tokenizer and generator · ☆1,117 · updated Mar 20, 2025
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? · ☆145 · updated Feb 11, 2025
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… · ☆20 · updated Jan 11, 2026
- Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world m… · ☆414 · updated Jan 6, 2026
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 · ☆1,863 · updated Feb 20, 2026
- High-performance Image Tokenizers for VAR and AR · ☆303 · updated Apr 25, 2025
- Next-Token Prediction is All You Need · ☆2,355 · updated Jan 12, 2026
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" · ☆438 · updated Aug 8, 2025
- ☆38 · updated Feb 6, 2025
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos · ☆8,086 · updated Jan 6, 2026
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ☆1,936 · updated Aug 15, 2024
- A unified inference and post-training framework for accelerated video generation · ☆3,111 · updated this week
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral) · ☆98 · updated Feb 11, 2025
- [CVPR 2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project · ☆183 · updated Mar 20, 2025
- [ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE · ☆392 · updated Jan 19, 2025
- [ICCV 2023] On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement · ☆66 · updated Sep 28, 2023
- Official implementation for SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization · ☆55 · updated Nov 12, 2025
- [CVPR 2026] PAI-Bench: A Comprehensive Benchmark for Physical AI · ☆52 · updated Feb 21, 2026
- The official repo for "D-AR: Diffusion via Autoregressive Models" · ☆135 · updated Jan 29, 2026
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. … · ☆13 · updated Dec 4, 2024
- EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multi-choice questions that evaluate multimodal large … · ☆13 · updated Apr 1, 2025
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025) · ☆11 · updated Aug 11, 2025
- ☆11 · updated Nov 7, 2024
- Phoneme and duration labeling based on Whisper small · ☆11 · updated Jul 7, 2024
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology · ☆12 · updated Jun 17, 2025
- [CVPR 2023] Spatial-then-Temporal Self-Supervised Learning for Video Correspondence · ☆11 · updated Jul 5, 2023
- A new multi-task learning framework using Vision Transformers · ☆11 · updated Jun 19, 2024
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation" · ☆14 · updated Nov 1, 2024
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" · ☆198 · updated Jan 7, 2026