NVlabs / TokenBench
A Video Tokenizer Evaluation Dataset
☆38Updated last week
Related projects
Alternatives and complementary repositories for TokenBench
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?☆71Updated this week
- ElasticTok: Adaptive Tokenization for Image and Video☆31Updated last week
- Official PyTorch implementation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stitching"☆95Updated 8 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆41Updated 2 weeks ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.☆29Updated 4 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression☆38Updated 3 months ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆22Updated this week
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆13Updated 3 weeks ago
- Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"☆17Updated 2 weeks ago
- [ICML 2024] Compositional Image Decomposition with Diffusion Models☆40Updated 4 months ago
- Official implementation of "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization"☆74Updated 7 months ago
- Official implementation of the paper The Hidden Language of Diffusion Models☆69Updated 9 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆107Updated 4 months ago
- Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024☆42Updated 8 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation"☆29Updated this week
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics☆55Updated last month
- [arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization☆83Updated 5 months ago
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding☆22Updated last week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 4 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆28Updated this week
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image …☆54Updated 3 weeks ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation☆84Updated 2 months ago
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning☆64Updated 2 years ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆76Updated last month
- [AAAI 2024] ConceptBed Evaluations for Personalized Text-to-Image Diffusion Models☆23Updated last year
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook☆40Updated 4 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆75Updated last year