FoundationVision / OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
☆261 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for OmniTokenizer
- Evaluating text-to-image/video/3D models with VQAScore ☆229 · Updated 2 months ago
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation ☆97 · Updated last month
- Visualization of DiT self-attention features ☆158 · Updated 3 months ago
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM". ☆214 · Updated 3 weeks ago
- MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆289 · Updated this week
- [NeurIPS 2024 D&B Spotlight 🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation ☆186 · Updated this week
- Mathematical Visual Instruction Tuning for Multi-modal Large Language Models ☆109 · Updated 3 months ago
- [ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation ☆313 · Updated 4 months ago
- (no description) ☆102 · Updated 4 months ago
- [CVPR 2024] Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation ☆104 · Updated 7 months ago
- [NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models ☆243 · Updated 2 weeks ago
- Implements VAR+CLIP for image generation ☆78 · Updated 3 months ago
- 🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆42 · Updated 4 months ago
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆564 · Updated 5 months ago
- This repo contains the code for a 1D tokenizer and generator ☆548 · Updated this week
- [ICLR 2024] The official implementation of the paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxi… ☆209 · Updated 6 months ago
- (no description) ☆193 · Updated 4 months ago
- [ECCV 2024] Empowering Multimodal Large Language Model as a Powerful Data Generator ☆108 · Updated last month
- 🔥 ImageFolder: Autoregressive Image Generation with Folded Tokens ☆55 · Updated this week
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree ☆283 · Updated 2 weeks ago
- MoVQGAN: a model for image encoding and reconstruction ☆197 · Updated last year
- Official implementation for "Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model" (NeurIPS 2024) ☆290 · Updated 3 weeks ago
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations ☆122 · Updated 5 months ago
- Scaling Diffusion Transformers with Mixture of Experts ☆207 · Updated 2 months ago
- A list of recent human video generation methods. This repository focuses on half/full-body human video generation methods, the NeRF, Gau… ☆210 · Updated last month
- (no description) ☆127 · Updated 2 weeks ago
- [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models ☆141 · Updated last month
- [NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers" ☆126 · Updated last month
- Diffusion Feedback Helps CLIP See Better ☆215 · Updated 2 months ago
- [CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot Based Video Generation ☆267 · Updated 6 months ago