zhaoyue-zephyrus / bsq-vit
[arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
☆83 Updated 4 months ago
Related projects
Alternatives and complementary repositories for bsq-vit
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆88 Updated last month
- An in-context conditioning version of MUSE with pre-trained checkpoints. ☆109 Updated last year
- ☆29 Updated last week
- 🔥ImageFolder: Autoregressive Image Generation with Folded Tokens ☆53 Updated 3 weeks ago
- Official implementation of Aurora ☆81 Updated last year
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆76 Updated 3 weeks ago
- Minimal multi-gpu implementation of EDM2: "Analyzing and Improving the Training Dynamics of Diffusion Models" ☆26 Updated 8 months ago
- Official PyTorch Implementation of "Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models" ☆30 Updated last month
- ☆91 Updated 5 months ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆40 Updated 4 months ago
- A PyTorch Implementation of Finite Scalar Quantization ☆80 Updated 11 months ago
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition (ICLR 2024) ☆27 Updated 5 months ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day" ☆114 Updated 3 months ago
- This is a repo to track the latest autoregressive visual generation papers. ☆41 Updated 3 weeks ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆77 Updated 7 months ago
- SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training ☆160 Updated 3 weeks ago
- ☆44 Updated 2 months ago
- ☆101 Updated 4 months ago
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? ☆34 Updated this week
- This is the official implementation for ControlVAR. ☆52 Updated 3 weeks ago
- ☆10 Updated last year
- Comparison between Frechet Video Distance implementation from StyleGAN-V and the original paper ☆88 Updated last year
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆34 Updated 3 weeks ago
- Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Vi… ☆28 Updated this week
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆83 Updated 3 months ago
- Official GitHub repository for the Text-Guided Video Editing (TGVE) competition of LOVEU Workshop @ CVPR'23. ☆71 Updated last year
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆48 Updated this week
- ☆57 Updated last year
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆21 Updated this week
- "SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow", Yuanzhi Zhu, Xingchao Liu, Qiang Liu ☆38 Updated 2 weeks ago