zhanghm1995 / Awesome-VAR
A curated list of resources focused on Visual AutoRegressive Modeling, which makes GPT-style AR models surpass diffusion transformers in image generation.
☆29Updated last month
Alternatives and similar repositories for Awesome-VAR:
Users who are interested in Awesome-VAR are comparing it to the repositories listed below.
- PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models☆43Updated 3 months ago
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes☆21Updated 2 weeks ago
- PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting☆77Updated 2 weeks ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024☆29Updated 9 months ago
- This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomo…☆10Updated 10 months ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR"☆107Updated last month
- Official PyTorch implementation of paper “InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction”☆14Updated 3 weeks ago
- [CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention☆161Updated last month
- [ECCV 2024] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving☆23Updated 4 months ago
- Official Github Repo for GEM☆41Updated last week
- [CVPR 2025 Highlight🔥] Official code repository for "Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuni…☆69Updated last week
- Project Page for GaussianFormer☆25Updated 10 months ago
- [CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding☆130Updated 3 weeks ago
- [ICLR 2025] Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving☆26Updated 2 months ago
- ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024)☆77Updated 9 months ago
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT☆71Updated this week
- [NeurIPS 2024] XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation☆32Updated 3 months ago
- This is the official implementation for ControlVAR.☆102Updated 4 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".☆18Updated 2 weeks ago
- Official code for ECCV 2024 paper "Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection"☆14Updated 9 months ago
- Paper: UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting☆15Updated last month
- Open-Vocabulary SAM3D: Understand Any 3D Scene☆27Updated 7 months ago
- Official implementation of paper "Pyramid Diffusion for Fine 3D Large Scene Generation" (ECCV 2024 Oral)☆123Updated 2 weeks ago
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆25Updated 3 months ago
- [CVPR 2024] Exploiting Diffusion Prior for Generalizable Dense Prediction☆73Updated last year