Qinyu-Allen-Zhao / Arinar
☆37 Updated last week
Alternatives and similar repositories for Arinar
Users who are interested in Arinar are comparing it to the repositories listed below
- ☆30 Updated 4 months ago
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆74 Updated 3 months ago
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding" ☆30 Updated this week
- FQGAN: Factorized Visual Tokenization and Generation ☆50 Updated 2 months ago
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆71 Updated 3 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆89 Updated last month
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆23 Updated 2 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆74 Updated 6 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆40 Updated 6 months ago
- ☆61 Updated last year
- Official repository of paper "Subobject-level Image Tokenization" ☆73 Updated 2 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆71 Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆40 Updated 11 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆69 Updated 7 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 Updated 3 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 Updated last year
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆82 Updated 2 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆68 Updated 3 months ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆117 Updated 2 weeks ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆75 Updated 7 months ago
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language ☆24 Updated 3 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆15 Updated 2 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆31 Updated 6 months ago
- A collection of vision foundation models unifying understanding and generation. ☆55 Updated 5 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆86 Updated 7 months ago
- Curated list of recent visual autoregressive (VAR) modeling works ☆29 Updated 2 months ago
- Official implementation of LaVin-DiT ☆32 Updated 4 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆34 Updated 11 months ago
- ReNeg: Learning Negative Embedding with Reward Guidance ☆32 Updated 5 months ago