EmmaSRH / ARVFM
Awesome autoregressive vision foundation models
☆25 Updated 4 months ago
Alternatives and similar repositories for ARVFM
Users who are interested in ARVFM are comparing it to the repositories listed below.
- A collection of vision foundation models unifying understanding and generation. ☆55 Updated 4 months ago
- ☆28 Updated 4 months ago
- [NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆36 Updated last month
- ☆35 Updated 2 weeks ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆103 Updated last month
- FQGAN: Factorized Visual Tokenization and Generation ☆50 Updated last month
- Autoregressive Image Generation with Randomized Parallel Decoding ☆59 Updated last month
- [NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples ☆53 Updated 6 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆41 Updated last month
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆43 Updated 4 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ☆38 Updated 2 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆22 Updated 7 months ago
- ☆16 Updated last year
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆34 Updated 8 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 Updated 3 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆83 Updated last month
- ☆41 Updated 7 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆55 Updated 3 months ago
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language ☆23 Updated 2 months ago
- ☆31 Updated last year
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations ☆67 Updated last month
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆38 Updated 11 months ago
- ☆21 Updated last year
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆70 Updated 7 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆95 Updated 10 months ago
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 Updated last month
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆89 Updated 9 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆44 Updated 2 months ago
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ☆32 Updated last month
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆75 Updated 6 months ago