EmmaSRH / ARVFM
Awesome autoregressive vision foundation models
☆25 Updated last month
Alternatives and similar repositories for ARVFM
Users who are interested in ARVFM are comparing it to the repositories listed below:
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆95 Updated last month
- The official implementation of "Adapter is All You Need for Tuning Visual Tasks". ☆77 Updated 5 months ago
- Open implementation of "RandAR" ☆51 Updated 2 weeks ago
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language ☆22 Updated 2 weeks ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality" ☆45 Updated 2 weeks ago
- A collection of vision foundation models unifying understanding and generation. ☆40 Updated 3 weeks ago
- ☆15 Updated 5 months ago
- PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆24 Updated last month
- ☆16 Updated last year
- ☆37 Updated 4 months ago
- ☆20 Updated 3 weeks ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆60 Updated 3 months ago
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆75 Updated this week
- Liquid: Language Models are Scalable Multi-modal Generators ☆61 Updated last month
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆65 Updated 3 months ago
- OVMR: Open-Vocabulary Recognition with Multi-Modal References (CVPR24) ☆23 Updated 2 months ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆67 Updated 4 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆61 Updated 5 months ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆74 Updated 5 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆32 Updated 7 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆20 Updated 3 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆92 Updated 6 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆122 Updated 2 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆40 Updated 2 months ago
- FQGAN: Factorized Visual Tokenization and Generation ☆40 Updated 3 weeks ago
- ☆58 Updated last year
- ☆27 Updated 4 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆58 Updated last week
- [NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples ☆46 Updated 3 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆115 Updated 2 weeks ago