Awesome autoregressive vision foundation models
☆26Dec 24, 2024Updated last year
Alternatives and similar repositories for ARVFM
Users who are interested in ARVFM are comparing it to the repositories listed below
- This is the implementation of Embedded Prompt Tuning (EPT).☆14Feb 10, 2025Updated last year
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations☆19Jan 19, 2025Updated last year
- A collection of vision foundation models unifying understanding and generation.☆59Jan 2, 2025Updated last year
- Semi automated image analysis toolkit for Connectomics☆12Aug 21, 2019Updated 6 years ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- ☆17Aug 7, 2024Updated last year
- ☆20May 28, 2025Updated 9 months ago
- ☆19Jan 10, 2025Updated last year
- ☆30Jan 18, 2026Updated last month
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training☆27Dec 5, 2023Updated 2 years ago
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.☆30Nov 13, 2025Updated 3 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 24, 2026Updated last week
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆117Feb 4, 2026Updated last month
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆46Aug 26, 2025Updated 6 months ago
- [ICCV 2025] Dynamic-VLM☆28Dec 16, 2024Updated last year
- [ICLR'25 Oral] MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models☆35Nov 3, 2024Updated last year
- ☆28Jul 22, 2024Updated last year
- ☆124Aug 20, 2025Updated 6 months ago
- Repo for "Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content"☆40Jun 9, 2025Updated 8 months ago
- [ICML 2023] FedBR: Improving Federated Learning on Heterogeneous Data via Local Learning Bias Reduction☆27Mar 7, 2024Updated last year
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- [ACM MM2019] Learning Semantics-aware Distance Map with Semantics Layering Network for Amodal Instance Segmentation☆32Sep 3, 2020Updated 5 years ago
- [CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection☆29Sep 26, 2024Updated last year
- MLLMSeg: Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder☆51Aug 16, 2025Updated 6 months ago
- [ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging☆39Jun 4, 2025Updated 9 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 4 months ago
- [NeurIPS 2025🔥:] EVODiff is an inference-time refinement method for diffusion models that improves sampling efficiency and generative f…☆29Feb 2, 2026Updated last month
- The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents"☆29Oct 23, 2025Updated 4 months ago
- ☆11Jun 22, 2025Updated 8 months ago
- [ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs☆42Updated this week
- A Framework for Evaluating AI Agent Safety in Realistic Environments☆30Oct 2, 2025Updated 5 months ago
- [CVPR'25] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization☆47Jul 22, 2025Updated 7 months ago
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…☆40Feb 20, 2025Updated last year
- Multimodal-Composite-Editing-and-Retrieval-update☆35Oct 13, 2025Updated 4 months ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper☆95Nov 13, 2025Updated 3 months ago
- ☆37Sep 16, 2024Updated last year