☆80 · Jul 3, 2024 · Updated last year
Alternatives and similar repositories for unified-io-2.pytorch
Users that are interested in unified-io-2.pytorch are comparing it to the libraries listed below
- ☆643 · Feb 15, 2024 · Updated 2 years ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆459 · Dec 2, 2024 · Updated last year
- ☆231 · Dec 18, 2023 · Updated 2 years ago
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ☆176 · Oct 6, 2025 · Updated 5 months ago
- Jax implementation of VIT-VQGAN ☆10 · Jan 25, 2024 · Updated 2 years ago
- The Structure and Interpretation of Deep Networks Handbook ☆14 · Dec 14, 2024 · Updated last year
- The OBMO module embedded in PatchNet ☆10 · Feb 21, 2024 · Updated 2 years ago
- ☆120 · Jun 6, 2024 · Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆54 · Mar 29, 2023 · Updated 2 years ago
- [ICLR'25] Official repository of paper: Ranking-aware adapter for text-driven image ordering with CLIP ☆16 · Apr 17, 2025 · Updated 10 months ago
- Code repo for the paper "Semantic Correspondence via 2D-3D-2D Cycle" ☆12 · Jan 28, 2021 · Updated 5 years ago
- [TNNLS] Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases ☆16 · Jul 10, 2025 · Updated 7 months ago
- ☆142 · Jun 28, 2024 · Updated last year
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆247 · Jan 17, 2024 · Updated 2 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching". ☆14 · Sep 1, 2022 · Updated 3 years ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆35 · Jun 12, 2025 · Updated 8 months ago
- Implementation of [MNTDP](https://arxiv.org/abs/2012.12631) ☆18 · Mar 9, 2022 · Updated 4 years ago
- Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021) ☆17 · Oct 12, 2021 · Updated 4 years ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆48 · Jun 2, 2025 · Updated 9 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆79 · Oct 31, 2024 · Updated last year
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt… ☆46 · Dec 20, 2024 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,988 · Nov 7, 2025 · Updated 4 months ago
- ☆23 · Feb 20, 2026 · Updated 2 weeks ago
- Blog of the Autonomous Vision Group at MPI-IS Tübingen and University of Tübingen. ☆19 · Dec 22, 2023 · Updated 2 years ago
- [ICML 2025] Differentiable Solver Search for Fast Diffusion Sampling ☆21 · Jul 7, 2025 · Updated 8 months ago
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆870 · Mar 25, 2024 · Updated last year
- ☆306 · May 29, 2025 · Updated 9 months ago
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆255 · Feb 11, 2025 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 · Sep 12, 2024 · Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆80 · Jun 17, 2024 · Updated last year
- ✅ How Robust are Fact Checking Systems on Colloquial Claims?. In NAACL-HLT, 2021. ☆23 · Jul 1, 2021 · Updated 4 years ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆418 · Apr 25, 2025 · Updated 10 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆111 · Jun 11, 2024 · Updated last year
- ☆23 · Nov 24, 2018 · Updated 7 years ago
- ☆1,840 · Jun 28, 2024 · Updated last year
- Official Repository for our ECCV2020 paper: Imbalanced Continual Learning with Partitioning Reservoir Sampling ☆51 · Dec 8, 2022 · Updated 3 years ago
- [CoRL 2024] Official code for "Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models" ☆28 · Dec 11, 2024 · Updated last year
- ☆242 · Jun 4, 2025 · Updated 9 months ago
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding ☆513 · Nov 14, 2025 · Updated 3 months ago