allenai/unified-io-2.pytorch

allenai / unified-io-2.pytorch

☆78

Alternatives and similar repositories for unified-io-2.pytorch

Users that are interested in unified-io-2.pytorch are comparing it to the libraries listed below.

allenai / unified-io-2
☆650 · Feb 15, 2024 · Updated 2 years ago
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462 · Dec 2, 2024 · Updated last year
allenai / unified-io-inference
☆231 · Dec 18, 2023 · Updated 2 years ago
jiasenlu / vit-vqgan-jax
Jax implementation of VIT-VQGAN
☆10 · Jan 25, 2024 · Updated 2 years ago
ggjy / DeLVM
☆120 · Jun 6, 2024 · Updated 2 years ago
zh460045050 / VQGAN-LC
☆145 · Jun 28, 2024 · Updated 2 years ago
lavinal712 / Awesome-Visual-Tokenizers
📖 This is a repository for organizing papers, codes and other resources related to visual tokenizers.
☆17 · Jul 7, 2026 · Updated last week
uynaes / RankingAwareCLIP
[ICLR'25] Official repository of paper: Ranking-aware adapter for text-driven image ordering with CLIP
☆16 · Apr 17, 2025 · Updated last year
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆48 · Jun 2, 2025 · Updated last year
CurryYuan / PhraseRefer
[TNNLS] Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases
☆17 · Jul 10, 2025 · Updated last year
TomVeniat / MNTDP
Implementation of [MNTDP](https://arxiv.org/abs/2012.12631)
☆18 · Mar 9, 2022 · Updated 4 years ago
HS-YN / PanoAVQA
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)
☆16 · Oct 12, 2021 · Updated 4 years ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 · Jul 1, 2024 · Updated 2 years ago
mrsempress / OBMO_patchnet
The OBMO module embedded in PatchNet
☆10 · Feb 21, 2024 · Updated 2 years ago
SkyworkAI / Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆576 · Oct 20, 2024 · Updated last year
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆47 · Jul 17, 2025 · Updated last year
Ghy0501 / FCIT
[ICCV 2025] Official Implementation of Federated Continual Instruction Tuning
☆17 · Aug 10, 2025 · Updated 11 months ago
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆425 · Apr 25, 2025 · Updated last year
Hhhhhhao / continuous_tokenizer
☆321 · May 29, 2025 · Updated last year
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆883 · Mar 25, 2024 · Updated 2 years ago
allenai / embodied-clip
Official codebase for EmbCLIP
☆130 · Jun 16, 2023 · Updated 3 years ago
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603 · Oct 6, 2024 · Updated last year
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273 · Feb 11, 2025 · Updated last year
DAMO-NLP-SG / DiGIT
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
☆78 · Oct 31, 2024 · Updated last year
ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆244 · Jun 29, 2026 · Updated 3 weeks ago
Leon1207 / 3DRefTR
This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"
☆26 · Aug 24, 2023 · Updated 2 years ago
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
davidbau / sidn-handbook
The Structure and Interpretation of Deep Networks Handbook
☆14 · Dec 14, 2024 · Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
ytongbai / LVM
☆1,835 · Jun 28, 2024 · Updated 2 years ago
jessemelpolio / AnytimeCL
[ECCV'24 Oral] Anytime Continual Learning for Open Vocabulary Classification
☆24 · Oct 17, 2024 · Updated last year
lucidrains / magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
☆668 · Jan 12, 2025 · Updated last year
Lil-Shake / VA-Pi
[CVPR 2026] This repository is the code of our paper "VA-Pi: Variational Policy Alignment for Pixel-Aware Autoregressive Generation"
☆15 · Mar 3, 2026 · Updated 4 months ago
liyunsheng13 / dpp
☆32 · Jul 23, 2022 · Updated 3 years ago
LAION-AI / conditioned-prior
(wip) Use LAION-AI's CLIP "conditioned prior" to generate CLIP image embeds from CLIP text embeds.
☆29 · Jul 14, 2022 · Updated 4 years ago
zycheiheihei / Transferable-Visual-Prompting
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆45 · Dec 20, 2024 · Updated last year
hello-robot / stretch_web_interface
Prototype web interface that enables remote teleoperation of the Stretch RE1 mobile manipulator from Hello Robot Inc.
☆12 · Dec 14, 2023 · Updated 2 years ago
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529 · Nov 14, 2025 · Updated 8 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93 · Jun 17, 2024 · Updated 2 years ago