allenai/unified-io-inference

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/allenai/unified-io-inference)

allenai / unified-io-inference

☆231

Alternatives and similar repositories for unified-io-inference

Users that are interested in unified-io-inference are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

allenai / unified-io-2
View on GitHub
☆650Feb 15, 2024Updated 2 years ago
microsoft / X-Decoder
View on GitHub
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346Oct 5, 2023Updated 2 years ago
allenai / mmc4
View on GitHub
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
☆953Mar 19, 2025Updated last year
JialianW / GRiT
View on GitHub
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341Jan 8, 2024Updated 2 years ago
TencentARC / GVT
View on GitHub
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59Jun 27, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Buzz-Beater / EgoTaskQA
View on GitHub
Code for NeurIPS 2022 Datasets and Benchmarks paper - EgoTaskQA: Understanding Human Tasks in Egocentric Videos.
☆44Apr 17, 2023Updated 3 years ago
OFA-Sys / OFA
View on GitHub
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
☆2,557Apr 24, 2024Updated 2 years ago
allenai / unified-io-2.pytorch
View on GitHub
☆78Jul 3, 2024Updated 2 years ago
microsoft / UniTAB
View on GitHub
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆90Jun 12, 2023Updated 3 years ago
google-research / pix2seq
View on GitHub
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
☆945Nov 7, 2023Updated 2 years ago
mlfoundations / open_flamingo
View on GitHub
An open-source framework for training large multimodal models.
☆4,113Aug 31, 2024Updated last year
jiasenlu / LL3M
View on GitHub
LL3M: Large Language and Multi-Modal Model in Jax
☆74Apr 23, 2024Updated 2 years ago
ylsung / VL_adapter
View on GitHub
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212Dec 18, 2022Updated 3 years ago
AILab-CVC / SEED
View on GitHub
Official implementation of SEED-LLaMA (ICLR 2024).
☆642Sep 21, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
jshilong / GPT4RoI
View on GitHub
(ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556Jun 3, 2025Updated last year
allenai / gpv-1
View on GitHub
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92Jul 14, 2021Updated 5 years ago
baaivision / Emu
View on GitHub
Emu Series: Generative Multimodal Models from BAAI
☆1,776Jan 12, 2026Updated 6 months ago
Victorwz / VaLM
View on GitHub
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56Mar 6, 2023Updated 3 years ago
Vision-CAIR / ChatCaptioner
View on GitHub
Official Repository of ChatCaptioner
☆468Apr 13, 2023Updated 3 years ago
open-mmlab / Multimodal-GPT
View on GitHub
Multimodal-GPT
☆1,512Jun 4, 2023Updated 3 years ago
amirbar / visual_prompting
View on GitHub
Official implementation and data release of the paper "Visual Prompting via Image Inpainting".
☆319Aug 7, 2023Updated 2 years ago
microsoft / GLIP
View on GitHub
Grounded Language-Image Pre-training
☆2,604Jan 24, 2024Updated 2 years ago
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,251Jun 2, 2026Updated last month
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
facebookresearch / long_seq_mae
View on GitHub
code release of research paper "Exploring Long-Sequence Masked Autoencoders"
☆100Oct 14, 2022Updated 3 years ago
fundamentalvision / Uni-Perceiver
View on GitHub
☆291Aug 14, 2025Updated 11 months ago
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
mbanani / lgssl
View on GitHub
[CVPR 2023] Learning Visual Representations via Language-Guided Sampling
☆150Apr 13, 2023Updated 3 years ago
RunpeiDong / DreamLLM
View on GitHub
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462Dec 2, 2024Updated last year
facebookresearch / SWAG
View on GitHub
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
☆184Apr 17, 2022Updated 4 years ago
shizhediao / DaVinci
View on GitHub
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆45Apr 30, 2023Updated 3 years ago
mlfoundations / datacomp
View on GitHub
DataComp: In search of the next generation of multimodal datasets
☆786Apr 28, 2025Updated last year
baaivision / Painter
View on GitHub
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,593Dec 6, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
allenai / visprog
View on GitHub
Official code for VisProg (CVPR 2023 Best Paper!)
☆774Aug 26, 2024Updated last year
amazon-science / prompt-pretraining
View on GitHub
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆259May 3, 2024Updated 2 years ago
SwinTransformer / AiT
View on GitHub
☆111Jun 30, 2023Updated 3 years ago
microsoft / GenerativeImage2Text
View on GitHub
GIT: A Generative Image-to-text Transformer for Vision and Language
☆582Dec 2, 2023Updated 2 years ago
facebookresearch / MetaCLIP
View on GitHub
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,845Nov 27, 2025Updated 7 months ago
baaivision / CapsFusion
View on GitHub
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215Feb 27, 2024Updated 2 years ago
e-bug / fine-grained-evals
View on GitHub
[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"
☆13Jun 11, 2023Updated 3 years ago