OFA-Sys / ONE-PEACE

A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

☆1,061

Alternatives and similar repositories for ONE-PEACE

Users who are interested in ONE-PEACE are comparing it to the repositories listed below.

baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,614 Updated last year
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,337 Updated 2 years ago
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,760 Updated last year
shikras / shikra
☆799 Updated last year
lucidrains / CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
☆1,187 Updated last year
invictus717 / MetaTransformer
Meta-Transformer for Unified Multimodal Learning
☆1,644 Updated 2 years ago
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,765 Updated last week
OpenGVLab / VisionLLM
VisionLLM Series
☆1,130 Updated 9 months ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,550 Updated last year
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆1,044 Updated last year
open-mmlab / Multimodal-GPT
Multimodal-GPT
☆1,515 Updated 2 years ago
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆852 Updated last year
lucidrains / flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
☆1,272 Updated 3 years ago
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,229 Updated last year
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆2,125 Updated last week
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆600 Updated 11 months ago
facebookresearch / ov-seg
This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.
☆742 Updated 2 years ago
facebookresearch / multimodal
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
☆1,673 Updated last week
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆929 Updated 4 months ago
google-research / pix2seq
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
☆933 Updated 2 years ago
NVlabs / ODISE
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
☆929 Updated last year
dvlab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,502 Updated 9 months ago
IDEA-Research / OpenSeeD
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆741 Updated last year
OFA-Sys / OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
☆2,544 Updated last year
SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆854 Updated 4 months ago
microsoft / RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
☆797 Updated last year
jshilong / GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆549 Updated 6 months ago
mlfoundations / open_flamingo
An open-source framework for training large multimodal models.
☆4,049 Updated last year
facebookresearch / ToMe
A method to increase the speed and lower the memory footprint of existing vision transformers.
☆1,127 Updated last year
baaivision / Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,585 Updated last year