fundamentalvision/Uni-Perceiver

fundamentalvision / Uni-Perceiver

☆291

Alternatives and similar repositories for Uni-Perceiver

Users who are interested in Uni-Perceiver are comparing it to the repositories listed below.

OpenGVLab / STM-Evaluation
☆70 · Jun 9, 2026 · Updated last month
OpenGVLab / M3I-Pretraining
[CVPR 2023] implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
☆91 · Jun 1, 2023 · Updated 3 years ago
amirbar / visual_prompting
Official implementation and data release of the paper "Visual Prompting via Image Inpainting".
☆319 · Aug 7, 2023 · Updated 2 years ago
google-research / pix2seq
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
☆945 · Nov 7, 2023 · Updated 2 years ago
OpenGVLab / InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
☆2,836 · Mar 25, 2025 · Updated last year
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,685 · Aug 1, 2024 · Updated last year
FocalNet / FocalNet-DINO
This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO
☆68 · Mar 10, 2023 · Updated 3 years ago
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346 · Oct 5, 2023 · Updated 2 years ago
czczup / ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
☆1,503 · Jun 3, 2025 · Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215 · Feb 27, 2024 · Updated 2 years ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,153 · Feb 27, 2025 · Updated last year
ZhangYuanhan-AI / visual_prompt_retrieval
[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
☆182 · Mar 4, 2024 · Updated 2 years ago
JialianW / GRiT
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341 · Jan 8, 2024 · Updated 2 years ago
shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago
hustvl / MIMDet
[ICCV 2023] You Only Look at One Partial Sequence
☆343 · Oct 21, 2023 · Updated 2 years ago
google-research / vmoe
☆726 · Jul 2, 2026 · Updated 3 weeks ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,605 · Jan 24, 2024 · Updated 2 years ago
OpenGVLab / gv-benchmark
General Vision Benchmark, GV-B, a project from OpenGVLab
☆187 · Feb 23, 2022 · Updated 4 years ago
mhh0318 / UniD3
☆55 · Feb 9, 2023 · Updated 3 years ago
ShoufaChen / AdaptFormer
[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"
☆388 · Sep 16, 2022 · Updated 3 years ago
amazon-science / bigdetection
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training
☆399 · Oct 23, 2024 · Updated last year
AILab-CVC / VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 · Sep 12, 2024 · Updated last year
Expedit-LargeScale-Vision-Transformer / Expedit-SAM
[NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…
☆87 · Oct 29, 2023 · Updated 2 years ago
gaopengcuhk / Stable-Pix2Seq
A full-fledged version of Pix2Seq
☆237 · Nov 6, 2021 · Updated 4 years ago
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆331 · Jan 20, 2025 · Updated last year
jinga-lala / DAMEX
Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…
☆28 · Mar 29, 2024 · Updated 2 years ago
OptimalScale / DetGPT
☆786 · Aug 7, 2024 · Updated last year
CASIA-LMC-Lab / Obj2Seq
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)
☆85 · Nov 2, 2022 · Updated 3 years ago
jozhang97 / DETA
Detection Transformers with Assignment
☆270 · Sep 16, 2023 · Updated 2 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆90 · Jun 12, 2023 · Updated 3 years ago
microsoft / Tutel
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
☆1,004 · Jul 21, 2026 · Updated last week
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆2,342 · Jul 2, 2026 · Updated 3 weeks ago
mshukor / eP-ALM
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
☆27 · Oct 27, 2023 · Updated 2 years ago
facebookresearch / SLIP
Code release for SLIP Self-supervision meets Language-Image Pre-training
☆791 · Feb 9, 2023 · Updated 3 years ago
Guillem96 / data2vec-vision
PyTorch implementation of Data2Vec self-supervised approach for vision use cases.
☆18 · Oct 7, 2022 · Updated 3 years ago
hustvl / PolarDETR
☆79 · Jun 23, 2022 · Updated 4 years ago
SwinTransformer / Feature-Distillation
☆264 · Nov 30, 2022 · Updated 3 years ago
NVlabs / M2BEV
☆59 · Apr 18, 2022 · Updated 4 years ago