google-research/pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

☆945

Alternatives and similar repositories for pix2seq

Users who are interested in pix2seq are comparing it to the repositories listed below.

gaopengcuhk / Stable-Pix2Seq
A full-fledged version of Pix2Seq
☆237 · Nov 6, 2021 · Updated 4 years ago
gaopengcuhk / Pretrained-Pix2Seq
Replication of Pix2Seq with Pretrained Model
☆58 · Nov 6, 2021 · Updated 4 years ago
moein-shariatnia / Pix2Seq
Simple Implementation of Pix2Seq model for object detection in PyTorch
☆131 · Sep 2, 2023 · Updated 2 years ago
gaopengcuhk / Unofficial-Pix2Seq
Unofficial implementation of Pix2SEQ
☆162 · Oct 5, 2021 · Updated 4 years ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,605 · Jan 24, 2024 · Updated 2 years ago
facebookresearch / Detic
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
☆2,007 · Mar 21, 2024 · Updated 2 years ago
OFA-Sys / OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
☆2,557 · Apr 24, 2024 · Updated 2 years ago
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346 · Oct 5, 2023 · Updated 2 years ago
hustvl / MIMDet
[ICCV 2023] You Only Look at One Partial Sequence
☆343 · Oct 21, 2023 · Updated 2 years ago
fundamentalvision / Uni-Perceiver
☆291 · Aug 14, 2025 · Updated 11 months ago
CASIA-LMC-Lab / Obj2Seq
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)
☆85 · Nov 2, 2022 · Updated 3 years ago
ShoufaChen / DiffusionDet
[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
☆2,257 · Dec 22, 2022 · Updated 3 years ago
seanzhuh / SeqTR
SeqTR: A Simple yet Universal Network for Visual Grounding
☆144 · Oct 30, 2024 · Updated last year
NVlabs / GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
☆788 · May 10, 2022 · Updated 4 years ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,152 · Feb 27, 2025 · Updated last year
Sharpiless / Pix2seq-mmdetection
Unofficial implement of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection
☆34 · Apr 18, 2022 · Updated 4 years ago
raoyongming / DenseCLIP
[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
☆550 · Sep 15, 2023 · Updated 2 years ago
IDEA-Research / DN-DETR
[CVPR 2022 Oral] Official implementation of DN-DETR
☆605 · Dec 20, 2023 · Updated 2 years ago
JialianW / GRiT
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341 · Jan 8, 2024 · Updated 2 years ago
MCG-NJU / AdaMixer
[CVPR 2022 Oral] AdaMixer: A Fast-Converging Query-Based Object Detector
☆237 · Aug 17, 2022 · Updated 3 years ago
facebookresearch / SLIP
Code release for SLIP Self-supervision meets Language-Image Pre-training
☆792 · Feb 9, 2023 · Updated 3 years ago
facebookresearch / mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
☆8,366 · Jul 23, 2024 · Updated last year
IDEA-Research / awesome-detection-transformer
Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
☆1,399 · Jul 4, 2024 · Updated 2 years ago
IDEA-Research / detrex
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
☆2,303 · Sep 11, 2025 · Updated 10 months ago
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,255 · Jun 2, 2026 · Updated last month
czczup / ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
☆1,503 · Jun 3, 2025 · Updated last year
microsoft / RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
☆817 · Mar 20, 2024 · Updated 2 years ago
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,684 · Aug 1, 2024 · Updated last year
fundamentalvision / Deformable-DETR
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
☆4,001 · May 16, 2024 · Updated 2 years ago
facebookresearch / ToMe
A method to increase the speed and lower the memory footprint of existing vision transformers.
☆1,208 · Jun 17, 2024 · Updated 2 years ago
amirbar / visual_prompting
Official implementation and data release of the paper "Visual Prompting via Image Inpainting".
☆319 · Aug 7, 2023 · Updated 2 years ago
baaivision / Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,593 · Dec 6, 2024 · Updated last year
facebookresearch / DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
☆8,687 · May 31, 2024 · Updated 2 years ago
jshilong / DDQ
(CVPR2023) Dense Distinct Query for End-to-End Object Detection
☆266 · May 24, 2023 · Updated 3 years ago
EPFL-VILAB / MultiMAE
MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022
☆632 · Dec 13, 2022 · Updated 3 years ago
facebookresearch / detr
End-to-End Object Detection with Transformers
☆15,351 · Mar 12, 2024 · Updated 2 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆90 · Jun 12, 2023 · Updated 3 years ago
IDEA-Research / DINO
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
☆2,825 · Jul 31, 2024 · Updated last year
facebookresearch / MaskFormer
Per-Pixel Classification is Not All You Need for Semantic Segmentation (NeurIPS 2021, spotlight)
☆1,462 · Mar 11, 2022 · Updated 4 years ago