kaiyuyue/nxtp

PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]

☆182

Alternatives and similar repositories for nxtp

Users who are interested in nxtp are comparing it to the libraries listed below.

giangdip2410 / HyperRouter
Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
☆33 · Nov 29, 2023 · Updated 2 years ago
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆37 · Jan 3, 2024 · Updated 2 years ago
aitor-martinez-seras / SNN-Automotive-Object-Detection
Code of the paper "Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analys…
☆27 · Dec 13, 2023 · Updated 2 years ago
Event-AHU / SAFE_LargeVLM
[Pattern Recognition 2024] Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models, Dong Li, Jiandon…
☆18 · Jan 18, 2025 · Updated last year
fudan-zvg / WoVoGen
[ECCV 2024] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
☆112 · Feb 6, 2025 · Updated last year
Tony-Lowe / RotationDrag
☆35 · Jan 23, 2024 · Updated 2 years ago
kaixinbear / CAPE
(CVPR2023) CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
☆110 · May 5, 2023 · Updated 2 years ago
ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆247 · Jan 17, 2024 · Updated 2 years ago
bytedance / OmniScient-Model
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
☆99 · Jul 15, 2024 · Updated last year
Evocargo / Lidar-Annotation-is-All-You-Need
2D road segmentation using lidar data during training
☆43 · Dec 21, 2023 · Updated 2 years ago
htqin / GoogleBard-VisUnderstand
How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges
☆30 · Sep 24, 2023 · Updated 2 years ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆945 · Aug 5, 2025 · Updated 7 months ago
PootieT / explain-then-translate
Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…
☆29 · Dec 5, 2023 · Updated 2 years ago
beichenzbc / Long-CLIP
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
☆892 · Aug 13, 2024 · Updated last year
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated last year
NVlabs / STL
Official PyTorch implementation of Self-emerging Token Labeling
☆35 · Mar 27, 2024 · Updated last year
jianzongwu / betrayed-by-captions
(ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
☆48 · Jul 18, 2024 · Updated last year
liuxy1103 / GRDBIS
☆21 · Nov 9, 2025 · Updated 4 months ago
shikras / d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating…
☆137 · Mar 20, 2024 · Updated last year
TencentARC / ViSFT
☆37 · Jan 20, 2024 · Updated 2 years ago
AAAI-DISIM-UnivAQ / DALI
DALI Multi Agent System Framework
☆42 · Jan 30, 2026 · Updated last month
ggjy / vision_weak_to_strong
☆38 · Feb 8, 2024 · Updated 2 years ago
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆602 · Dec 11, 2024 · Updated last year
rosewang2008 / backtracing
Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings.
☆92 · Jul 21, 2024 · Updated last year
YuchenLiu98 / COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
☆207 · Jan 8, 2025 · Updated last year
lxa9867 / ImageFolder
High-performance Image Tokenizers for VAR and AR
☆303 · Apr 25, 2025 · Updated 10 months ago
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,815 · Nov 27, 2025 · Updated 3 months ago
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆131 · Aug 21, 2024 · Updated last year
janghyuncho / DECOLA
Code release for "Language-conditioned Detection Transformer"
☆88 · Jun 17, 2024 · Updated last year
sunsmarterjie / ChatterBox
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆61 · May 2, 2025 · Updated 10 months ago
xmoanvaf / llava-phi
☆401 · Dec 12, 2024 · Updated last year
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆95 · Dec 1, 2025 · Updated 3 months ago
lunyiliu / CoachLM
Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.
☆60 · Mar 20, 2024 · Updated last year
wyndwarrior / autoregressive-bbox
☆17 · Oct 18, 2022 · Updated 3 years ago
allenai / unified-io-2
☆643 · Feb 15, 2024 · Updated 2 years ago
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆606 · May 8, 2024 · Updated last year
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆246 · Aug 14, 2024 · Updated last year
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆2,181 · Feb 11, 2026 · Updated 3 weeks ago
apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,403 · Aug 4, 2025 · Updated 7 months ago