NExT-ChatV/NExT-Chat

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/NExT-ChatV/NExT-Chat)

NExT-ChatV / NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

☆253

Alternatives and similar repositories for NExT-Chat

Users that are interested in NExT-Chat are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zamling / PSALM
View on GitHub
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
☆269Dec 30, 2024Updated last year
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963Aug 5, 2025Updated 11 months ago
lzw-lzw / GroundingGPT
View on GitHub
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
☆342Nov 4, 2024Updated last year
Meituan-AutoML / Lenna
View on GitHub
☆87Feb 5, 2024Updated 2 years ago
MaverickRen / PixelLM
View on GitHub
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273Feb 11, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
JIA-Lab-research / LISA
View on GitHub
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,665Feb 16, 2025Updated last year
FoundationVision / Groma
View on GitHub
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆585Jun 7, 2024Updated 2 years ago
SkyworkAI / Vitron
View on GitHub
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆576Oct 20, 2024Updated last year
CircleRadon / Osprey
View on GitHub
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
☆843Aug 19, 2025Updated 11 months ago
pipilurj / perceptionGPT
View on GitHub
☆18Aug 7, 2024Updated last year
LLaVA-VL / LLaVA-Plus-Codebase
View on GitHub
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆769Feb 1, 2024Updated 2 years ago
shikras / shikra
View on GitHub
☆814Jul 8, 2024Updated 2 years ago
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆508Aug 9, 2024Updated last year
PKU-YuanGroup / MoE-LLaVA
View on GitHub
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322Jul 15, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
lxtGH / OMG-Seg
View on GitHub
Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
☆1,350Oct 15, 2025Updated 9 months ago
SunzeY / AlphaCLIP
View on GitHub
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆876Jul 20, 2025Updated last year
OpenGVLab / LAMM
View on GitHub
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆317Apr 16, 2024Updated 2 years ago
IDEA-Research / ChatRex
View on GitHub
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆216Oct 15, 2025Updated 9 months ago
VPGTrans / VPGTrans
View on GitHub
Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
☆269Oct 13, 2023Updated 2 years ago
lizhou-cs / mglmm
View on GitHub
☆32Jun 14, 2026Updated last month
OpenGVLab / VisionLLM
View on GitHub
VisionLLM Series
☆1,152Feb 27, 2025Updated last year
LeapLabTHU / GSVA
View on GitHub
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
☆166Sep 12, 2024Updated last year
RUCAIBox / ComVint
View on GitHub
The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19Nov 10, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
UCSB-AI / MiniGPT-5
View on GitHub
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
☆866May 8, 2025Updated last year
microsoft / GLIP
View on GitHub
Grounded Language-Image Pre-training
☆2,605Jan 24, 2024Updated 2 years ago
InternLM / InternLM-XComposer
View on GitHub
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921May 26, 2025Updated last year
Qinying-Liu / TagAlign
View on GitHub
Official implementation of TagAlign
☆37Dec 11, 2024Updated last year
congvvc / LaSagnA
View on GitHub
Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".
☆63Apr 29, 2024Updated 2 years ago
Meituan-AutoML / MobileVLM
View on GitHub
Strong and Open Vision Language Assistant for Mobile Devices
☆1,365Apr 15, 2024Updated 2 years ago
IDEA-Research / OpenSeeD
View on GitHub
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆762Jan 22, 2024Updated 2 years ago
BAAI-DCAI / DataOptim
View on GitHub
A collection of visual instruction tuning datasets.
☆77Mar 14, 2024Updated 2 years ago
BAAI-DCAI / Bunny
View on GitHub
A family of lightweight multimodal models.
☆1,052Nov 18, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
QwenLM / Qwen-VL
View on GitHub
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,712Aug 7, 2024Updated last year
JIA-Lab-research / LLaMA-VID
View on GitHub
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861Jul 29, 2024Updated last year
jy0205 / LaVIT
View on GitHub
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603Oct 6, 2024Updated last year
jefferyZhan / Griffon
View on GitHub
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1(CVPR 2026).
☆250Apr 17, 2026Updated 3 months ago
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
OptimalScale / DetGPT
View on GitHub
☆786Aug 7, 2024Updated last year
UX-Decoder / DINOv
View on GitHub
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
☆542Apr 8, 2024Updated 2 years ago