sunsmarterjie/ChatterBox

[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues

☆61

Alternatives and similar repositories for ChatterBox

Users who are interested in ChatterBox are comparing it to the libraries listed below.

mingrui-wu / OSI-Bench
Official repo of From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs
☆24 · Jun 23, 2026 · Updated last month
qiujihao19 / Artemis
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27 · Apr 8, 2025 · Updated last year
qiujihao19 / LongVideo-R1
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
☆50 · Jul 7, 2026 · Updated 2 weeks ago
sunsmarterjie / iTPN
(CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling
☆216 · Jul 28, 2024 · Updated last year
sunsmarterjie / SDL-Skeleton
A toolbox for object skeleton detection, can also be used for edge detection, building extraction and road extraction. TIP (2021)
☆138 · Feb 9, 2023 · Updated 3 years ago
sunsmarterjie / DAAS
'Discretization-Aware Architecture Search' alleviates the discretization gap in one-shot differentiable NAS. DAAS has been accepted by PR…
☆20 · Jul 30, 2021 · Updated 4 years ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆81 · Oct 25, 2024 · Updated last year
callsys / DynRefer
[CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
☆59 · Mar 4, 2025 · Updated last year
callsys / GenPromp
[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
☆57 · Nov 10, 2023 · Updated 2 years ago
sunsmarterjie / beyond_masking
Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers
☆26 · Apr 12, 2022 · Updated 4 years ago
sunsmarterjie / SaGe
(SaGe) Semantic-Aware Generation for Self-Supervised Visual Representation Learning
☆26 · Mar 29, 2022 · Updated 4 years ago
MzeroMiko / XDLM
[ICML 2026 Spotlight] Code for miXed Discrete Diffusion Language Model
☆27 · Mar 16, 2026 · Updated 4 months ago
AZZMM / CC-Diff
Implementation of paper "CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis"
☆28 · Dec 19, 2025 · Updated 7 months ago
martian422 / MaskGRPO
The official implementation of MaskGRPO: Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models. (ICLR 2026, arxiv…
☆19 · Jan 27, 2026 · Updated 5 months ago
jinhaoduan / GTBench
[NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
☆70 · Sep 6, 2024 · Updated last year
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47 · Oct 19, 2025 · Updated 9 months ago
architsharma97 / dpo-rlaif
☆100 · Jun 27, 2024 · Updated 2 years ago
XiaokunFeng / MemVLT
[NeurIPS'24] MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
☆19 · Oct 7, 2024 · Updated last year
giangdip2410 / HyperRouter
Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
☆33 · Nov 29, 2023 · Updated 2 years ago
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273 · Feb 11, 2025 · Updated last year
PootieT / explain-then-translate
Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…
☆29 · Dec 5, 2023 · Updated 2 years ago
AAAI-DISIM-UnivAQ / DALI
DALI Multi Agent System Framework
☆43 · Mar 24, 2026 · Updated 4 months ago
manuelladron / semantic_based_painting
☆43 · Sep 10, 2025 · Updated 10 months ago
AkitsukiM / VMamba-DOTA
☆31 · Sep 24, 2024 · Updated last year
huuuuusy / videocube-toolkit
The official python toolkit for running experiments and evaluate performance on VideoCube benchmark @TPAMI2023
☆31 · Apr 1, 2024 · Updated 2 years ago
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 · Aug 4, 2024 · Updated last year
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
CASIA-IVA-Lab / MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆74 · Jun 3, 2024 · Updated 2 years ago
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 · Feb 13, 2025 · Updated last year
V3Det / Detectron2-V3Det
Detectron2 Toolbox and Benchmark for V3Det
☆18 · Jun 2, 2024 · Updated 2 years ago
callsys / GMPO
[ICLR 2026] Geometric-Mean Policy Optimization
☆104 · Jan 26, 2026 · Updated 6 months ago
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆82 · Mar 6, 2026 · Updated 4 months ago
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37 · Sep 19, 2023 · Updated 2 years ago
Hansxsourse / VRMDiff
☆11 · Mar 11, 2025 · Updated last year
BowieHsu / EfficientTeacher
☆18 · Feb 28, 2023 · Updated 3 years ago
BriansIDP / AudioVisualLLM
☆19 · May 19, 2024 · Updated 2 years ago
lambert-x / ProLab
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆55 · Aug 27, 2025 · Updated 10 months ago
yuhangzang / ContextDET
Contextual Object Detection with Multimodal Large Language Models
☆261 · Oct 14, 2024 · Updated last year
YWenxi / think-with-images-through-self-calling
official repo for `thinking with images through-self-calling`
☆26 · Dec 28, 2025 · Updated 6 months ago