sunsmarterjie / ChatterBox
ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆49 · Updated 4 months ago
Related projects:
- ☆100 · Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆75 · Updated 6 months ago
- Dense Connector for MLLMs ☆98 · Updated last month
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆116 · Updated 2 weeks ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated 11 months ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 · Updated last month
- [ECCV 2024] The official repo of the Griffon series ☆93 · Updated 2 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆96 · Updated 4 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆43 · Updated 4 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆85 · Updated 2 weeks ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆57 · Updated 3 months ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 · Updated 3 weeks ago
- Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024) ☆41 · Updated 3 months ago
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning ☆93 · Updated 2 months ago
- [ECCV 2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆45 · Updated 2 weeks ago
- The official implementation of RAR ☆61 · Updated 5 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆52 · Updated 5 months ago
- A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating… ☆104 · Updated 6 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆42 · Updated 3 months ago
- Official PyTorch implementation of Self-emerging Token Labeling ☆30 · Updated 5 months ago
- ☆83 · Updated 9 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆44 · Updated 3 weeks ago
- Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ☆103 · Updated last month
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts" ☆76 · Updated 4 months ago
- ☆17 · Updated 5 months ago
- Official PyTorch implementation of the paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… ☆47 · Updated 2 months ago
- This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆138 · Updated 5 months ago
- ☆80 · Updated 4 months ago
- Evaluation code for Ref-L4, a new REC benchmark in the LMM era ☆13 · Updated 2 months ago
- Repository for the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models" ☆36 · Updated last year