ztyang23 / BACON
☆18Updated last year
Alternatives and similar repositories for BACON
Users that are interested in BACON are comparing it to the libraries listed below
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆51Updated 3 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"☆44Updated last year
- ☆58Updated 2 years ago
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"☆36Updated last year
- ☆13Updated last year
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"☆26Updated last year
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆62Updated 7 months ago
- The official implementation of JM3D.☆30Updated 2 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆96Updated last week
- Open-vocabulary Semantic Segmentation☆33Updated last year
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated 5 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆39Updated 8 months ago
- [CVPR 2025] Test-Time Visual In-Context Tuning☆25Updated 7 months ago
- ☆40Updated 4 months ago
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆37Updated 9 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆88Updated last week
- Unifying Specialized Visual Encoders for Video Language Models☆22Updated 3 months ago
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning☆35Updated 4 months ago
- Official implementation of the WACV 2024 paper CLIP-DIY☆33Updated last year
- [CVPR'2025] EntitySAM: Segment Everything in Video☆51Updated 3 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model☆98Updated last year
- ☆15Updated 11 months ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation☆37Updated 2 years ago
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024☆22Updated 11 months ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference☆95Updated 7 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆60Updated 3 months ago
- An official repo for WACV 2025 paper "LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spa…☆24Updated 9 months ago
- ☆26Updated 6 months ago