ztyang23 / BACON
☆17Updated 11 months ago
Alternatives and similar repositories for BACON
Users that are interested in BACON are comparing it to the libraries listed below
- [CVPR 2025] Test-Time Visual In-Context Tuning☆23Updated 3 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024☆21Updated 7 months ago
- ☆58Updated last year
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"☆36Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆29Updated last week
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"☆42Updated last year
- ☆12Updated 9 months ago
- Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"☆12Updated last month
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆32Updated 2 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Updated last year
- The official implementation of JM3D.☆30Updated 2 months ago
- [ICLR 2024] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated last month
- ☆25Updated 3 months ago
- Unifying Specialized Visual Encoders for Video Language Models☆21Updated 3 weeks ago
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆29Updated 5 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆85Updated 3 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆37Updated 5 months ago
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"☆68Updated 2 weeks ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆60Updated 3 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation".☆24Updated 9 months ago
- ☆37Updated last month
- MIMIC: Masked Image Modeling with Image Correspondences☆16Updated last year
- ☆18Updated last year
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model☆98Updated last year
- Official implementation of LaVin-DiT☆35Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆76Updated 4 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆41Updated 7 months ago
- ☆13Updated 7 months ago
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation☆86Updated last year