ztyang23 / BACON
★19 Updated last year
Alternatives and similar repositories for BACON
Users who are interested in BACON are comparing it to the libraries listed below.
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ★52 Updated 6 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ★47 Updated last year
- ★14 Updated last year
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ★63 Updated 10 months ago
- ROOT: VLM based System for Indoor Scene Understanding and Beyond ★39 Updated last year
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ★46 Updated 8 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ★35 Updated last year
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training" ★36 Updated last year
- Can 3D Vision-Language Models Truly Understand Natural Language? ★20 Updated last year
- ★42 Updated 7 months ago
- Unifying Specialized Visual Encoders for Video Language Models ★25 Updated 2 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ★41 Updated last year
- Visual Spatial Tuning ★172 Updated last week
- ★58 Updated 2 years ago
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology ★73 Updated 2 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ★95 Updated 11 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ★64 Updated 6 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ★91 Updated 6 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ★61 Updated last year
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ★22 Updated last year
- Benchmarking Multi-Image Understanding in Vision and Language Models ★12 Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ★20 Updated last year
- [CVPR 2025] Test-Time Visual In-Context Tuning ★29 Updated last month
- Open-vocabulary Semantic Segmentation ★33 Updated last year
- ★16 Updated last year
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation" ★26 Updated last year
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ★37 Updated 2 years ago
- The official implementation of JM3D. ★31 Updated 5 months ago
- Egocentric Video Understanding Dataset (EVUD) ★33 Updated last year
- Simple script to parallelize download and extract files for SA-1B Dataset. ★38 Updated 3 weeks ago