ztyang23 / BACON
☆18Updated last year
Alternatives and similar repositories for BACON
Users that are interested in BACON are comparing it to the libraries listed below
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 6 months ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆46Updated 7 months ago
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024☆22Updated last year
- ☆58Updated 2 years ago
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"☆36Updated last year
- ☆14Updated last year
- Visual Spatial Tuning☆169Updated 3 weeks ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆63Updated 10 months ago
- Unifying Specialized Visual Encoders for Video Language Models☆25Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"☆47Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 6 months ago
- [CVPR 2025] Test-Time Visual In-Context Tuning☆27Updated 3 weeks ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Updated last year
- ☆66Updated 2 months ago
- ☆41Updated 6 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Updated 11 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Updated last year
- Scaling Spatial Intelligence with Multimodal Foundation Models☆159Updated 2 weeks ago
- ☆112Updated last week
- PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025☆14Updated 2 months ago
- ☆13Updated 8 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆46Updated 6 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Updated 2 years ago
- ☆24Updated 7 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆31Updated last year
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆78Updated 2 years ago
- An official repo for WACV 2025 paper "LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spa…☆26Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 10 months ago
- The official implementation of JM3D.☆31Updated 5 months ago