mahtabbigverdi / AuroraLinks
☆12 Updated last year
Alternatives and similar repositories for Aurora
Users that are interested in Aurora are comparing it to the libraries listed below
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 Updated 6 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆39 Updated last year
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆39 Updated last year
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆38 Updated 2 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆20 Updated last year
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology" ☆72 Updated 2 months ago
- Official code repository of Shuffle-R1 ☆25 Updated 4 months ago
- ☆42 Updated 7 months ago
- ☆41 Updated 6 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆76 Updated last month
- ☆27 Updated 9 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆65 Updated 6 months ago
- Awesome papers for multi-modal LLMs with grounding ability ☆19 Updated 3 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision ☆44 Updated 2 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆33 Updated last month
- [IJCV 2024] ☆19 Updated last year
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆71 Updated 2 months ago
- LEO: A powerful Hybrid Multimodal LLM ☆19 Updated 11 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆46 Updated last year
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆23 Updated last month
- ☆57 Updated 7 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆33 Updated last year
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆28 Updated last year
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆62 Updated last year
- The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (arXiv 2025) ☆39 Updated 7 months ago
- [NeurIPS 2025] Official repository for “FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models” ☆27 Updated last month
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆46 Updated last year
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ☆53 Updated 6 months ago
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆56 Updated 6 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆33 Updated 6 months ago