mahtabbigverdi / Aurora
☆12 Updated 7 months ago
Alternatives and similar repositories for Aurora
Users that are interested in Aurora are comparing it to the libraries listed below
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆15 Updated 2 weeks ago
- [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆42 Updated last year
- Official code for ECCV 2024 paper "Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection" ☆14 Updated last year
- ☆38 Updated last month
- ☆22 Updated 3 months ago
- ☆45 Updated 2 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models" ☆17 Updated 3 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology" ☆41 Updated last week
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆45 Updated 3 weeks ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆29 Updated 7 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 Updated last year
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆44 Updated 6 months ago
- ☆33 Updated last week
- LEO: A powerful Hybrid Multimodal LLM ☆18 Updated 6 months ago
- Official repository for “FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models” ☆20 Updated 2 weeks ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆41 Updated 3 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 Updated 9 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆63 Updated last month
- ☆12 Updated 5 months ago
- Awesome papers for multi-modal LLMs with grounding ability ☆18 Updated 11 months ago
- [IJCV 2024] ☆16 Updated 8 months ago
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆61 Updated 2 months ago
- The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (arXiv 2025) ☆35 Updated last month
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆57 Updated last year
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆25 Updated last month
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆32 Updated last month
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆14 Updated last month
- (CVPR 2024) ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning ☆46 Updated 7 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆77 Updated 8 months ago
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆20 Updated 5 months ago