mahtabbigverdi / Aurora-perception
☆19Updated 3 months ago
Alternatives and similar repositories for Aurora-perception
Users that are interested in Aurora-perception are comparing it to the libraries listed below
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".☆35Updated 3 weeks ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆53Updated last week
- For Ego4D VQ3D Task☆20Updated last year
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆45Updated 3 weeks ago
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding"☆64Updated last month
- Official PyTorch Code of ReKV (ICLR'25)☆33Updated 4 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆43Updated last month
- ☆69Updated 2 weeks ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆67Updated last week
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆47Updated 3 weeks ago
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"☆36Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆12Updated 8 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM☆19Updated last month
- ☆48Updated last month
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆79Updated 3 weeks ago
- A framework that allows you to apply Sparse AutoEncoder on any models☆29Updated this week
- ☆49Updated 2 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆60Updated 9 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Updated 9 months ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆33Updated 3 weeks ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆47Updated 6 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆33Updated 8 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities☆75Updated 9 months ago
- Program synthesis for 3D spatial reasoning☆42Updated last month
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆19Updated 3 months ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆45Updated 2 weeks ago
- 🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resamplin…☆39Updated last month
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…☆16Updated 3 months ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)☆28Updated 11 months ago