[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆40Oct 14, 2025Updated 4 months ago
Alternatives and similar repositories for SSR
Users that are interested in SSR are comparing it to the libraries listed below
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models☆51Feb 23, 2026Updated last week
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- ☆25Aug 19, 2025Updated 6 months ago
- The official implementation of Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion [AAAI'2…☆15Feb 2, 2026Updated last month
- Score and Distribution Matching Policy: Advanced accelerated Visuomotor Policies via matched distillation☆10May 9, 2025Updated 9 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025☆13Aug 25, 2025Updated 6 months ago
- This is an implementation of the paper "Are We Done with Object-Centric Learning?"☆12Sep 11, 2025Updated 5 months ago
- ☆13Jan 22, 2025Updated last year
- OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation☆346Aug 27, 2025Updated 6 months ago
- ☆16May 13, 2025Updated 9 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- Findings of EMNLP 2023: InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspe…☆14Aug 13, 2024Updated last year
- ☆23May 8, 2025Updated 9 months ago
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models☆64Nov 27, 2025Updated 3 months ago
- ☆20Apr 16, 2025Updated 10 months ago
- [ICCV 2023 & IJCV 2026] PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection☆22Aug 12, 2024Updated last year
- official implementation of "CLIP-VQDiffusion : Language Free Training of Text To Image generation using CLIP and vector quantized diffusi…☆18Sep 5, 2024Updated last year
- ☆19Sep 10, 2025Updated 5 months ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆17Mar 31, 2025Updated 11 months ago
- [NeurIPS 2025] Official repository for “FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models”☆28Dec 9, 2025Updated 2 months ago
- Official Code for "Learning to Reason via Mixture-of-Thought for Logical Reasoning"☆26Nov 20, 2025Updated 3 months ago
- [WACV'24 Workshop] Human-Centric Autonomous Systems With LLMs for User Command Reasoning☆17Jul 10, 2024Updated last year
- Code for Efficient Test-Time Scaling via Self-Calibration☆19Sep 13, 2025Updated 5 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆23Oct 22, 2025Updated 4 months ago
- Simulator designed to generate diverse driving scenarios.☆44Feb 27, 2025Updated last year
- ☆19Mar 10, 2025Updated 11 months ago
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment☆23Jan 9, 2025Updated last year
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Aug 5, 2024Updated last year
- [CVPR 2025] Official implementation of the paper "Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters The…☆31Feb 27, 2025Updated last year
- [ICCV2025] Official code repository of "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction"☆59Aug 10, 2025Updated 6 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆39Jan 5, 2026Updated last month
- [ICLR 2026] Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"☆162Feb 16, 2026Updated 2 weeks ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens☆25Nov 6, 2023Updated 2 years ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Updated this week
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework☆71Jun 1, 2025Updated 9 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- The official implementation of the DIFFA series for dLLM-based large audio language model☆59Feb 2, 2026Updated last month