☆18 · Aug 7, 2025 · Updated 7 months ago
Alternatives and similar repositories for Spatio-Temporal-LLM
Users who are interested in Spatio-Temporal-LLM are comparing it to the repositories listed below
- Official implementation of the ECCV 2024 paper: Generalizable Facial Expression Recognition ☆20 · Sep 20, 2024 · Updated last year
- ☆19 · Jun 11, 2025 · Updated 9 months ago
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆206 · Jun 4, 2025 · Updated 9 months ago
- CVPR 2025 ☆21 · Aug 16, 2025 · Updated 7 months ago
- This repo is the official implementation of "Euclid’s Gift: Enhancing Spatial Perception and Reasoning in Vision‑Language Models via Geom… ☆27 · Updated this week
- ☆17 · Jul 6, 2021 · Updated 4 years ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆30 · Oct 28, 2025 · Updated 4 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆32 · Jun 12, 2025 · Updated 9 months ago
- [NeurIPS 2025 Spotlight] Official implementation of the SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alig… ☆161 · Sep 25, 2025 · Updated 5 months ago
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" ☆29 · Apr 16, 2024 · Updated last year
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation" ☆45 · Mar 25, 2025 · Updated 11 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation ☆19 · Dec 22, 2024 · Updated last year
- ☆18 · Nov 18, 2024 · Updated last year
- EPIC-Kitchens-100 Action Recognition baselines: TSN, TRN, TSM ☆33 · Mar 15, 2022 · Updated 4 years ago
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆33 · Nov 7, 2023 · Updated 2 years ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆213 · Nov 28, 2025 · Updated 3 months ago
- [IJCV 2025] VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation ☆28 · Sep 24, 2024 · Updated last year
- ☆35 · Apr 4, 2024 · Updated last year
- Evaluation metrics and submission file creation scripts for the Action Recognition challenge ☆15 · Feb 9, 2026 · Updated last month
- [ICCV 2025 Oral] CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation ☆62 · Aug 1, 2025 · Updated 7 months ago
- MGCF-Net for Phishing URLs Detection ☆49 · May 20, 2025 · Updated 10 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆67 · Jul 22, 2025 · Updated 7 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" ☆60 · Updated this week
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆43 · Feb 5, 2025 · Updated last year
- Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation ☆11 · Jul 24, 2023 · Updated 2 years ago
- [CVPR'25] Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model (DFD-FCG) ☆49 · Jul 20, 2025 · Updated 8 months ago
- This repository maintains the code for my master thesis "learn semantic 3d reconstruction on octree" ☆13 · May 8, 2019 · Updated 6 years ago
- ☆24 · May 23, 2025 · Updated 9 months ago
- STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding? ☆38 · Jan 12, 2026 · Updated 2 months ago
- [ICCV 2023] HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation ☆38 · Jan 25, 2024 · Updated 2 years ago
- On solutions to the problem of Event Collapse in Motion Compensation frameworks ☆15 · Jan 21, 2023 · Updated 3 years ago
- Training recipe for SpatialReasoner [NeurIPS 2025] ☆41 · Updated this week
- Houdini Digital Asset which creates procedural city ☆11 · Jun 14, 2016 · Updated 9 years ago
- [CVPR 2024] 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis. ☆21 · Apr 23, 2025 · Updated 10 months ago
- An official implementation for APNet: Urban-level Scene Segmentation of Aerial Images and Point Clouds ☆10 · Feb 7, 2024 · Updated 2 years ago
- This is an aerial image dataset for semantic scene understanding. ☆13 · Jul 24, 2022 · Updated 3 years ago
- [NeurIPS 2024] XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation ☆36 · Jan 20, 2025 · Updated last year
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆81 · Oct 10, 2024 · Updated last year
- [ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition ☆63 · Jun 25, 2025 · Updated 8 months ago