alwynpan / uom-comp90024
Demo Code for Subject COMP90024
☆12Updated last month
Alternatives and similar repositories for uom-comp90024
Users that are interested in uom-comp90024 are comparing it to the libraries listed below
- Project Description☆22Updated last year
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs☆35Updated 3 months ago
- Awesome paper for multi-modal llm with grounding ability☆17Updated 9 months ago
- Yet Another Academic Homepage Template☆20Updated last week
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆20Updated 3 weeks ago
- [CVPR2024] This is the official implementation of MP5☆101Updated 10 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"☆12Updated last month
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark☆11Updated last month
- GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization☆119Updated last month
- Official implementation of: Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel☆25Updated 5 months ago
- One-Shot Open Affordance Learning with Foundation Models (CVPR 2024)☆34Updated 9 months ago
- AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation☆71Updated last month
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models."☆253Updated 3 months ago
- Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution"☆91Updated 3 months ago
- ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images (NeurIPS2024)☆80Updated 4 months ago
- This repository compiles a list of papers related to the application of video technology in the field of robotics! Star⭐ the repo and fol…☆155Updated 3 months ago
- Embodied Question Answering (EQA) benchmark and method☆16Updated last month
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs☆52Updated 2 months ago
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation"☆27Updated last month
- official implementation of NeurIPS 2023 paper "FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation"☆32Updated last year
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding☆102Updated 2 weeks ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World☆128Updated 6 months ago
- [CVPR 2024] Code for HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation☆71Updated 7 months ago
- [ICLR 2023] SQA3D for embodied scene understanding and reasoning☆132Updated last year
- OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation☆126Updated this week
- Learning without Forgetting for Vision-Language Models (TPAMI 2025)☆36Updated 2 months ago