alwynpan / uom-comp90024
Demo code for the subject COMP90024
☆12 · Updated 2 months ago
Alternatives and similar repositories for uom-comp90024
Users interested in uom-comp90024 are comparing it to the repositories listed below.
- ☆12 · Updated 6 months ago
- [CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation ☆20 · Updated 5 months ago
- 😎 A curated list of CVPR 2025 oral papers (96 in total) ☆33 · Updated last week
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆74 · Updated 8 months ago
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆37 · Updated last week
- [NeurIPS 2024] Official code repository for MSR3D paper ☆60 · Updated last week
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆17 · Updated 11 months ago
- Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" ☆95 · Updated 4 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆51 · Updated last week
- Yet another RL Baseline repo. ☆10 · Updated last year
- Official repo for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 · Updated 2 months ago
- ☆15 · Updated 3 weeks ago
- [CVPR 2024] Binding Touch to Everything: Learning Unified Multimodal Tactile Representations ☆53 · Updated 4 months ago
- Code for the paper "Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding" ☆18 · Updated 9 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆65 · Updated this week
- [ICCV2025] AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation ☆78 · Updated this week
- ☆124 · Updated last year
- Unified Vision-Language-Action Model ☆61 · Updated this week
- ☆207 · Updated 2 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated last year
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆93 · Updated last year
- RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints ☆50 · Updated this week
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation ☆165 · Updated last month
- Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering" ☆59 · Updated 11 months ago
- ☆37 · Updated 2 weeks ago
- ☆18 · Updated last year
- [arXiv 2025] MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation ☆36 · Updated 2 months ago
- OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding ☆18 · Updated 2 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆41 · Updated 5 months ago
- [CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation ☆147 · Updated last week