z1oong / Building-Egocentric-Procedural-AI-Assistant
Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges
⭐ 41 · Updated 2 weeks ago
Alternatives and similar repositories for Building-Egocentric-Procedural-AI-Assistant
Users that are interested in Building-Egocentric-Procedural-AI-Assistant are comparing it to the libraries listed below
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ⭐ 649 · Updated 7 months ago
- Up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources. ⭐ 260 · Updated 2 weeks ago
- [RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions ⭐ 968 · Updated 2 months ago
- ⭐ 17 · Updated 2 years ago
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained 🔥] ⭐ 174 · Updated 3 months ago
- [TPAMI 2025] Towards Visual Grounding: A Survey ⭐ 291 · Updated 2 months ago
- StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing ⭐ 949 · Updated this week
- RynnVLA-002: A Unified Vision-Language-Action and World Model ⭐ 866 · Updated last month
- This website is for the collection of VLA SOTA results. ⭐ 110 · Updated this week
- [RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid ⭐ 363 · Updated 3 months ago
- Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling" ⭐ 388 · Updated 3 months ago
- ⭐ 452 · Updated this week
- Traffic Scenarios Event Caption (TSEC) Dataset ⭐ 18 · Updated last year
- [ICLR 2026] ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving ⭐ 432 · Updated this week
- 🔥 This is a curated list of "A survey on Efficient Vision-Language Action Models" research. We will continue to maintain and update the r… ⭐ 123 · Updated 3 weeks ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ⭐ 91 · Updated last year
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ⭐ 100 · Updated last year
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success ⭐ 1,006 · Updated 4 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ⭐ 309 · Updated last year
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model ⭐ 336 · Updated 3 months ago
- Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning ⭐ 315 · Updated 10 months ago
- [NeurIPS 2025 Spotlight] SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation ⭐ 224 · Updated 7 months ago
- A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation ⭐ 401 · Updated 3 months ago
- [ICLR 2025] Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning ⭐ 14 · Updated 9 months ago
- Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future ⭐ 215 · Updated 9 months ago
- InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy ⭐ 344 · Updated 3 weeks ago
- Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning ⭐ 26 · Updated 6 months ago
- Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner. ⭐ 523 · Updated last year
- [AAAI 2026 Oral] SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation ⭐ 60 · Updated 2 weeks ago
- Efficiently apply modification functions to RLDS/TFDS datasets. ⭐ 29 · Updated last year