hrlics / HoPE
[NeurIPS 2025] HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
☆24Updated 2 months ago
Alternatives and similar repositories for HoPE
Users who are interested in HoPE are comparing it to the repositories listed below
- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies☆55Updated 2 months ago
- CoRL 2025☆23Updated 5 months ago
- ☆47Updated 2 weeks ago
- [ICCV 2025] Official repo of "EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow"☆26Updated 3 months ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 4 months ago
- ☆39Updated last week
- Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image☆65Updated last month
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.☆47Updated 4 months ago
- Code for "AffordanceLLM: Grounding Affordance from Vision Language Models"☆14Updated last year
- [CVPR 2025] Instruct-4DGS: Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation☆23Updated 4 months ago
- [SIGGRAPH Asia 25] Official code for Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic…☆31Updated 2 months ago
- The official implementation of "DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation". (arXiv 2601.22153)☆118Updated last week
- This repository is the official implementation of MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expre…☆75Updated 3 weeks ago
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video☆65Updated 9 months ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆126Updated 3 months ago
- Codes of Paper "Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding"☆20Updated last year
- [ICCV 2025] RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping☆33Updated 2 months ago
- [ICLR 2025] Dataset and Code for Paper "Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels"☆45Updated last month
- Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer”☆78Updated 2 months ago
- ☆33Updated 2 months ago
- Imitation Learning; Robotics; Policy; VLA;☆30Updated this week
- Project page for Neural Shell Texture Splatting (ICCV 2025)☆33Updated 3 months ago
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.☆73Updated 3 months ago
- Public implementation of Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling☆26Updated 2 months ago
- official code repo of CVPR 2025 paper PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation☆60Updated 6 months ago
- [NeurIPS 2025] "DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling"☆93Updated last month
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling☆138Updated 2 weeks ago
- [CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning☆55Updated 10 months ago
- Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", ICCV 2025.☆101Updated 4 months ago
- Official implementation of "Auto-Regressively Generating Multi-View Consistent Images". (ICCV 2025)☆82Updated 6 months ago