jina-ai / executor-3d-encoder
An executor that wraps 3D mesh models and encodes 3D content documents into d-dimensional vectors.
☆19Updated 2 years ago
Alternatives and similar repositories for executor-3d-encoder
Users who are interested in executor-3d-encoder are comparing it to the libraries listed below.
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs☆40Updated 11 months ago
- Improving 3D Large Language Model via Robust Instruction Tuning☆54Updated 3 months ago
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes"☆54Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆54Updated 2 years ago
- [ICRA 2024] Official Implementation of the paper "Parameter-efficient Prompt Learning for 3D Point Cloud Understanding"☆23Updated 3 months ago
- Code for “Pretrained Language Models as Visual Planners for Human Assistance”☆61Updated last year
- Webpage☆16Updated last year
- A project for computing high-quality ground truth training examples for RGB-D data.☆44Updated last year
- Using Segment-Anything and CLIP to generate pixel-aligned semantic features.☆40Updated 2 years ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆15Updated 6 months ago
- The implementation for "3D Scene Diffusion Guidance using Scene Graphs" paper. A Diffusion Model for Conditional 3D Scene Generation with…☆20Updated last year
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos☆48Updated 2 months ago
- ☆33Updated last year
- Image/Instance Retrieval using CLIP, A self supervised Learning Model☆28Updated 2 years ago
- [CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images"☆78Updated last year
- ☆16Updated last year
- Utilizing segment-anything to help the region selection of 3D point cloud or mesh.☆45Updated 2 years ago
- CLEVR3D Dataset: Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation☆18Updated last year
- Agent-to-Sim: Learning Interactive Behavior from Casual Videos.☆43Updated 7 months ago
- PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability☆15Updated 2 months ago
- Code release for the CVPR'23 paper titled "PartDistillation: Learning Parts from Instance Segmentation"☆58Updated last year
- MIMIC: Masked Image Modeling with Image Correspondences☆16Updated 11 months ago
- Experimental scripts for researching data adaptive learning rate scheduling.☆23Updated last year
- Vision-oriented multimodal AI☆49Updated 11 months ago
- [ICCV 2023] Code for "Multi-task View Synthesis with Neural Radiance Fields"☆11Updated last year
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning☆38Updated 5 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video…☆34Updated 11 months ago
- ☆21Updated 5 months ago
- Code for the Ask4Help project☆22Updated 2 years ago
- ☆31Updated last month