silicx / GoldFromOres-BiLP
Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP)
☆24Updated 11 months ago
Alternatives and similar repositories for GoldFromOres-BiLP
Users that are interested in GoldFromOres-BiLP are comparing it to the libraries listed below
- Official implementation of Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement.☆31Updated 10 months ago
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning☆14Updated 3 weeks ago
- This is the official repository of OCL (ICCV 2023).☆22Updated last year
- Code for our ICML'24 paper on multimodal dataset distillation☆37Updated 8 months ago
- Data pre-processing and training code on Open-X-Embodiment with pytorch☆11Updated 5 months ago
- ☆46Updated 6 months ago
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment☆15Updated 5 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆43Updated last week
- Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022☆62Updated 7 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆61Updated 3 weeks ago
- Latent Motion Token as the Bridging Language for Robot Manipulation☆105Updated last month
- official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM method, a cross-task in-context manipulation (VLA) method☆29Updated last month
- ☆16Updated last year
- [ICLR2025] Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?☆10Updated 2 months ago
- Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"☆16Updated 2 weeks ago
- [CVPR 2024] Binding Touch to Everything: Learning Unified Multimodal Tactile Representations☆52Updated 4 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆29Updated last month
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223☆134Updated last month
- ☆13Updated 2 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment☆78Updated 3 weeks ago
- AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation☆77Updated 2 months ago
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models☆33Updated last year
- Official Implementation of CAPEAM (ICCV'23)☆13Updated 6 months ago
- [ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction☆83Updated 2 months ago
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)☆44Updated 11 months ago
- ☆95Updated last month
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆85Updated 9 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks☆57Updated 9 months ago
- Accepted by CVPR 2024☆34Updated last year
- Official Implementation of CL-ALFRED (ICLR'24)☆22Updated 8 months ago