[CVPR 2026 Highlight] A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
☆93 · Apr 21, 2026 · Updated last week
Alternatives and similar repositories for deltatok
Users that are interested in deltatok are comparing it to the libraries listed below.
- ☆16 · Aug 4, 2025 · Updated 8 months ago
- [AAAI 2025] SSLFusion: Scale and Space Aligned Latent Fusion Model for Multimodal 3D Object Detection ☆17 · Nov 14, 2025 · Updated 5 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆32 · Jun 12, 2025 · Updated 10 months ago
- [CVPR26] MuM's a pretty good feature extractor for 3D tasks, probably the best. ☆83 · Apr 6, 2026 · Updated 3 weeks ago
- [IJCAI2024] Implementation of "DCDet: Dynamic Cross-based 3D Object Detector" ☆14 · Aug 28, 2024 · Updated last year
- [ICCV 2025] CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection ☆28 · Aug 10, 2025 · Updated 8 months ago
- [ICCV2025] LRS4Fusion: Self-Supervised Sparse Sensor Fusion for Long Range Perception ☆33 · Aug 20, 2025 · Updated 8 months ago
- This repo contains the code for our TMLR paper: A Simple Video Segmenter by Tracking Objects Along Axial Trajectories ☆27 · Mar 20, 2025 · Updated last year
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation ☆33 · Dec 22, 2025 · Updated 4 months ago
- Official implementation of "Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals" (CVPR 2026) ☆36 · Feb 25, 2026 · Updated 2 months ago
- ☆34 · Jan 22, 2026 · Updated 3 months ago
- ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25) ☆21 · Apr 2, 2025 · Updated last year
- Code for RA-L work "Deep Probabilistic Feature-metric Tracking" ☆30 · Mar 20, 2023 · Updated 3 years ago
- ☆20 · Feb 8, 2024 · Updated 2 years ago
- Dataset and Baselines for "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization pr… ☆11 · Sep 15, 2023 · Updated 2 years ago
- ☆11 · Nov 18, 2024 · Updated last year
- LaunchPad is a lightweight Slurm job launcher designed for hyper-parameter search. ☆11 · Aug 2, 2024 · Updated last year
- Reproduction of popular methods for class-incremental learning in image recognition and proposal of a new variant. ☆10 · Jan 21, 2021 · Updated 5 years ago
- Beyond Accuracy: What Matters in Designing Well-Behaved Models? ☆19 · Mar 30, 2026 · Updated 3 weeks ago
- [ICLR 2026] The official implementation of "Dichotomous Diffusion Policy Optimization" ☆35 · Mar 6, 2026 · Updated last month
- Repo of "Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving" ☆108 · Mar 22, 2026 · Updated last month
- ☆36 · Dec 18, 2025 · Updated 4 months ago
- ☆28 · Apr 4, 2025 · Updated last year
- This repo contains VPR models that have been fine-tuned for indoor usage. ☆16 · May 15, 2024 · Updated last year
- ☆13 · May 9, 2023 · Updated 2 years ago
- Workshop on UAVs in Multimedia: Capturing the World from a New Perspective. Reza Zhu's Solution: MBEG ☆11 · May 17, 2024 · Updated last year
- This repository includes the official implementation of our paper "Grouping First, Attending Smartly: Training-Free Acceleration for Diff… ☆55 · May 21, 2025 · Updated 11 months ago
- [CVPR 2025 Award Candidate & Oral] TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion ☆45 · Apr 24, 2025 · Updated last year
- [CVPR 2025] Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion ☆48 · Mar 31, 2025 · Updated last year
- CVPR 2025: VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction ☆82 · Aug 1, 2025 · Updated 8 months ago
- ☆47 · Jan 16, 2024 · Updated 2 years ago
- The first open-domain closed-loop revisited benchmark for evaluating memory consistency and action control in world models. ☆51 · Feb 10, 2026 · Updated 2 months ago
- [NeurIPS 24] Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models ☆44 · Sep 30, 2024 · Updated last year
- Official implementation of "Repurposing Video Diffusion Transformers for Robust Point Tracking" ☆43 · Dec 24, 2025 · Updated 4 months ago
- Collection of gym environments with support for domain randomization ☆10 · Dec 11, 2024 · Updated last year
- ☆12 · Apr 18, 2025 · Updated last year
- Official repository of the paper "JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition" ☆24 · Dec 15, 2023 · Updated 2 years ago
- On the Challenges of Open World Recognition under Shifting Visual Domains ☆11 · Jan 24, 2022 · Updated 4 years ago
- Reinforcing Action Policies by Prophesying ☆39 · Nov 26, 2025 · Updated 5 months ago