H-EmbodVis/NUMINA


[CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

☆68

Alternatives and similar repositories for NUMINA

Users who are interested in NUMINA are comparing it to the libraries listed below.

H-EmbodVis / PointTPA
[CVPR 2026] PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding
☆33 · Apr 7, 2026 · Updated 3 months ago
H-EmbodVis / HERMESV2
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
☆65 · May 1, 2026 · Updated 2 months ago
XenoZLH / Shuffle-R1
Official code repository of Shuffle-R1
☆26 · Feb 23, 2026 · Updated 5 months ago
1ranGuan / VST
[ECCV 26] Video Streaming Thinking
☆116 · Jun 18, 2026 · Updated last month
H-EmbodVis / DOMINO
[ECCV 2026] Towards Generalizable Robotic Manipulation in Dynamic Environments
☆230 · Jun 30, 2026 · Updated 3 weeks ago
DYZhang09 / ViTWSS3D
[ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
☆13 · Apr 12, 2024 · Updated 2 years ago
H-EmbodVis / HyDRA
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
☆267 · Updated this week
H-EmbodVis / GRANT
[AAAI 2026 Oral] Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
☆364 · Dec 12, 2025 · Updated 7 months ago
H-EmbodVis / EasyCache
Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
☆291 · May 12, 2026 · Updated 2 months ago
H-EmbodVis / MERGE
[NeurIPS 2025] More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models
☆219 · Oct 31, 2025 · Updated 8 months ago
dk-liang / UniFuture
[ICRA 2026] UniFuture: A 4D Driving World Model for Future Generation and Perception
☆163 · Feb 26, 2026 · Updated 4 months ago
EasonXiao-888 / SpatialEdit
[Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
☆214 · Apr 13, 2026 · Updated 3 months ago
dk-liang / UniSeg3D
[NeurIPS 2024] A Unified Framework for 3D Scene Understanding
☆179 · Jul 7, 2025 · Updated last year
DYZhang09 / ToC3D
[ECCV 2024] Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
☆53 · Sep 21, 2024 · Updated last year
AIGeeksGroup / UniMesh
UniMesh: Unifying 3D Mesh Understanding and Generation
☆57 · Jul 14, 2026 · Updated last week
HyperbolicCurve / Awesome-World-Action-Model
A curated list of academic papers and resources on Vision-Language-Action (VLA) and World Action Models (WAM)
☆30 · Updated this week
xiaomi-mlab / MindDrive
[ECCV 2026] Official code of “MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning”
☆246 · Jun 23, 2026 · Updated last month
GordonChen19 / Prompt-Relay
An inference-time, plug-and-play method for temporal control in multi-event generation
☆185 · Apr 26, 2026 · Updated 2 months ago
j0seo / lookahead-anchoring
☆15 · Oct 27, 2025 · Updated 8 months ago
EditCrafter / EditCrafter
The official repository of EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model (CVPRW 2026)
☆50 · Apr 19, 2026 · Updated 3 months ago
LMD0311 / HERMES
[ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
☆259 · May 12, 2026 · Updated 2 months ago
yangdongchao / UniAudio2Demo
☆26 · Feb 10, 2026 · Updated 5 months ago
xiaomi-mlab / Orion
[ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"
☆654 · Jun 22, 2026 · Updated last month
hanjq17 / Spectrum
[CVPR 2026] Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration
☆126 · Apr 30, 2026 · Updated 2 months ago
Westlake-AGI-Lab / SwitchCraft
Official Implementation of SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls [CVPR 2026]
☆24 · Mar 2, 2026 · Updated 4 months ago
H-EmbodVis / NAUTILUS
[NeurIPS 2025] NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding
☆368 · Dec 18, 2025 · Updated 7 months ago
WildActor / WildActor
Accepted by ICML2026
☆90 · Jun 29, 2026 · Updated 3 weeks ago
ShianDu / UniMMVSR
Official Code of "UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution"
☆26 · Oct 12, 2025 · Updated 9 months ago
arielshaulov / FlowMo
☆110 · Sep 3, 2025 · Updated 10 months ago
Sphere-AI-Lab / diagdistill
Implementation of <Streaming Autoregressive Video Generation via Diagonal Distillation> in ICLR 2026
☆130 · Mar 18, 2026 · Updated 4 months ago
alibaba-damo-academy / Lumos-Custom
[ICLR-26, ECCV-26, NeurIPS-25] Lumos-Custom Project: research for customized video generation in the Lumos Project.
☆216 · Jun 29, 2026 · Updated 3 weeks ago
dk-liang / Awesome-GPT4-with-Applications
Awesome GPT-4 with Applications. This is a collection of resources related to GPT-4, including news, official documents, demo and applica…
☆20 · Mar 15, 2023 · Updated 3 years ago
1ranGuan / thinkomni
[ICLR26] ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
☆94 · Mar 20, 2026 · Updated 4 months ago
TIGER-AI-Lab / Context-Forcing
Context Forcing: Consistent Autoregressive Video Generation with Long Context [ICML26]
☆105 · Jun 29, 2026 · Updated 3 weeks ago
shalfun / DriVerse
[ACMMM 2025] Official implementation of the paper "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompti…
☆220 · May 7, 2025 · Updated last year
hyeon-cho / Tangential-Amplifying-Guidance
[ICML2026] Official Implementation of "TAG: Tangential Amplifying Guidance for Hallucination-Resistant Sampling"
☆42 · Jul 6, 2026 · Updated 2 weeks ago
HongkLin / TIDE
[CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes
☆60 · Apr 9, 2025 · Updated last year
deepshwang / crepa
☆15 · Jun 21, 2025 · Updated last year
franciszzj / Saber
[CVPR 2026] Scaling Zero-Shot Reference-to-Video Generation
☆76 · Apr 28, 2026 · Updated 2 months ago