☆42 · Jul 14, 2025 · Updated 10 months ago
Alternatives and similar repositories for Ground-R1
Users that are interested in Ground-R1 are comparing it to the libraries listed below.
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆47 · Jul 17, 2025 · Updated 10 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆48 · Jul 3, 2025 · Updated 10 months ago
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban… ☆30 · Jul 15, 2025 · Updated 10 months ago
- ☆124 · Jul 22, 2025 · Updated 10 months ago
- [NeurIPS 2023] Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning ☆17 · Apr 15, 2024 · Updated 2 years ago
- Official implementation of Latent-SFT: teaching LLMs to reason with vocabulary-space latent chains. ☆48 · Updated this week
- List of learning-based PCC papers, welcome Pull Requests! ☆25 · Nov 4, 2025 · Updated 6 months ago
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [EMNLP 2024] ☆28 · Nov 18, 2024 · Updated last year
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning ☆163 · Jun 2, 2025 · Updated 11 months ago
- [IEEE TVCG 2025] Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames ☆11 · Jun 1, 2025 · Updated 11 months ago
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models ☆65 · Feb 22, 2026 · Updated 3 months ago
- OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing ☆46 · Apr 15, 2026 · Updated last month
- ☆14 · Jul 1, 2023 · Updated 2 years ago
- Official implementation of EgoThinker at NIPS 2025 ☆27 · Nov 25, 2025 · Updated 5 months ago
- Code of the paper "Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation"… ☆20 · Nov 11, 2025 · Updated 6 months ago
- ☆113 · Aug 14, 2025 · Updated 9 months ago
- [ICCV 2025] UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding. ☆78 · Feb 28, 2026 · Updated 2 months ago
- Anchor Assignment and Sampling Heuristics in Deep Object Detection: A Review ☆11 · Aug 2, 2022 · Updated 3 years ago
- [CVPR'26] UniGame code implementation ☆19 · Apr 21, 2026 · Updated last month
- ☆44 · Apr 16, 2026 · Updated last month
- [CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks" ☆26 · Mar 10, 2026 · Updated 2 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆56 · Mar 31, 2025 · Updated last year
- code for affordance-r1 ☆70 · May 11, 2026 · Updated last week
- Official repo for An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization ☆16 · Mar 8, 2024 · Updated 2 years ago
- ☆1,211 · Nov 20, 2025 · Updated 6 months ago
- OW-OVD: Unified Open World and Open Vocabulary Object Detection (CVPR 2025) ☆30 · Dec 2, 2024 · Updated last year
- CaptionQA: Is Your Caption as Useful as the Image Itself? ☆35 · Mar 3, 2026 · Updated 2 months ago
- [NeurIPS 2024] COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing ☆26 · Dec 8, 2024 · Updated last year
- [ECCV2024] UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation ☆27 · Oct 20, 2025 · Updated 7 months ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning ☆50 · May 12, 2024 · Updated 2 years ago
- ☆12 · Dec 4, 2024 · Updated last year
- [NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion ☆103 · Oct 29, 2025 · Updated 6 months ago
- [NeurIPS 2024] PyTorch code for the paper "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning… ☆26 · Oct 24, 2025 · Updated 6 months ago
- A simple, elegant web tool that allows you to create custom RSS feeds for arXiv search queries. Stay up-to-date with the latest research … ☆35 · Mar 21, 2026 · Updated 2 months ago
- ☆17 · Jun 19, 2023 · Updated 2 years ago
- Official implementation of Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts (IJCAI 2024) ☆15 · Oct 16, 2024 · Updated last year
- [AAAI 2023 Oral] Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training ☆14 · Apr 19, 2023 · Updated 3 years ago
- The official implementation of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models" ☆17 · Mar 24, 2025 · Updated last year
- ☆15 · Mar 29, 2023 · Updated 3 years ago