☆45Jul 28, 2025Updated 9 months ago
Alternatives and similar repositories for UV-CoT
Users who are interested in UV-CoT are comparing it to the repositories listed below.
- [ECCV2022] The PyTorch implementation of paper "Equivariance and Invariance Inductive Bias for Learning from Insufficient Data"☆19Oct 12, 2022Updated 3 years ago
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning☆46Nov 8, 2025Updated 6 months ago
- Unified layout planning and image generation, ICCV2025☆42Jan 19, 2026Updated 3 months ago
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization☆27Apr 14, 2025Updated last year
- [ICCV 2025] Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models☆36Mar 20, 2026Updated last month
- Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding☆66Sep 1, 2025Updated 8 months ago
- GAIIC2024 dual-light object detection from a drone's perspective - Rank6 solution☆12Jun 17, 2024Updated last year
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆443Dec 22, 2024Updated last year
- Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)☆30Jan 18, 2026Updated 3 months ago
- ☆112Jan 8, 2025Updated last year
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.☆18Apr 22, 2025Updated last year
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception"☆30Mar 25, 2026Updated last month
- [AAAI 2022 Oral] This is a Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tail…☆33Feb 17, 2022Updated 4 years ago
- CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification(AAAI2025)☆52Nov 24, 2025Updated 5 months ago
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆65May 26, 2025Updated 11 months ago
- [ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs"☆16May 24, 2025Updated 11 months ago
- This is the project for IRM methods☆12Sep 13, 2021Updated 4 years ago
- LVAS-Agent Code Base☆20Apr 15, 2025Updated last year
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆47Jul 17, 2025Updated 9 months ago
- [NeurIPS 2023] Generalized Logit Adjustment☆39Apr 21, 2024Updated 2 years ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.☆19Dec 27, 2024Updated last year
- ☆20Mar 3, 2025Updated last year
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆141Jul 28, 2025Updated 9 months ago
- [AAAI-26] Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?☆30Dec 14, 2025Updated 4 months ago
- OpenAI GPT For Python Developers☆12Jun 9, 2023Updated 2 years ago
- ECCV2020_Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising☆12Sep 24, 2020Updated 5 years ago
- Uses the fastrtc framework to call qwen-2.5-omni-realtime for real-time voice, video, and more☆14Jun 27, 2025Updated 10 months ago
- ☆42Nov 8, 2025Updated 6 months ago
- Training Segment Anything Model(SAM) by MetaAI from scratch and fine-tuning it with NDIS Park(Night and Day Instance Segmented Park) data…☆13Jun 21, 2025Updated 10 months ago
- Load and visualize different datasets in video question answering☆10May 11, 2021Updated 4 years ago
- The supplementary material for the paper "Fine-tuning Large Language Models to Improve Accuracy and Comprehensibility of Automated Code R…☆16Aug 12, 2024Updated last year
- ☆13Aug 5, 2024Updated last year
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆86Jan 26, 2026Updated 3 months ago
- Parallel_Computer_Architecture classic books☆17May 13, 2022Updated 3 years ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆240Nov 7, 2025Updated 6 months ago
- [AAAI 2026 Oral] HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment☆31Dec 17, 2025Updated 4 months ago
- [CVPR 2023] Code for the paper "Masked Images Are Counterfactual Samples for Robust Fine-tuning"☆14Mar 24, 2023Updated 3 years ago
- ☆12Aug 29, 2025Updated 8 months ago
- Implementation for the CVPR2019 paper "Graphical Contrastive Losses for Scene Graph Parsing"☆12Nov 11, 2019Updated 6 years ago