qirui-chen/RGA3-release

[ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring

☆24

Alternatives and similar repositories for RGA3-release

Users who are interested in RGA3-release are comparing it to the repositories listed below.

zhengrongz / AoTD
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
☆58 · Updated this week
haoningwu3639 / SimpleSDM-Video
A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.
☆20 · Feb 15, 2024 · Updated 2 years ago
Lzq5 / UniTime
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆56 · May 20, 2026 · Updated 2 months ago
Go2Heart / StreamFormer
[ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.
☆93 · Updated this week
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
haoningwu3639 / MRGen
[ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
☆41 · Sep 26, 2025 · Updated 10 months ago
haoningwu3639 / SimpleSDM-3
A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning.
☆27 · May 28, 2025 · Updated last year
Lzq5 / Video-Text-Alignment
☆28 · Jul 18, 2025 · Updated last year
haolinyang-hlyang / SoccerMaster
[CVPR 2026 Oral] SoccerMaster: A Vision Foundation Model for Soccer Understanding
☆67 · Jul 14, 2026 · Updated last week
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆55 · Nov 26, 2024 · Updated last year
Go2Heart / OmniStream
[ECCV 2026] OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
☆115 · Mar 15, 2026 · Updated 4 months ago
showlab / VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆148 · Dec 26, 2024 · Updated last year
haoningwu3639 / SpatialScore
[CVPR 2026 Highlight] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
☆84 · May 28, 2026 · Updated last month
Becomebright / ReKV
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
☆122 · Nov 4, 2025 · Updated 8 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆63 · Sep 13, 2024 · Updated last year
cvlab-kaist / SOLA
Official implementation of "Referring Video Object Segmentation via Language Aligned Track Selection".
☆41 · Jun 2, 2025 · Updated last year
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆182 · Jul 7, 2025 · Updated last year
jyrao / MatchTime
[EMNLP 2024 Oral] MatchTime: Towards Automatic Soccer Game Commentary Generation
☆104 · Jan 2, 2025 · Updated last year
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆247 · Jan 4, 2026 · Updated 6 months ago
yhy-2000 / MomentSeeker
☆23 · Jul 23, 2025 · Updated last year
MAGIC-AI4Med / RaTEScore
[EMNLP 2024] RaTEScore: A Metric for Radiology Report Generation
☆67 · May 18, 2025 · Updated last year
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
haoningwu3639 / VFI_Adapter
[BMVC 2023 Oral] Boost Video Frame Interpolation via Motion Adaptation
☆19 · Aug 22, 2024 · Updated last year
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 · Jul 22, 2025 · Updated last year
jbistanbul / universalvtg
Official Code for the paper "UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding"
☆15 · Apr 15, 2026 · Updated 3 months ago
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year
Go2Heart / EchoSight
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
☆90 · Jan 19, 2026 · Updated 6 months ago
appletea233 / LLaVA-ST
[CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
☆84 · Jul 4, 2025 · Updated last year
SalesforceAIResearch / strefer
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
☆19 · Jun 2, 2026 · Updated last month
congvvc / InstructSeg
[ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"
☆56 · Feb 10, 2025 · Updated last year
haoningwu3639 / SimpleSDXL
A simple and flexible PyTorch implementation of StableDiffusion-XL based on diffusers.
☆19 · Sep 2, 2024 · Updated last year
Becomebright / MTV
Revisiting Multi-Task Visual Representation Learning
☆22 · Jan 21, 2026 · Updated 6 months ago
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆213 · Aug 5, 2024 · Updated last year
WillDreamer / Awesome-MLLM-Reasoning
Recent Advances on MLLM's Reasoning Ability
☆26 · Apr 11, 2025 · Updated last year
zjucsq / PLA
[ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12 · Sep 17, 2023 · Updated 2 years ago
cilinyan / ReVOS-api
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆22 · Jul 20, 2024 · Updated 2 years ago
DavidYan2001 / PVChat
[ICCV 2025] PVChat: Personalized Video Chat with One-Shot Learning
☆17 · Apr 4, 2026 · Updated 3 months ago
ekazakos / grove
Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)
☆31 · Jan 18, 2026 · Updated 6 months ago
CeeZh / SILVR
Official Implementation for "SiLVR: A Simple Language-based Video Reasoning Framework"
☆19 · Jan 18, 2026 · Updated 6 months ago