longmalongma / TW-GRPO
The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"
☆23 Updated 3 months ago
Alternatives and similar repositories for TW-GRPO
Users that are interested in TW-GRPO are comparing it to the libraries listed below
- ☆23 Updated 5 months ago
- ☆38 Updated 2 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆47 Updated 3 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆53 Updated 2 months ago
- ☆44 Updated 11 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 Updated 7 months ago
- [ICCV 2025] MOVE: Motion-Guided Few-Shot Video Object Segmentation ☆32 Updated last week
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆73 Updated 2 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆125 Updated last month
- ☆22 Updated 3 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation ☆18 Updated 8 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆54 Updated 2 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆114 Updated 3 weeks ago
- ☆33 Updated 11 months ago
- The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (arXiv 2025) ☆37 Updated 3 months ago
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets" ☆21 Updated 5 months ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆19 Updated 3 months ago
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video ☆52 Updated 4 months ago
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆37 Updated 5 months ago
- [CVPR'2025] EntitySAM: Segment Everything in Video ☆42 Updated 2 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆81 Updated 6 months ago
- Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting ☆53 Updated 2 months ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆37 Updated 2 months ago
- ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models ☆75 Updated this week
- This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehe… ☆54 Updated this week
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆25 Updated 5 months ago
- This is the project for 'USG'. ☆25 Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆86 Updated 6 months ago
- ICML2025 ☆57 Updated 2 weeks ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆114 Updated last week