[NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
☆64, updated Jan 6, 2026
Alternatives and similar repositories for VideoRFT
Users interested in VideoRFT are comparing it to the repositories listed below.
- [ECCV2024] Nonverbal Interaction Detection ☆29, updated Oct 30, 2024
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆35, updated Jun 12, 2025
- [NeurIPS25] Official implementation (PyTorch) of "DeepVideo-R1" ☆31, updated Feb 22, 2026
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆25, updated Mar 18, 2025
- Exposing Text-Image Inconsistency Using Diffusion Models (ICLR 2024) ☆10, updated Jun 15, 2024
- [ACL 2023] Transforming Visual Scene Graphs to Image Captions ☆10, updated Dec 13, 2023
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception ☆14, updated Jul 4, 2025
- FFNet: MetaMixer-based Efficient Convolutional Mixer Design ☆31, updated Mar 11, 2025
- ☆15, updated Mar 30, 2025
- Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution ☆20, updated Jun 10, 2025
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models ☆49, updated Jan 8, 2025
- Image captioning with weight pruning in PyTorch ☆22, updated Jan 14, 2022
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment ☆85, updated May 4, 2025
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning ☆53, updated Mar 31, 2025
- ☆27, updated Mar 3, 2025
- Official code of the ACM MM2024 paper "Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection" ☆24, updated Aug 15, 2024
- Ideographic Description Sequence Checker Tools ☆25, updated Jun 21, 2017
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆87, updated Jul 13, 2025
- ☆31, updated Sep 1, 2025
- Video Diffusion Transformers are In-Context Learners ☆35, updated Jan 6, 2025
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++ ☆219, updated Feb 2, 2026
- [ICLR 2025, AAAI 2026] official implementation of "Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generati… ☆34, updated Jan 26, 2026
- Official implementation (PyTorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti… ☆23, updated Jan 26, 2025
- Code for Paper 'Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach' ☆35, updated Jan 2, 2026
- Reward Guided Latent Consistency Distillation ☆26, updated Oct 9, 2024
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆78, updated Oct 15, 2024
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos ☆27, updated Apr 8, 2025
- ☆52, updated Jan 6, 2026
- Official implementation of HCoG: Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation [CVPR 2025] ☆58, updated Jul 28, 2025
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆35, updated Mar 12, 2024
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection" ☆26, updated May 26, 2025
- MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations ☆36, updated Oct 17, 2024
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆39, updated Jun 14, 2025
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆36, updated Apr 14, 2025
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆12, updated Nov 14, 2025
- Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation ☆12, updated Feb 16, 2025
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" ☆35, updated Feb 2, 2024
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10, updated Jul 19, 2024
- [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets ☆63, updated Aug 6, 2025