Mr-Loevan / FAST
Fast-Slow Thinking for Large Vision-Language Model Reasoning
☆15 · Updated last month
Alternatives and similar repositories for FAST
Users interested in FAST are comparing it to the repositories listed below.
- ☆84 · Updated 2 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆41 · Updated 2 weeks ago
- Official repository of Personalized Visual Instruct Tuning ☆29 · Updated 3 months ago
- Official implementation of MC-LLaVA ☆28 · Updated 3 weeks ago
- ☆42 · Updated 7 months ago
- ☆44 · Updated 5 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆35 · Updated 4 months ago
- ☆37 · Updated last month
- Multimodal RewardBench ☆41 · Updated 4 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆51 · Updated 3 weeks ago
- ☆30 · Updated 10 months ago
- LEO: A Powerful Hybrid Multimodal LLM ☆18 · Updated 5 months ago
- ☆21 · Updated 3 months ago
- [CVPR 2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆22 · Updated 2 months ago
- Official implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation ☆19 · Updated 2 months ago
- ☆37 · Updated last month
- Official repo for ByteVideoLLM/Dynamic-VLM ☆20 · Updated 6 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆54 · Updated this week
- Official repo for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 · Updated 2 months ago
- [NeurIPS 2024] Official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…" ☆37 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆76 · Updated last year
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆36 · Updated 4 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆29 · Updated last month
- Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing" ☆62 · Updated 2 weeks ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆27 · Updated 2 months ago
- Code repository of UniRL ☆30 · Updated 3 weeks ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆47 · Updated 2 months ago
- VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆29 · Updated last week
- [NeurIPS 2023] Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector" ☆37 · Updated last year
- Official implementation of MIA-DPO ☆58 · Updated 5 months ago