codefanw / FlashSloth
[CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
☆45Updated 4 months ago
Alternatives and similar repositories for FlashSloth
Users who are interested in FlashSloth are comparing it to the repositories listed below:
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆80Updated 2 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆83Updated last month
- ☆69Updated 2 months ago
- ☆29Updated last year
- [ICCV 2025] Token Activation Map to Visually Explain Multimodal LLMs☆40Updated 2 weeks ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆69Updated last week
- ☆88Updated 3 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models☆87Updated last week
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation"☆102Updated last week
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆74Updated 3 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆24Updated last month
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…☆86Updated 2 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories☆58Updated 4 months ago
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆43Updated 5 months ago
- code for FineLIP☆26Updated 3 months ago
- ☆33Updated 3 weeks ago
- Transactions on Multimedia (TMM25)☆15Updated 3 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆77Updated 8 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better☆31Updated last month
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆64Updated 3 weeks ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding☆23Updated 3 months ago
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".☆49Updated 2 months ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation☆105Updated 3 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆58Updated 4 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding☆53Updated last week
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆184Updated last week
- [ECCV2024] The official implementation of the DiffPNG paper in PyTorch.☆12Updated 9 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆43Updated last month
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception☆65Updated last month
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆123Updated 6 months ago