tanvir-utexas / PaPr
☆11Updated 11 months ago
Alternatives and similar repositories for PaPr
Users who are interested in PaPr are comparing it to the repositories listed below
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.☆32Updated 5 months ago
- ☆29Updated last year
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".☆46Updated last month
- Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers".☆38Updated last year
- Official PyTorch implementation of Agglomerative Token Clustering presented at ECCV 2024☆17Updated 9 months ago
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT …☆34Updated last year
- The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (arXiv 2025)☆31Updated 3 weeks ago
- The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024)☆46Updated 5 months ago
- Official repo for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆22Updated 2 months ago
- [NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions☆60Updated last year
- Official implementation for paper "Knowledge Diffusion for Distillation", NeurIPS 2023☆88Updated last year
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models☆52Updated 2 weeks ago
- Improving Mamba performance on video understanding tasks☆40Updated 8 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆23Updated 2 weeks ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆41Updated 6 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition☆78Updated 2 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆34Updated last year
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆36Updated 4 months ago
- Official PyTorch Code of ReKV (ICLR'25)☆28Updated 3 months ago
- [AAAI 2025] Linear-complexity Visual Sequence Learning with Gated Linear Attention☆111Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆73Updated 2 months ago
- [ECCV 2024] FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance☆14Updated 9 months ago
- [ICCV 23] An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging tech…☆96Updated last year
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…☆48Updated 10 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding☆46Updated 2 weeks ago
- [CVPR 2023] Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection☆30Updated 2 years ago
- Adapters Strike Back (CVPR 2024)☆35Updated 11 months ago
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".☆49Updated last year
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation☆34Updated 3 months ago
- ☆80Updated 7 months ago