alwynpan / uom-comp90024
Demo Code for Subject COMP90024
☆12 · Updated 7 months ago
Alternatives and similar repositories for uom-comp90024
Users interested in uom-comp90024 are comparing it to the repositories listed below.
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆193 · Updated 5 months ago
- [CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practi… ☆39 · Updated last month
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆36 · Updated 10 months ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" ☆23 · Updated 8 months ago
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP) ☆25 · Updated last year
- [ACL'25 Main] Official Implementation of HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Languag… ☆40 · Updated 2 months ago
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset… ☆58 · Updated 10 months ago
- ☆81 · Updated last year
- [CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att… ☆53 · Updated last month
- Official repo of Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics ☆50 · Updated 3 months ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆72 · Updated 9 months ago
- [ICML 2024 Spotlight] "Sample-specific Masks for Visual Reprogramming-based Prompting" ☆12 · Updated 11 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆218 · Updated last month
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs ☆132 · Updated 3 months ago
- Code for our ICML'24 paper on multimodal dataset distillation ☆41 · Updated last year
- Awesome papers for multi-modal LLMs with grounding ability ☆19 · Updated last month
- [NeurIPS 2025] ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆233 · Updated last month
- 📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning. ☆339 · Updated 2 weeks ago
- Latest Advances on Vision-Language-Action Models. ☆119 · Updated 8 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆87 · Updated 2 months ago
- Code release for VTW (AAAI 2025 Oral) ☆64 · Updated 3 weeks ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆97 · Updated 5 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning ☆88 · Updated 2 months ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?" ☆50 · Updated 6 months ago
- A tiny paper-rating web app ☆38 · Updated 8 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated 7 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆54 · Updated 10 months ago
- ☆29 · Updated 3 months ago
- [CVPR'25] Interleaved-Modal Chain-of-Thought ☆94 · Updated this week
- A paper list for spatial reasoning ☆411 · Updated last week