alwynpan / uom-comp90024
Demo Code for Subject COMP90024
☆12 · Updated 5 months ago
Alternatives and similar repositories for uom-comp90024
Users interested in uom-comp90024 are comparing it to the repositories listed below.
- Project Description ☆22 · Updated last year
- ☆79 · Updated last year
- Official repo for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 · Updated 4 months ago
- [CVPR 2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practi…" ☆28 · Updated 2 months ago
- ☆67 · Updated 9 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆121 · Updated last week
- SmartCLIP: A training method to improve CLIP with both short and long texts ☆19 · Updated 2 months ago
- [ICML 2024 Spotlight] "Sample-specific Masks for Visual Reprogramming-based Prompting" ☆12 · Updated 8 months ago
- [ACM MM 2025] TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆74 · Updated last month
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆90 · Updated 10 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning ☆36 · Updated 2 months ago
- Survey on LLM Inference via Search (TMLR 2025) ☆10 · Updated 3 months ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?" ☆41 · Updated 3 months ago
- [ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models" ☆11 · Updated 6 months ago
- TrackGPT: Track What You Need in Videos via Text Prompts ☆25 · Updated 2 years ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆71 · Updated 2 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated 2 months ago
- ✨ A curated list of papers on uncertainty in multimodal large language models (MLLMs) ☆53 · Updated 5 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆38 · Updated last month
- Official implementation of MC-LLaVA ☆139 · Updated last week
- PyTorch implementation for "Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning" (ICML 2024) ☆22 · Updated 3 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆79 · Updated last year
- [NeurIPS 2024] Repos for the "Visualization-of-Thought" dataset, construction code, and evaluation ☆32 · Updated 10 months ago
- Code for "Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models" ☆16 · Updated 10 months ago
- Collections of Papers and Projects for Multimodal Reasoning ☆105 · Updated 4 months ago
- [ICML 2025] Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" ☆151 · Updated 2 months ago
- A curated collection and survey of vision-language model papers and model GitHub repositories; continuously updated ☆352 · Updated this week
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"☆60Updated last month
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)☆30Updated 3 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆102Updated 10 months ago