sanbuphy / computer-vision-reference
A collection of the world's best computer vision labs and lecture materials.
☆14 · Updated last month
Alternatives and similar repositories for computer-vision-reference:
Users interested in computer-vision-reference are comparing it to the repositories listed below.
- Code release for VTW (AAAI 2025, Oral) · ☆33 · Updated 2 months ago
- ☆20 · Updated last month
- ☆36 · Updated last week
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. · ☆44 · Updated 2 months ago
- ☆21 · Updated 3 months ago
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark" · ☆47 · Updated this week
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models · ☆80 · Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding · ☆31 · Updated 3 months ago
- 🌈 Unifying Visual Understanding and Generation with Dual Visual Vocabularies · ☆26 · Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* · ☆97 · Updated last month
- ☆37 · Updated 3 months ago
- ☆56 · Updated last week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning · ☆56 · Updated last month
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" · ☆33 · Updated 2 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction · ☆81 · Updated 3 weeks ago
- ☆70 · Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation · ☆54 · Updated last week
- Official code for the ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?" · ☆30 · Updated 11 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models · ☆49 · Updated 8 months ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…" · ☆26 · Updated 3 months ago
- ☆50 · Updated last week
- Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" · ☆83 · Updated 3 weeks ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster" · ☆59 · Updated 3 months ago
- [CVPR 2025] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key · ☆42 · Updated 3 weeks ago
- A Self-Training Framework for Vision-Language Reasoning · ☆73 · Updated 2 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency · ☆92 · Updated this week
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" · ☆54 · Updated 7 months ago
- A learning roadmap for newcomers to the multimodal field, covering the field's classic papers, projects, and courses. The aim is for learners to develop a deep understanding of the field within a reasonable amount of time and become able to conduct independent research on their own. · ☆15 · Updated last year
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective · ☆42 · Updated 2 months ago
- ☆40 · Updated 2 months ago