YeeZ93 / Awesome-Object-Centric-Learning
A curated list of research on object-centric learning
☆11 · Updated 7 months ago
Alternatives and similar repositories for Awesome-Object-Centric-Learning
Users interested in Awesome-Object-Centric-Learning are comparing it to the libraries listed below.
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆58 · Updated last year
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆26 · Updated 7 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆41 · Updated 2 months ago
- This repository houses the code for the paper "The Neglected Tails in Vision-Language Models" ☆28 · Updated last month
- Awesome papers on multi-modal LLMs with grounding ability ☆17 · Updated 10 months ago
- Compress conventional Vision-Language Pre-training data ☆51 · Updated last year
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆29 · Updated 6 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆65 · Updated last month
- An efficient tuning method for VLMs ☆80 · Updated last year
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023) ☆40 · Updated last year
- ☆43 · Updated 5 months ago
- Visual self-questioning for large vision-language assistants. ☆41 · Updated 8 months ago
- ☆37 · Updated 11 months ago
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models? ☆25 · Updated 6 months ago
- [AAAI 2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆39 · Updated last year
- NegCLIP. ☆32 · Updated 2 years ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆28 · Updated last year
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆23 · Updated 6 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆58 · Updated 2 months ago
- ✨ A curated list of papers on uncertainty in multi-modal large language models (MLLMs). ☆45 · Updated 2 months ago
- [ICLR 2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models. ☆80 · Updated 10 months ago
- Code for "Is CLIP ideal? No. Can we fix it? Yes!" ☆16 · Updated 2 months ago
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆15 · Updated 4 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆68 · Updated last year
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆39 · Updated 4 months ago
- Official repo of M²PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning ☆23 · Updated 2 months ago
- ☆61 · Updated last year
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆34 · Updated 6 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆15 · Updated 2 months ago