chenin-wang / awesome_ai_paper
☆13 · Updated this week
Alternatives and similar repositories for awesome_ai_paper:
Users interested in awesome_ai_paper are comparing it to the repositories listed below.
- Official PyTorch Implementation of ParGo: Bridging Vision-Language with Partial and Global Views. (AAAI 2025) ☆11 · Updated 3 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆37 · Updated 4 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆23 · Updated 6 months ago
- Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine ☆78 · Updated 10 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆20 · Updated last month
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 ☆14 · Updated 6 months ago
- The official code for MedAgent_Pro ☆16 · Updated this week
- MRGen: Segmentation Data Engine for Underrepresented MRI Modalities ☆18 · Updated last month
- ☆44 · Updated last month
- [CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning ☆72 · Updated 3 weeks ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆51 · Updated 5 months ago
- Awesome works based on SSM and Mamba ☆17 · Updated last year
- [CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering ☆18 · Updated 9 months ago
- The first Chinese medical large vision-language model designed to integrate the analysis of textual and visual data ☆60 · Updated last year
- ☆14 · Updated last year
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-… ☆40 · Updated 8 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 · Updated last year
- ☆16 · Updated last year
- PyTorch implementation for CVPR 2024 paper: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation ☆37 · Updated last month
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆57 · Updated 4 months ago
- ☆19 · Updated 5 months ago
- ☆11 · Updated 11 months ago
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training ☆82 · Updated last year
- [CVPR 2025 Highlight] Official PyTorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models" ☆32 · Updated last week
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆158 · Updated 2 months ago
- ☆67 · Updated 5 months ago
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI ☆64 · Updated 3 months ago
- The official repo for "TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding" ☆38 · Updated 6 months ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…" ☆65 · Updated 10 months ago
- ☆43 · Updated last year