Hao840 / ADEM-VL
PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"
☆15 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for ADEM-VL
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024) ☆30 · Updated 6 months ago
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆16 · Updated this week
- Multimodal Video Understanding Framework (MVU) ☆23 · Updated 5 months ago
- Public repository for the ECCV 2024 paper "Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation" ☆19 · Updated last month
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆47 · Updated 4 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries" ☆47 · Updated 6 months ago
- Official implementation of the ECCV 2024 paper: POA ☆24 · Updated 3 months ago
- Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning ☆32 · Updated 3 weeks ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆69 · Updated last month
- [ACCV 2024] Official code for "DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention" ☆14 · Updated 3 weeks ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆27 · Updated last week
- Official PyTorch Implementation of Self-emerging Token Labeling ☆30 · Updated 7 months ago
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks ☆40 · Updated last month
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 · Updated 10 months ago
- Official Implementation of KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models ☆37 · Updated 2 weeks ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models ☆16 · Updated 3 weeks ago
- TensorFlow code for our ECCV'24 Workshop paper "LightAvatar: Efficient Head Avatar as Dynamic NeLF" ☆22 · Updated this week
- ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 ☆51 · Updated 2 weeks ago
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆31 · Updated 11 months ago
- Implementation of the paper "BRAVE: Broadening the visual encoding of vision-language models" ☆21 · Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 · Updated 6 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆31 · Updated 4 months ago
- [ECCV 2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆56 · Updated 2 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆35 · Updated last month
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆47 · Updated 10 months ago
- LiVOS: Light Video Object Segmentation with Gated Linear Matching ☆13 · Updated last week
- PyTorch Implementation of "ASTRA: An Action Spotting TRAnsformer for Soccer Videos", ACM MMSports 2023. | 3rd place solution for SoccerNe… ☆35 · Updated 5 months ago