This repository collects papers on VLLM applications. We will add new papers irregularly.
☆216Feb 23, 2026Updated 3 months ago
Alternatives and similar repositories for awesome-VLLMs
Users that are interested in awesome-VLLMs are comparing it to the libraries listed below.
- The official implementation of NeurIPS 2025 D&B paper: IndustryEQA: Pushing the frontiers of Embodied Question Answering in Industrial Sc…☆15Sep 25, 2025Updated 8 months ago
- The official implementation of ECCV2024 paper "Facial Affective Behavior Analysis with Instruction Tuning"☆31Jan 8, 2025Updated last year
- ☆15Jun 25, 2025Updated 11 months ago
- The codebase for ABAW4 challenge of ECCV2022 workshop.☆21Jun 18, 2023Updated 2 years ago
- ☆25Apr 17, 2024Updated 2 years ago
- Repository containing code for CoRL 2020 paper on "Learning Object Manipulation Skills via Approximate State Estimation from Real Videos"☆17Dec 15, 2021Updated 4 years ago
- [AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augm…☆11Dec 5, 2025Updated 6 months ago
- A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.☆622Jun 3, 2026Updated last week
- ☆13Jun 13, 2025Updated last year
- Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and Dee…☆63Mar 18, 2025Updated last year
- Collection of AWESOME vision-language models for vision tasks☆3,125Oct 14, 2025Updated 8 months ago
- Fast and memory-efficient exact attention☆21Apr 10, 2026Updated 2 months ago
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection☆17Mar 23, 2025Updated last year
- ☆14Sep 2, 2024Updated last year
- Private AI Hub (P8Hub) - Host and use your own AI Services. Keep everything simple and private.☆29Nov 19, 2023Updated 2 years ago
- DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies☆44Aug 14, 2025Updated 10 months ago
- ☆41Sep 9, 2025Updated 9 months ago
- This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.☆24Aug 2, 2024Updated last year
- Official implementation of "Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory"☆26Oct 29, 2024Updated last year
- A paper list of some recent works about Token Compress for ViT and VLM☆918Jun 2, 2026Updated last week
- 📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for str…☆174Jun 5, 2026Updated last week
- project website for "depth sensing beyond LiDAR range"☆11Jul 28, 2020Updated 5 years ago
- Code for "RSF: Optimizing Rigid Scene Flow From 3D Point Clouds Without Labels"☆10Jan 17, 2023Updated 3 years ago
- The official implementation of "Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification"☆93Mar 14, 2024Updated 2 years ago
- Official PyTorch code of GroundVQA (CVPR'24)☆63Sep 13, 2024Updated last year
- PyTorch Implementation of LoG 22 [Oral] -- Transductive Linear Probing: A Novel Framework for Few-Shot Node Classification☆17May 31, 2023Updated 3 years ago
- The runner-up solution of AICITY Challenge Track2 (Vehicle Re-Identification) at CVPR 2021 Workshop.☆20May 3, 2022Updated 4 years ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.☆89Oct 26, 2025Updated 7 months ago
- ☆30Aug 30, 2024Updated last year
- Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation☆35Dec 5, 2025Updated 6 months ago
- Recurrent Neural Network Demo by PyBrain☆10Feb 2, 2015Updated 11 years ago
- Multimodal-Composite-Editing-and-Retrieval-update☆35Oct 13, 2025Updated 8 months ago
- PET/CT segmentation lymphoma☆22Sep 17, 2020Updated 5 years ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model☆90Nov 28, 2023Updated 2 years ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…☆3,817Mar 12, 2026Updated 3 months ago
- Code for "AffordanceLLM: Grounding Affordance from Vision Language Models"☆14Oct 18, 2024Updated last year
- ☆14Aug 24, 2015Updated 10 years ago
- ☆39Feb 3, 2026Updated 4 months ago
- PyTorch implementation of the paper 'Towards Scenario Generalization for Vision-based Roadside 3D Object Detection'☆17Mar 9, 2025Updated last year