This repository collects papers on VLLM applications. New papers are added on an irregular basis.
☆218 · Feb 23, 2026 · Updated 4 months ago
Alternatives and similar repositories for awesome-VLLMs
Users that are interested in awesome-VLLMs are comparing it to the libraries listed below.
- The official implementation of ECCV2024 paper "Facial Affective Behavior Analysis with Instruction Tuning" ☆31 · Jan 8, 2025 · Updated last year
- ☆15 · Jun 25, 2025 · Updated last year
- The codebase for ABAW4 challenge of ECCV2022 workshop. ☆21 · Jun 18, 2023 · Updated 3 years ago
- ☆25 · Apr 17, 2024 · Updated 2 years ago
- The official implementation of CVPR2023 paper "DISC: Learning from Noisy Labels via Dynamic Instance-Specific Selection and Correction" ☆57 · Jul 19, 2023 · Updated 2 years ago
- Repository containing code for CoRL 2020 paper on "Learning Object Manipulation Skills via Approximate State Estimation from Real Videos" ☆17 · Dec 15, 2021 · Updated 4 years ago
- [AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augm… ☆11 · Dec 5, 2025 · Updated 7 months ago
- 🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs ☆73 · Mar 21, 2025 · Updated last year
- Official code for the NeurIPS25 paper "RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling" (https://arxiv.org/abs/25… ☆26 · Dec 10, 2025 · Updated 6 months ago
- A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates. ☆640 · Jun 3, 2026 · Updated last month
- [KDD'23] This is the code repo for our KDD'23 paper "DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling". ☆11 · Jun 14, 2023 · Updated 3 years ago
- ☆14 · Jun 13, 2025 · Updated last year
- Fast and memory-efficient exact attention ☆22 · Jun 26, 2026 · Updated last week
- Benchmarking memory-augmented robotic generalist policies ☆118 · Jun 18, 2026 · Updated 2 weeks ago
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection ☆17 · Mar 23, 2025 · Updated last year
- ☆15 · Sep 2, 2024 · Updated last year
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆72 · Jul 8, 2025 · Updated 11 months ago
- ☆34 · Sep 26, 2025 · Updated 9 months ago
- ☆41 · Sep 9, 2025 · Updated 9 months ago
- A paper list of some recent works about Token Compress for Vit and VLM ☆929 · Jun 25, 2026 · Updated last week
- [WACV 2024 LLVM-AD Challenge] UCU Dataset ☆15 · Sep 9, 2023 · Updated 2 years ago
- 📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for str… ☆184 · Jun 10, 2026 · Updated 3 weeks ago
- project website for "depth sensing beyond LiDAR range" ☆11 · Jul 28, 2020 · Updated 5 years ago
- Code for "RSF: Optimizing Rigid Scene Flow From 3D Point Clouds Without Labels" ☆10 · Jan 17, 2023 · Updated 3 years ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆63 · Sep 13, 2024 · Updated last year
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆90 · Oct 26, 2025 · Updated 8 months ago
- Recurrent Neural Network Demo by PyBrain ☆10 · Feb 2, 2015 · Updated 11 years ago
- Multimodal-Composite-Editing-and-Retrieval-update ☆35 · Oct 13, 2025 · Updated 8 months ago
- PET/CT segmentation lymphoma ☆22 · Sep 17, 2020 · Updated 5 years ago
- This is the official repo for the ICML 2025 paper "Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization" Tang et al ☆20 · Jun 8, 2025 · Updated last year
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆90 · Nov 28, 2023 · Updated 2 years ago
- Code for "AffordanceLLM: Grounding Affordance from Vision Language Models" ☆14 · Oct 18, 2024 · Updated last year
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,825 · Mar 12, 2026 · Updated 3 months ago
- ☆14 · Aug 24, 2015 · Updated 10 years ago
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs ☆14 · Mar 19, 2024 · Updated 2 years ago
- Code implementation for: From Virtual Games to Real-World Play ☆48 · Jun 23, 2025 · Updated last year
- ☆12 · Oct 5, 2024 · Updated last year
- Pytorch implementation of the paper 'Towards Scenario Generalization for Vision-based Roadside 3D Object Detection' ☆17 · Mar 9, 2025 · Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆54 · Jun 12, 2025 · Updated last year