This repository collects papers on VLLM applications. New papers are added at irregular intervals.
☆215 · Feb 23, 2026 · Updated 2 months ago
Alternatives and similar repositories for awesome-VLLMs
Users who are interested in awesome-VLLMs are comparing it to the repositories listed below.
- The official implementation of NeurIPS 2025 D&B paper: IndustryEQA: Pushing the frontiers of Embodied Question Answering in Industrial Sc… ☆13 · Sep 25, 2025 · Updated 7 months ago
- A curated list of Awesome Personalized Large Multimodal Models resources ☆57 · Mar 26, 2026 · Updated last month
- ☆14 · Jun 25, 2025 · Updated 10 months ago
- ☆25 · Apr 17, 2024 · Updated 2 years ago
- Repository containing code for CoRL 2020 paper on "Learning Object Manipulation Skills via Approximate State Estimation from Real Videos" ☆17 · Dec 15, 2021 · Updated 4 years ago
- [AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augm… ☆12 · Dec 5, 2025 · Updated 5 months ago
- 🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs ☆71 · Mar 21, 2025 · Updated last year
- A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated. ☆573 · Apr 13, 2026 · Updated 3 weeks ago
- ☆12 · Jun 13, 2025 · Updated 10 months ago
- Collection of AWESOME vision-language models for vision tasks ☆3,115 · Oct 14, 2025 · Updated 6 months ago
- This is a reproduction of my senior's graduation project ☆14 · Jun 21, 2022 · Updated 3 years ago
- Fast and memory-efficient exact attention ☆22 · Apr 10, 2026 · Updated 3 weeks ago
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection ☆17 · Mar 23, 2025 · Updated last year
- ☆12 · Sep 2, 2024 · Updated last year
- Official Implementation of MoE-Loco: Mixture of Experts for Multitask Locomotion ☆40 · Oct 22, 2025 · Updated 6 months ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆69 · Jul 8, 2025 · Updated 9 months ago
- ☆41 · Sep 9, 2025 · Updated 7 months ago
- Official implementation of "Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory" ☆26 · Oct 29, 2024 · Updated last year
- A paper list of recent works on token compression for ViT and VLM ☆891 · Apr 14, 2026 · Updated 3 weeks ago
- 📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for str… ☆149 · Apr 13, 2026 · Updated 3 weeks ago
- Project website for "depth sensing beyond LiDAR range" ☆11 · Jul 28, 2020 · Updated 5 years ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Sep 13, 2024 · Updated last year
- PyTorch Implementation of LoG 22 [Oral] -- Transductive Linear Probing: A Novel Framework for Few-Shot Node Classification ☆17 · May 31, 2023 · Updated 2 years ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆89 · Oct 26, 2025 · Updated 6 months ago
- Recurrent Neural Network Demo by PyBrain ☆10 · Feb 2, 2015 · Updated 11 years ago
- Multimodal-Composite-Editing-and-Retrieval-update ☆35 · Oct 13, 2025 · Updated 6 months ago
- PET/CT segmentation lymphoma ☆22 · Sep 17, 2020 · Updated 5 years ago
- This is the official repo for the ICML 2025 paper "Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization" (Tang et al.) ☆20 · Jun 8, 2025 · Updated 10 months ago
- Gazebo support for the RoboCup 3D simulation league. ☆12 · May 3, 2020 · Updated 6 years ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆89 · Nov 28, 2023 · Updated 2 years ago
- Official code for "Audio-Guided Attention Network for Weakly Supervised Violence Detection" (ICCECE2022). ☆13 · Mar 25, 2022 · Updated 4 years ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,789 · Mar 12, 2026 · Updated last month
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs ☆14 · Mar 19, 2024 · Updated 2 years ago
- Code implementation for: From Virtual Games to Real-World Play ☆47 · Jun 23, 2025 · Updated 10 months ago
- ☆36 · Feb 3, 2026 · Updated 3 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆54 · Jun 12, 2025 · Updated 10 months ago
- ☆31 · Feb 2, 2026 · Updated 3 months ago
- ☆17 · Aug 5, 2024 · Updated last year
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆55 · Mar 21, 2025 · Updated last year