fyabc / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆22Updated this week
Alternatives and similar repositories for vllm:
Users that are interested in vllm are comparing it to the libraries listed below
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆133Updated 2 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated 2 months ago
- ☆29Updated 7 months ago
- VimTS: A Unified Video and Image Text Spotter☆77Updated 4 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data and a 14B LLM.☆37Updated 6 months ago
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆49Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆118Updated 4 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆138Updated 5 months ago
- ☆36Updated 5 months ago
- Chinese Stable Diffusion, zh SD, Chinese text-to-image, Chinese SD☆48Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆75Updated 5 months ago
- A third-party component library based on Gradio.☆92Updated last week
- A family of highly capable yet efficient large multimodal models☆178Updated 7 months ago
- ComfyUI YOLO-World Integration☆41Updated 8 months ago
- ☆56Updated last year
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning☆37Updated this week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 5 months ago
- ☆176Updated 9 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆175Updated 3 months ago
- ☆182Updated 8 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆126Updated 4 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆125Updated 2 months ago
- Quick exploration into fine-tuning Florence-2☆305Updated 6 months ago
- ☆172Updated last month
- ☆366Updated last month
- Codebase for the Recognize Anything Model (RAM)☆76Updated last year
- A Simple Framework of Small-scale Large Multimodal Models for Video Understanding Based on TinyLLaVA_Factory.☆46Updated last week
- Long Context Transfer from Language to Vision☆368Updated 2 weeks ago
- Exploring GOT-OCR: accelerating real-world project deployment, no language restrictions☆59Updated 5 months ago
- ☆73Updated last year