fyabc / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆20Updated 2 weeks ago
Alternatives and similar repositories for vllm:
Users who are interested in vllm are comparing it to the libraries listed below
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆131Updated 3 weeks ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated 3 weeks ago
- Florence-2☆59Updated last week
- An initiative to replicate Sora☆103Updated 10 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆214Updated last week
- Multimodal Models in Real World☆437Updated 3 months ago
- ComfyUI YOLO-World Integration☆38Updated 7 months ago
- ☆174Updated 7 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models☆272Updated 10 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆224Updated last year
- ☆27Updated 6 months ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆100Updated 9 months ago
- ☆36Updated 4 months ago
- ☆351Updated 3 months ago
- Codebase for the Recognize Anything Model (RAM)☆72Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- ☆173Updated 7 months ago
- A family of highly capable yet efficient large multimodal models☆176Updated 5 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data, with a 14B LLM.☆37Updated 5 months ago
- ☆426Updated 2 months ago
- VimTS: A Unified Video and Image Text Spotter☆76Updated 3 months ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner☆125Updated last month
- Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with g…☆284Updated this week
- Quick exploration into fine tuning florence 2☆299Updated 5 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆117Updated 3 weeks ago
- HPT - Open Multimodal LLMs from HyperGAI☆313Updated 8 months ago
- ☆164Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆124Updated 3 months ago
- mllm-npu: training multimodal large language models on Ascend NPUs☆90Updated 5 months ago