fyabc / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆19Updated last week
Alternatives and similar repositories for vllm:
Users who are interested in vllm are comparing it to the libraries listed below:
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual capability to any diffusion model without additional training)☆130Updated 7 months ago
- An initiative to replicate Sora☆102Updated 9 months ago
- ☆32Updated 7 months ago
- Florence-2☆54Updated this week
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆109Updated 2 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆134Updated 2 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data with a 14B LLM.☆36Updated 4 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs☆188Updated 6 months ago
- VimTS: A Unified Video and Image Text Spotter☆75Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆107Updated 2 months ago
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".☆135Updated this week
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆70Updated 2 months ago
- sd3 dreambooth lora training book, adapted from the diffusers doc☆42Updated 7 months ago
- ☆145Updated last month
- ☆169Updated 6 months ago
- Chinese CLIP models with SOTA performance.☆51Updated last year
- ☆28Updated last month
- ☆26Updated 5 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- GLM Series Edge Models☆124Updated 2 weeks ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models☆272Updated 9 months ago
- A family of highly capable yet efficient large multimodal models☆176Updated 4 months ago
- Port of Facebook's LLaMA model in C/C++☆77Updated this week
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆181Updated this week
- ☆51Updated last month
- ☆36Updated 3 months ago
- Multimodal Models in Real World☆428Updated 2 months ago
- mllm-npu: training multimodal large language models on Ascend NPUs☆90Updated 4 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆151Updated 3 weeks ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆56Updated 2 months ago