gameofdimension / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆9 Updated last year
Related projects
Alternatives and complementary repositories for vllm
- Large-scale exact string matching tool ☆15 Updated 2 weeks ago
- A prompt set of ChatGLM-6B ☆14 Updated last year
- qwen2 and llama3 cpp implementation ☆34 Updated 5 months ago
- XVERSE-7B: A multilingual large language model developed by XVERSE Technology Inc. ☆50 Updated 7 months ago
- ChatYuan-7B ☆13 Updated last year
- A more efficient GLM implementation! ☆55 Updated last year
- Accelerate vector generation by using an ONNX model ☆12 Updated 10 months ago
- Semantic search built on Paddle, with deployment to production. Multilingual support ☆11 Updated 2 years ago
- A demonstration of the remarkable effect of vllm on Chinese large language models ☆31 Updated last year
- ☆12 Updated last year
- GPT+ toolkit: a simple, practical one-stop AGI architecture with built-in localization, LLM models, agents, a vector database, and chains ☆48 Updated last year
- Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory ☆25 Updated 6 months ago
- Yuren 13B is an information synthesis large language model that has been continuously trained based on Llama 2 13B, which builds upon the… ☆14 Updated last year
- A dataset template for guiding chat-models to self-cognition, including information about the model’s identity, capabilities, usage, limi… ☆25 Updated last year
- ☆13 Updated 8 months ago
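- AGM: an AI gene-map model that explores the inner workings of AI models and GPT/LLM large models from the perspective of token-weight particles. ☆26 Updated last year
- The world's first Chinese-optimized version of StableVicuna. ☆65 Updated last year
- An open-source LLM based on an MoE structure. ☆57 Updated 4 months ago
- Service for converting text to vectors with a BERT model. An efficient text-to-vector service that supports multi-GPU, multi-worker, and multi-client calls and works out of the box. ☆10 Updated 2 years ago
- An AIGC application that integrates LLM and SDXL ☆25 Updated 10 months ago
- A survey of large language model training and serving ☆34 Updated last year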
- aigc evals ☆10 Updated 11 months ago