zhaohb / fastapi_tritonserver
☆23Updated 3 months ago
Related projects:
- ☆90Updated last year
- llm-export can export LLM models to ONNX.☆193Updated this week
- export llama to onnx☆91Updated 3 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆130Updated 3 weeks ago
- run ChatGLM2-6B in BM1684X☆48Updated 6 months ago
- Optimize QWen1.5 models with TensorRT-LLM☆16Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆120Updated 9 months ago
- Transformer related optimization, including BERT, GPT☆39Updated last year
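- LLM101n: Let's build a Storyteller (Chinese edition)☆113Updated last month
- A pure C++ cross-platform LLM acceleration library, callable from Python, supporting baichuan, glm, llama, and moss base models; runs smoothly on mobile devices, and chatglm-6B-class models can reach 10000+ tokens/s on a single GPU☆44Updated last year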
- ☆123Updated 3 months ago
- ☆572Updated last month
- simplify >2GB large onnx model☆41Updated 6 months ago
- Large model deployment in practice: TensorRT-LLM, Triton Inference Server, vLLM☆25Updated 6 months ago
- A more efficient GLM implementation!☆54Updated last year
- Triton Inference Server Model Config and Client Scripts☆31Updated 2 years ago
- LLaMa/RWKV onnx models, quantization and testcase☆345Updated last year
- Compare multiple optimization methods on Triton to improve model service performance☆46Updated 8 months ago
- Transformer related optimization, including BERT, GPT☆17Updated last year
- llama inference for tencentpretrain☆95Updated last year
- Another ChatGLM2 implementation for GPTQ quantization☆54Updated 11 months ago
- Demonstrates the remarkable effect of vllm on Chinese large language models☆31Updated 10 months ago
- Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat☆40Updated 5 months ago
- Imitate OpenAI with Local Models☆83Updated 3 weeks ago
- Transformer related optimization, including BERT, GPT☆58Updated last year
- qwen2 and llama3 cpp implementation☆34Updated 3 months ago
- Technical sharing focused on Python/C++/CUDA, ML/DL/RL, and NLP/KG/DS/LLM.☆59Updated 2 months ago
- ☆140Updated 4 months ago
- ☆251Updated last week
- OpenLLaMA-Chinese, permissively licensed open source instruction-following models based on OpenLLaMA☆64Updated last year