hipudding / llama.cpp
LLM inference in C/C++
☆11 Updated this week
Alternatives and similar repositories for llama.cpp:
Users interested in llama.cpp are comparing it to the libraries listed below.
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆240 Updated 3 weeks ago
- Inference code for LLaMA models ☆118 Updated last year
- Community maintained hardware plugin for vLLM on Ascend ☆393 Updated this week
- ☆604 Updated 8 months ago
- ☆90 Updated last year
- PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle). ☆82 Updated this week
- FlagScale is a large model toolkit based on open-source projects. ☆257 Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆678 Updated 2 months ago
- ☆324 Updated 2 months ago
- ☆158 Updated this week
- export llama to onnx ☆118 Updated 3 months ago
- Demo for the AIOPS24 challenge ☆61 Updated 9 months ago
- LLM101n: Let's build a Storyteller (Chinese version) ☆130 Updated 7 months ago
- ☆139 Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 Updated 3 months ago
- ☆409 Updated this week
- ☆127 Updated 3 months ago
- ☆45 Updated last year
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆325 Updated last month
- Optimize QWen1.5 models with TensorRT-LLM ☆17 Updated 10 months ago
- llm-export can export LLM models to ONNX. ☆274 Updated 2 months ago
- High-performance text tokenizer library ☆28 Updated last year
- A MoE impl for PyTorch, [ATC'23] SmartMoE ☆61 Updated last year
- ☆145 Updated 2 months ago
- ☆125 Updated 3 weeks ago
- C++ implementation of Qwen-LM ☆582 Updated 3 months ago
- Transformer related optimization, including BERT, GPT ☆59 Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆449 Updated last week
- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to in-depth understanding, discussion, and implementation of techniques, principles, and applications related to large models. ☆305 Updated 8 months ago
- ☆46 Updated this week