ubergarm / r1-ktransformers-guideLinks

run DeepSeek-R1 GGUFs on KTransformers

☆255

Alternatives and similar repositories for r1-ktransformers-guide

Users that are interested in r1-ktransformers-guide are comparing it to the libraries listed below

Sorting:

thu-pacman / chitu
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
☆1,331Updated this week
gpustack / llama-box
LM inference server implementation based on *.cpp.
☆290Updated 3 months ago
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆268Updated 3 months ago
ModelCloud / GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU vi…
☆886Updated this week
lework / llm-benchmark
LLM 并发性能测试工具，支持自动化压力测试和性能报告生成。
☆184Updated 7 months ago
nlzy / vllm-gfx906
vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
☆327Updated last month
shell-nlp / gpt_server
gpt_server是一个用于生产级部署LLMs、Embedding、Reranker、ASR、TTS、文生图、图片编辑和文生视频的开源框架。
☆234Updated this week
maaaxinfinity / ktrun
KTransformers 一键部署脚本
☆54Updated 7 months ago
infinigence / Infini-Megrez
☆337Updated last month
david-xinyuwei / david-share
☆368Updated this week
intel / xFasterTransformer
☆431Updated 2 months ago
vllm-project / vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
☆1,357Updated this week
HFAiLab / hai-platform
一种任务级GPU算力分时调度的高性能深度学习训练平台
☆712Updated 2 years ago
andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in C++
☆313Updated last month
QwenLM / qwen.cpp
C++ implementation of Qwen-LM
☆606Updated 11 months ago
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆204Updated last month
ninehills / llm-inference-benchmark
LLM Inference benchmark
☆429Updated last year
gpustack / gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
☆216Updated 3 months ago
intel / ipex-llm-tutorial
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆169Updated 6 months ago
microsoft / T-MAC
Low-bit LLM inference on CPU/NPU with lookup table
☆890Updated 5 months ago
Ascend / pytorch
Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
☆454Updated this week
IEIT-Yuan / Yuan-2.0
Yuan 2.0 Large Language Model
☆690Updated last year
MooreThreads / vllm_musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆68Updated last year
padeoe / hf-mirror-site
a huggingface mirror site.
☆314Updated last year
OpenBMB / MiniCPM-CookBook
This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi…
☆293Updated 4 months ago
ymcui / Chinese-Mixtral
中文Mixtral混合专家大模型（Chinese Mixtral MoE LLMs）
☆610Updated last year
allwefantasy / byzer-llm
Easy, fast, and cheap pretrain,finetune, serving for everyone
☆315Updated 4 months ago
UnicomAI / Unichat-llama3-Chinese
☆347Updated last year
foldl / chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
☆746Updated this week
guqiong96 / Lvllm
LvLLM is a special NUMA extension of vllm that makes full use of CPU and memory resources, reduces GPU memory requirements, and features …
☆76Updated last week