ubergarm / r1-ktransformers-guide
run DeepSeek-R1 GGUFs on KTransformers
☆254 · Updated 7 months ago
Alternatives and similar repositories for r1-ktransformers-guide
Users interested in r1-ktransformers-guide are comparing it to the libraries listed below.
- LM inference server implementation based on *.cpp. ☆286 · Updated 2 months ago
- High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability. ☆1,312 · Updated last week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆267 · Updated 2 months ago
- One-click deployment script for KTransformers ☆51 · Updated 6 months ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆211 · Updated 2 months ago
- CPU inference for the DeepSeek family of large language models in C++ ☆314 · Updated 3 weeks ago
- LLM concurrency performance testing tool, supporting automated stress testing and performance report generation. ☆173 · Updated 7 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆879 · Updated 4 months ago
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆307 · Updated 3 weeks ago
- LLM Inference benchmark ☆428 · Updated last year
- ☆360 · Updated this week
- C++ implementation of Qwen-LM ☆606 · Updated 10 months ago
- gpt_server is an open-source framework for production-grade deployment of LLMs, embedding, reranker, ASR, TTS, text-to-image, image editing, and text-to-video models. ☆216 · Updated this week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆201 · Updated 3 weeks ago
- ☆338 · Updated 2 weeks ago
- LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU vi… ☆852 · Updated this week
- The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoi… ☆49 · Updated 2 weeks ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆444 · Updated last month
- Community maintained hardware plugin for vLLM on Ascend ☆1,262 · Updated this week
- A huggingface mirror site. ☆307 · Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆727 · Updated this week
- Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm ☆169 · Updated 6 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆165 · Updated 3 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆138 · Updated 2 months ago
- ☆240 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆65 · Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆63 · Updated 6 months ago
- Mixture-of-Experts (MoE) Language Model ☆189 · Updated last year
- ☆347 · Updated last year
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆44 · Updated 5 months ago