ubergarm / r1-ktransformers-guide
run DeepSeek-R1 GGUFs on KTransformers
☆252 · Updated 7 months ago
Alternatives and similar repositories for r1-ktransformers-guide
Users interested in r1-ktransformers-guide are comparing it to the libraries listed below.
- High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability. ☆1,293 · Updated this week
- LM inference server implementation based on *.cpp. ☆279 · Updated last month
- One-click deployment script for KTransformers. ☆51 · Updated 5 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆265 · Updated 2 months ago
- gpt_server is an open-source framework for production-grade deployment of LLMs, embeddings, rerankers, ASR, TTS, text-to-image, image editing, and text-to-video. ☆213 · Updated 2 weeks ago
- LLM concurrency benchmarking tool with support for automated stress testing and performance report generation. ☆160 · Updated 6 months ago
- C++ implementation of Qwen-LM. ☆606 · Updated 10 months ago
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60. ☆280 · Updated this week
- ☆336 · Updated 3 weeks ago
- Community-maintained hardware plugin for vLLM on Ascend. ☆1,179 · Updated last week
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆208 · Updated last month
- LLM inference benchmark. ☆426 · Updated last year
- ☆430 · Updated 3 weeks ago
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking. ☆1,762 · Updated last week
- A Hugging Face mirror site. ☆304 · Updated last year
- LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU vi… ☆817 · Updated this week
- The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab. ☆465 · Updated last month
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆197 · Updated 3 weeks ago
- Chinese Mixtral Mixture-of-Experts large models (Chinese Mixtral MoE LLMs). ☆609 · Updated last year
- ☆355 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆64 · Updated 11 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆437 · Updated 3 weeks ago
- vLLM documentation in Simplified Chinese / vLLM 中文文档. ☆110 · Updated last month
- The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoi… ☆43 · Updated last month
- ☆572 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆39 · Updated last month
- Low-bit LLM inference on CPU/NPU with lookup table. ☆866 · Updated 4 months ago
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation. ☆381 · Updated last week
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud (通义点金: large models for the financial industry). ☆360 · Updated last month
- Yuan 2.0 Large Language Model. ☆688 · Updated last year