vproxy-tools / ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
☆44Updated 7 months ago
Alternatives and similar repositories for ktransformers
Users that are interested in ktransformers are comparing it to the libraries listed below
- run DeepSeek-R1 GGUFs on KTransformers☆258Updated 9 months ago
- Efficient inference of large language models.☆151Updated 2 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models forked from deepseek-ai/Janus☆17Updated 10 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…☆208Updated 2 months ago
- Static suckless single batch CUDA-only qwen3-0.6B mini inference engine☆519Updated 3 months ago
- ☆17Updated 8 months ago
- Electronic Parrot (电子鹦鹉) / Toy Language Model☆225Updated this week
- CPU inference for the DeepSeek family of large language models in C++☆315Updated 2 months ago
- MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面).☆219Updated 11 months ago
- llama2.c-zh, a small language model that supports Chinese-language scenarios☆150Updated last year
- An AI agent to control drones from your CLI☆141Updated 4 months ago
- A Hugging Face mirror site.☆318Updated last year
- ☆50Updated 3 months ago
- High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.☆1,347Updated this week
- Chinese-language test question bank for large models (community edition)☆90Updated 2 years ago
- Wanna breeze through some papers?☆65Updated last month
- xllamacpp - a Python wrapper of llama.cpp☆66Updated last week
- run chatglm3-6b in BM1684X☆40Updated last year
- LM inference server implementation based on *.cpp.☆293Updated 2 weeks ago
- One-click deployment script for KTransformers☆55Updated 7 months ago
- C++ implementation of Qwen-LM☆610Updated last year
- LvLLM is a special NUMA extension of vllm that makes full use of CPU and memory resources, reduces GPU memory requirements, and features …☆87Updated this week
- ☆113Updated last year
- an open high-performance Optical Character Recognition (OCR) toolkit☆304Updated 4 months ago
- 360zhinao☆291Updated 6 months ago
- MiniCPM on iOS.☆67Updated 8 months ago
- Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory☆29Updated last year
- AI legal team☆44Updated 11 months ago
- An open-source LLM based on an MoE structure.☆58Updated last year
- ☆136Updated 9 months ago