run DeepSeek-R1 GGUFs on KTransformers
☆262Mar 3, 2025Updated last year
Alternatives and similar repositories for r1-ktransformers-guide
Users that are interested in r1-ktransformers-guide are comparing it to the libraries listed below.
- A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations☆17,191Updated this week
- ☆57Feb 10, 2025Updated last year
- KTransformers one-click deployment script☆59Apr 18, 2025Updated last year
- llama.cpp fork with additional SOTA quants and improved performance☆2,554Updated this week
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"☆116Dec 15, 2025Updated 5 months ago
- CI scripts designed to build a Pascal-compatible version of vLLM.☆12Aug 10, 2024Updated last year
- ☆16Dec 16, 2024Updated last year
- Voxtral is a state-of-the-art model developed to handle both speech transcription and audio understanding with remarkable accuracy and ef…☆30Jul 26, 2025Updated 9 months ago
- ☆27Apr 14, 2025Updated last year
- A simple no-install web UI for Ollama and OAI-Compatible APIs!☆31Jan 30, 2025Updated last year
- A chess arena for large language models☆39May 22, 2025Updated last year
- Large language model toolkit☆28Aug 1, 2025Updated 9 months ago
- The official API server for Exllama. OAI compatible, lightweight, and fast.☆1,223Updated this week
- Implementation for EACL 2024 paper "Corpus-Steered Query Expansion with Large Language Models"☆13Mar 19, 2024Updated 2 years ago
- JotItNow is an AI Voice Notes App☆25Mar 6, 2025Updated last year
- Yet Another (LLM) Web UI, made with Gemini☆12Dec 25, 2024Updated last year
- V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory!☆36Nov 20, 2025Updated 6 months ago
- A pure and fast NumPy implementation of Mamba with cache support.☆18Jun 16, 2024Updated last year
- ☆51May 31, 2024Updated last year
- Zero-Shot Summarization with GPT-3☆17Sep 11, 2023Updated 2 years ago
- d.run website☆17May 18, 2026Updated last week
- An AI project that provides `private` chat and RAG (retrieval-augmented generation) services.☆11Jul 14, 2024Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…☆90May 11, 2026Updated 2 weeks ago
- Official implementation of our paper "Exploring the Potential of Diffusion Large Language Models in Code Generation".☆22Oct 29, 2025Updated 6 months ago
- AI Based "Happiness Optimizer"☆12Oct 20, 2024Updated last year
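- AI Demo project: a collection of hands-on case studies for developers who want to learn and explore artificial intelligence (AI) technology.☆31May 17, 2026Updated last week
- hummingbird is an ultra-lightweight IoT platform written in Golang. It is fast and has a very low memory footprint, which makes it especially suitable for individual developers or startups taking on small and medium-sized IoT projects.☆17Aug 29, 2023Updated 2 years ago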
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input a target size and the toolchain w…☆130May 11, 2026Updated 2 weeks ago
- Demo of an "always-on" AI assistant.☆24Feb 14, 2024Updated 2 years ago
- SGLang is a high-performance serving framework for large language models and multimodal models.☆28,137Updated this week
- A multimodal, function calling powered LLM webui.☆213Sep 23, 2024Updated last year
- ☆13Apr 28, 2019Updated 7 years ago
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- Experiments in Recurrent Highway Networks with Grouped Auxiliary Memory paper☆21Dec 15, 2019Updated 6 years ago
- Use an appropriate mix of LLMs based on https://nuenki.app/blog research to translate languages better than any one tool.☆27Jun 23, 2025Updated 11 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in python☆24Jul 26, 2023Updated 2 years ago
- This project is based on llama.cpp and compiles only the RPC server, as well as auxiliary utilities that run in RPC-client mode, …☆24May 25, 2025Updated last year
- YOLO-NAS for ROS 2☆14Jun 5, 2023Updated 2 years ago
- The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity".☆16Jul 2, 2024Updated last year