tjluyao / kv.run
View external linksLinks

A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.

☆23

Alternatives and similar repositories for kv.run

Users that are interested in kv.run are comparing it to the libraries listed below

Sorting:

leimao / Nsight-Compute-Docker-Image
View on GitHub
Nsight Compute In Docker
☆13Dec 21, 2023Updated 2 years ago
Oneflow-Inc / oneflow-lite
View on GitHub
☆17Jan 1, 2024Updated 2 years ago
flagos-ai / libtriton_jit
View on GitHub
A Triton JIT runtime and ffi provider in C++
☆31Jan 26, 2026Updated 2 weeks ago
linxihui / dkernel
View on GitHub
☆21Apr 17, 2025Updated 9 months ago
galeselee / Awesome_LLM_System-PaperList
View on GitHub
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆283Mar 6, 2025Updated 11 months ago
amirzandieh / QJL
View on GitHub
QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead
☆31Jan 27, 2025Updated last year
microsoft / sarathi-serve
View on GitHub
A low-latency & high-throughput serving engine for LLMs
☆470Jan 8, 2026Updated last month
microsoft / chunk-attention
View on GitHub
☆85Apr 18, 2025Updated 9 months ago
InfiniTensor / InfiniTrain
View on GitHub
☆17Updated this week
FlyAIBox / dcu-in-action
View on GitHub
国产加速卡-海光DCU实战（大模型训练、微调、推理等）
☆69Aug 10, 2025Updated 6 months ago
LLMServe / SwiftTransformer
View on GitHub
High performance Transformer implementation in C++.
☆151Jan 18, 2025Updated last year
zishen-ucap / LTX-Video-xDiT
View on GitHub
This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate…
☆11Dec 31, 2024Updated last year
Orange-OpenSource / LatSeq
View on GitHub
A Low-Impact Internal Latency Measurement Tool for OpenAirInterface
☆11Sep 13, 2023Updated 2 years ago
nvidia-riva / common
View on GitHub
Protocol buffers and other common resources.
☆13Jan 20, 2026Updated 3 weeks ago
rayleizhu / vllm-ra
View on GitHub
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40Feb 29, 2024Updated last year
AlibabaPAI / FLASHNN
View on GitHub
☆105Sep 9, 2024Updated last year
shouxieai / cpp-rotation-album
View on GitHub
cpp rotation album，基于cpp eigen实现的3d旋转相册，GAMES101复现内容
☆12Jul 25, 2022Updated 3 years ago
ACA-Lab-SJTU / token-ring
View on GitHub
☆13Jan 7, 2025Updated last year
wangzhaode / tokenizer.cpp
View on GitHub
A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json.
☆22Jan 4, 2026Updated last month
bluryar / VoxCPM-ONNX
View on GitHub
☆25Jan 5, 2026Updated last month
Muhtasham / llm-inference-simulator
View on GitHub
🚀 LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.
☆13Jul 12, 2025Updated 7 months ago
MLSysOps / InfraGym
View on GitHub
Empowering LLM Agents for Real-World Computer System Optimization
☆16Sep 10, 2025Updated 5 months ago
onosproject / openairinterface5g
View on GitHub
ONF's mirror of https://gitlab.eurecom.fr/oai/openairinterface5g
☆14Feb 6, 2026Updated last week
seifip / google-maps-rpa
View on GitHub
A script to reorganize 'Want to go' Saved places in Google Maps into separate lists by category.
☆12May 14, 2024Updated last year
botbahlul / android-autosrt-v2
View on GitHub
ANDROID APP to AUTO GENERATE SUBTITLE FILE and TRANSLATED SUBTITLE FILE (using unofficial online Google Translate API) for any audio/vide…
☆19May 5, 2024Updated last year
Idein / onnigiri
View on GitHub
☆13Jan 14, 2026Updated last month
wtudio / easyfetchcontent
View on GitHub
CMake modules for quickly importing third-party libraries by fetchcontent.
☆12Jun 1, 2025Updated 8 months ago
maomaoyuchengzi / MobileNetSSD-detect
View on GitHub
mobileNet SSD 基于caffe的前向检测
☆10Nov 30, 2018Updated 7 years ago
KSkun / OI-Templates
View on GitHub
My templates used in OI. All C++.
☆11Jul 17, 2018Updated 7 years ago
mllite / sklearn_explain
View on GitHub
Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score.
☆13Jan 21, 2021Updated 5 years ago
NivEz / yad2-scraper
View on GitHub
Yad2 smart scraper with a minimal setup
☆17Jun 18, 2023Updated 2 years ago
dog-qiuqiu / simple-rknn2
View on GitHub
The rknn2 API uses the secondary encapsulation of the process, which is easy for everyone to call. It is applicable to rk356x rk3588
☆48Jun 18, 2022Updated 3 years ago
InfraSail / higherdb
View on GitHub
A Rust LSM-Tree based lightweight storage engine.
☆13Apr 8, 2022Updated 3 years ago
digital-nomad-cheng / MTCNN_PyTorch
View on GitHub
☆10Jun 29, 2020Updated 5 years ago
tphakala / tflite_c
View on GitHub
TensorFlow Lite C precompiled library for Windows, Linux and macOS
☆11Dec 30, 2024Updated last year
pedrodeoliveira / locust-rest-grpc
View on GitHub
Distributed Load Testing of REST/gRPC APIs using Locust
☆10Sep 2, 2020Updated 5 years ago
delta1037 / FatigueDetectRKNN
View on GitHub
fatigue detect rknn/onnx model deploy in rk3568 npu（ROCK 3A）
☆17Dec 24, 2023Updated 2 years ago
smedegaard / hip_rs
View on GitHub
A rust wrapper for HIP
☆12Jun 10, 2025Updated 8 months ago
kemingy / gathers
View on GitHub
clustering algorithm implementation
☆13Nov 3, 2025Updated 3 months ago

tjluyao / kv.runView external linksLinks

Alternatives and similar repositories for kv.run

tjluyao / kv.run
View external linksLinks