Oneflow-Inc/serving

OneFlow Serving

☆20

Alternatives and similar repositories for serving

Users that are interested in serving are comparing it to the libraries listed below.

Oneflow-Inc / conda-env
☆12 · Mar 13, 2023 · Updated 3 years ago

carefree0910 / carefree-flow
Deep Learning ❤️ OneFlow
☆19 · Aug 26, 2021 · Updated 4 years ago

Oneflow-Inc / one-fx
A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to PyTorch's torch.fx.
☆13 · Apr 7, 2023 · Updated 3 years ago

Oneflow-Inc / vision
Datasets, Transforms and Models specific to Computer Vision
☆91 · Nov 17, 2023 · Updated 2 years ago

Oneflow-Inc / oneflow-xrt
☆24 · Apr 25, 2023 · Updated 3 years ago

Oneflow-Inc / oneflow-lite
☆17 · Jan 1, 2024 · Updated 2 years ago

SkyworkAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆17 · Jun 3, 2024 · Updated 2 years ago

Oneflow-Inc / models
Models and examples built with OneFlow
☆100 · Oct 16, 2024 · Updated last year

feifeibear / PyTorchMemTracer
Depict GPU memory footprint during DNN training of PyTorch
☆11 · Nov 17, 2022 · Updated 3 years ago

zzk0 / triton
Triton Inference Server Model Config and Client Scripts
☆31 · Jan 7, 2022 · Updated 4 years ago

Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago

adityaiitb / PyProf
A GPU performance profiling tool for PyTorch models
☆22 · Jul 5, 2022 · Updated 4 years ago

GeeeekExplorer / kkbot
A Feishu/Lark AI agent bot
☆15 · Feb 27, 2026 · Updated 4 months ago

OpenPPL / ppl.kernel.cpu
☆19 · Apr 6, 2024 · Updated 2 years ago

megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆19 · Jul 21, 2023 · Updated 2 years ago

Oneflow-Inc / one-yolov5
A more efficient yolov5 with oneflow backend 🎉🎉🎉
☆216 · Jul 10, 2025 · Updated last year

thomaschlt / mla.c
Implementation from scratch in C of the Multi-head latent attention used in the Deepseek-v3 technical paper.
☆18 · Jan 15, 2025 · Updated last year

xdit-project / DiTCacheAnalysis
An auxiliary project analysis of the characteristics of KV in DiT Attention.
☆34 · Nov 29, 2024 · Updated last year

thunlp / BlockFFN
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19 · Jan 10, 2026 · Updated 6 months ago

tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 · Jun 21, 2026 · Updated 3 weeks ago

CodeWatch / AlgorithmNote
AlgorithmNote is a knowledge-sharing GitHub page with three main parts: algorithm, engineering, and basic knowledge.
☆13 · Feb 17, 2015 · Updated 11 years ago

tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆113 · Sep 10, 2024 · Updated last year

alexeigor / sd-benchmarks
Stable Diffusion inference benchmarks
☆10 · Jun 14, 2024 · Updated 2 years ago

Oneflow-Inc / libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
☆403 · Jul 31, 2025 · Updated 11 months ago

pigirons / conv3x3_m1
A demo of how to write a high-performance convolution that runs on Apple silicon
☆56 · Feb 8, 2022 · Updated 4 years ago

octoml / deformable-attention-kernel
TVMScript kernel for deformable attention
☆25 · Dec 15, 2021 · Updated 4 years ago

eedalong / Dpex
Distributed DataLoader For Pytorch Based On Ray
☆25 · Nov 5, 2021 · Updated 4 years ago

gofreelee / SpaceServe
☆31 · Jul 13, 2026 · Updated last week

hujiecpp / Mini-Segment-Anything
Distilling the powerful segment anything models into lightweight ones for efficient segmentation.
☆30 · Apr 27, 2023 · Updated 3 years ago

mtrebi / ImagesPrimitives
Reproduce images using geometric primitives
☆19 · Nov 24, 2017 · Updated 8 years ago

inducer / isl
Mirror of Sven Verdoolaege's isl at http://repo.or.cz/w/isl.git (occasionally with changes for islpy)
☆10 · Dec 16, 2025 · Updated 7 months ago

owenliang / learnpytorch
☆11 · Jun 6, 2023 · Updated 3 years ago

NathanSWard / simd_hash_map
A C++ hash map/table that utilizes SIMD (specifically Intel x86 SSE/AVX)
☆12 · Apr 30, 2019 · Updated 7 years ago

triton-inference-server / common
Common source, scripts and utilities shared across all Triton repositories.
☆80 · Updated this week

iree-org / iree-nvgpu
☆48 · Mar 5, 2024 · Updated 2 years ago

bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆479 · Mar 15, 2024 · Updated 2 years ago

chenzhuoyu / SimpleRPC
A C++-based RPC framework
☆12 · Oct 28, 2021 · Updated 4 years ago

Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10 · Apr 30, 2023 · Updated 3 years ago

lliai / EMQ-series
[ICCV-2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
☆29 · Dec 6, 2023 · Updated 2 years ago