kvcache-ai/TrEnv-X

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kvcache-ai/TrEnv-X)

kvcache-ai / TrEnv-X

☆95

Alternatives and similar repositories for TrEnv-X

Users that are interested in TrEnv-X are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

vllm-project / tml-fa4
View on GitHub
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17Jul 17, 2026Updated last week
inclusionAI / AState
View on GitHub
☆42Updated this week
mlc-ai / tirx-kernels
View on GitHub
ML kernels and benchmarking infrastructure written in TIRx
☆70Updated this week
DeepLink-org / DLSlime
View on GitHub
Composable and Embeddable Communication Runtime for Distributed AI Services
☆102Jun 5, 2026Updated last month
wu-kan / wuk_cupti_wrapper
View on GitHub
a simple API to use CUPTI
☆10Aug 19, 2025Updated 11 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
axio-project / FuseLink
View on GitHub
Efficient GPU communication over multiple NICs.
☆29Nov 20, 2025Updated 8 months ago
RLsys-Foundation / APRIL
View on GitHub
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra…
☆60Oct 11, 2025Updated 9 months ago
foundation-model-stack / vllm-triton-backend
View on GitHub
A Triton-only attention backend for vLLM
☆27Jul 14, 2026Updated last week
inclusionAI / asystem-amem
View on GitHub
A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆113Dec 17, 2025Updated 7 months ago
TransferQueue / TransferQueue
View on GitHub
[Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…
☆16Jan 16, 2026Updated 6 months ago
TiledTensor / TiledLower
View on GitHub
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13Nov 23, 2024Updated last year
Infini-AI-Lab / vortex_torch
View on GitHub
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
☆67Jun 24, 2026Updated last month
MoonshotAI / checkpoint-engine
View on GitHub
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆982Jul 4, 2026Updated 3 weeks ago
microsoft / ParrotServe
View on GitHub
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆223Sep 21, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
cherichy / tilecute
View on GitHub
☆32Jul 2, 2025Updated last year
WukLab / preble
View on GitHub
Stateful LLM Serving
☆105Mar 11, 2025Updated last year
uccl-project / mKernel
View on GitHub
mKernel: fast multi-node, multi-GPU fused kernels
☆255Jun 21, 2026Updated last month
nex-agi / NexVenusCL
View on GitHub
Nex Venus Communication Library
☆75Nov 17, 2025Updated 8 months ago
pkusys / Jolteon
View on GitHub
Automatic resource configuration for serverless workflows.
☆22Mar 24, 2024Updated 2 years ago
ai-dynamo / nixl
View on GitHub
NVIDIA Inference Xfer Library (NIXL)
☆1,151Updated this week
blitz-serving / trace-replayer
View on GitHub
Repo to replay Qwen trace
☆31Jan 9, 2026Updated 6 months ago
Hanchenli / vllm-continuum
View on GitHub
Preview Code for Continuum Paper
☆90Updated this week
Terra-Flux / PolyRL
View on GitHub
[NSDI'26] PolyRL is a reinforcement learning framework for LLM that harvest spot instances on the cloud to reduce cost.
☆19Mar 30, 2026Updated 3 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
Multi-LLM / prism-research
View on GitHub
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71Mar 17, 2026Updated 4 months ago
alibaba-edu / qwen-bailian-usagetraces-anon
View on GitHub
☆153Apr 23, 2026Updated 3 months ago
togethercomputer / ParallelKernelBench
View on GitHub
☆44Jul 1, 2026Updated 3 weeks ago
chenyu-jiang / dcp
View on GitHub
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21Nov 28, 2025Updated 7 months ago
microsoft / llm-42
View on GitHub
[Accepted to SOSP 2026] Fast Deterministic LLM Inference
☆22Updated this week
romitjain / kachua-mlsys
View on GitHub
[MLSys 26] 🥇 Solution for Gated Delta Net Track of MLSys 26 Flash infer competition
☆35May 22, 2026Updated 2 months ago
gaocegege / xuruowei-forever
View on GitHub
https://xuruowei.com 是她的家人朋友们和她的爱人高策为纪念她留下的。徐若薇于 2026 年 2 月 28 日离世。我们希望通过这个时间线纪念她的一生——照片、故事、文字、音乐与她钟爱的一切。沿着她生命的轨迹漫步，重新触摸那些有温度的瞬间。
☆28Apr 1, 2026Updated 3 months ago
aibrix / PrisKV
View on GitHub
High Performance KV Cache Store for LLM
☆59May 20, 2026Updated 2 months ago
infinigence / HamiltonAttention
View on GitHub
☆45Oct 15, 2025Updated 9 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
ademeure / cuda-side-boost
View on GitHub
☆60Feb 24, 2026Updated 5 months ago
Mogball / triton_lite
View on GitHub
☆20May 24, 2025Updated last year
KuangjuX / AttnLink
View on GitHub
An experimental communicating attention kernel based on DeepEP.
☆34Jul 29, 2025Updated 11 months ago
mlc-ai / pith-train
View on GitHub
Compact and Agent-Native MoE Training System
☆299Updated this week
ovg-project / kvcached
View on GitHub
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆1,115Updated this week
LeiWang1999 / Stream-k.tvm
View on GitHub
☆20Sep 28, 2024Updated last year
UT-InfraAI / cuco
View on GitHub
An agent for CUDA compute-communication kernel co-design
☆35May 7, 2026Updated 2 months ago