SGLang kernel library for NPU
☆101 · Feb 28, 2026 · Updated last week
Alternatives and similar repositories for sgl-kernel-npu
Users who are interested in sgl-kernel-npu are comparing it to the libraries listed below.
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend · ☆113 · Updated this week
- ☆20 · Jun 13, 2025 · Updated 8 months ago
- LeetGPU Solutions · ☆111 · Oct 9, 2025 · Updated 4 months ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend · ☆11 · Feb 5, 2025 · Updated last year
- ☆118 · Sep 22, 2025 · Updated 5 months ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs · ☆27 · Dec 17, 2024 · Updated last year
- mllm-npu: training multimodal large language models on Ascend NPUs · ☆95 · Aug 29, 2024 · Updated last year
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. · ☆54 · Feb 6, 2026 · Updated last month
- LMCache on Ascend · ☆51 · Updated this week
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… · ☆13 · Jan 16, 2026 · Updated last month
- JittorInfer is a high-performance C++ inference framework designed for large language models on Huawei's Ascend AI processor. · ☆79 · Feb 9, 2026 · Updated 3 weeks ago
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception" · ☆29 · Jan 13, 2026 · Updated last month
- A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo… · ☆88 · Feb 2, 2026 · Updated last month
- Efficient and easy multi-instance LLM serving · ☆528 · Sep 3, 2025 · Updated 6 months ago
- Lightweight framework for 3D rendering. · ☆11 · Jun 5, 2023 · Updated 2 years ago
- ☆12 · Oct 19, 2014 · Updated 11 years ago
- LlamaNet: Decentralized Inference Swarm for llama.cpp · ☆23 · Jan 18, 2026 · Updated last month
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… · ☆23 · Oct 1, 2025 · Updated 5 months ago
- Cataloging released Triton kernels. · ☆296 · Sep 9, 2025 · Updated 5 months ago
- Generate Linux Perf event tables for Apple Silicon · ☆17 · Dec 16, 2025 · Updated 2 months ago
- A script to reorganize 'Want to go' Saved places in Google Maps into separate lists by category. · ☆11 · May 14, 2024 · Updated last year
- ☆11 · Dec 23, 2025 · Updated 2 months ago
- ☆11 · Nov 13, 2020 · Updated 5 years ago
- ☆14 · Nov 5, 2025 · Updated 4 months ago
- custom controller · ☆11 · Jan 3, 2024 · Updated 2 years ago
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. · ☆98 · Dec 17, 2025 · Updated 2 months ago
- a fast and customizable CUDA int4 tensor core gemm · ☆15 · Aug 2, 2024 · Updated last year
- A high-performance trading bot implemented in Rust, designed to detect live arbitrage opportunities in the SPX options market. The bot in… · ☆15 · Nov 25, 2024 · Updated last year
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json. · ☆24 · Jan 4, 2026 · Updated 2 months ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) · ☆15 · Jan 6, 2026 · Updated 2 months ago
- triton ver of gqa flash attn, based on the tutorial · ☆12 · Aug 4, 2024 · Updated last year
- ☆15 · Dec 9, 2025 · Updated 2 months ago
- yet another C++ 3d engine · ☆12 · Jan 24, 2020 · Updated 6 years ago
- Fully open reproduction of DeepSeek-R1 · ☆11 · Mar 24, 2025 · Updated 11 months ago
- ☆13 · May 8, 2025 · Updated 9 months ago
- Yad2 smart scraper with a minimal setup · ☆17 · Jun 18, 2023 · Updated 2 years ago
- Compiler plugin for performance analysis of HIP applications · ☆13 · Apr 7, 2025 · Updated 10 months ago
- A docker image for One Student One Chip's debug exam · ☆10 · Sep 22, 2023 · Updated 2 years ago
- ☆14 · Oct 30, 2024 · Updated last year