SGLang kernel library for NPU
☆108 Mar 18, 2026 Updated last week
Alternatives and similar repositories for sgl-kernel-npu
Users that are interested in sgl-kernel-npu are comparing it to the libraries listed below.
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend ☆115 Updated this week
- ☆20 Jun 13, 2025 Updated 9 months ago
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation ☆46 Updated this week
- ☆121 Sep 22, 2025 Updated 6 months ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend ☆11 Feb 5, 2025 Updated last year
- Synthetic data generation for evaluating LLM symbolic and logic reasoning ☆22 Mar 6, 2026 Updated 3 weeks ago
- ☆74 Updated this week
- ☆17 Mar 26, 2025 Updated last year
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆58 Feb 6, 2026 Updated last month
- Community maintained hardware plugin for vLLM on Ascend ☆1,805 Mar 20, 2026 Updated last week
- mllm-npu: training multimodal large language models on Ascend NPUs ☆94 Aug 29, 2024 Updated last year
- ☆13 Nov 3, 2025 Updated 4 months ago
- A PyTorch native platform for training generative AI models ☆16 Nov 18, 2025 Updated 4 months ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆13 Jan 16, 2026 Updated 2 months ago
- A high-performance inference engine for LLMs, optimized for diverse AI accelerators. ☆1,157 Updated this week
- DLBlas: clean and efficient kernels ☆35 Mar 16, 2026 Updated last week
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… ☆23 Oct 1, 2025 Updated 5 months ago
- A Triton-only attention backend for vLLM ☆24 Mar 17, 2026 Updated last week
- Efficient and easy multi-instance LLM serving ☆536 Mar 12, 2026 Updated 2 weeks ago
- Slony replication system for PostgreSQL ☆42 Mar 11, 2024 Updated 2 years ago
- ☆12 Oct 19, 2014 Updated 11 years ago
- [Ebook] From Zero to a Million Stores: A Hands-On System Design Journey by an Ordinary Person Without a Computer Science Degree ☆26 Nov 11, 2025 Updated 4 months ago
- A Triton JIT runtime and ffi provider in C++ ☆32 Updated this week
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆37 Aug 29, 2025 Updated 6 months ago
- A benchmark framework for LLM serving performance, based on API call ☆14 Apr 15, 2024 Updated last year
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 Aug 9, 2025 Updated 7 months ago
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json. ☆24 Jan 4, 2026 Updated 2 months ago
- MultiPaxos and Disk Paxos in TLA+ and PlusCal ☆13 Jan 23, 2023 Updated 3 years ago
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆100 Dec 17, 2025 Updated 3 months ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆48 Updated this week
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) ☆15 Jan 6, 2026 Updated 2 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆197 Updated this week
- Fine-Tune LLM Synthetic-Data application and "From Data to AGI: Unlocking the Secrets of Large Language Model" ☆16 Jul 5, 2024 Updated last year
- JittorInfer is a high-performance C++ inference framework designed for large language models on Huawei's Ascend AI processor. ☆80 Mar 2, 2026 Updated 3 weeks ago
- ☆13 Sep 3, 2018 Updated 7 years ago
- ☆52 May 19, 2025 Updated 10 months ago
- SRL_smp is a Sampling-Based Motion Planner library implemented in ROS designed for a differential drive robot. The package contains also a… ☆10 Jul 24, 2014 Updated 11 years ago
- Training framework for Large Behavioral Models ☆27 Sep 17, 2025 Updated 6 months ago
- Minimal TPU implementation with 8x8 systolic array and PyTorch integration ☆56 Jan 26, 2026 Updated 2 months ago