SGLang kernel library for NPU
☆128 · Apr 30, 2026 · Updated this week
Alternatives and similar repositories for sgl-kernel-npu
Users that are interested in sgl-kernel-npu are comparing it to the libraries listed below.
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend ☆119 · Updated this week
- ☆20 · Jun 13, 2025 · Updated 10 months ago
- MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation ☆52 · Mar 25, 2026 · Updated last month
- LeetGPU Solutions ☆116 · Oct 9, 2025 · Updated 6 months ago
- Softened ROSA QKV Operators for Training Next-Generation LLM Models ☆36 · Apr 7, 2026 · Updated 3 weeks ago
- Synthetic data generation for evaluating LLM symbolic and logic reasoning ☆22 · Mar 6, 2026 · Updated 2 months ago
- ☆74 · Updated this week
- ☆17 · Mar 26, 2025 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆29 · Dec 17, 2024 · Updated last year
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆60 · Feb 6, 2026 · Updated 3 months ago
- Community maintained hardware plugin for vLLM on Ascend ☆2,019 · Updated this week
- mllm-npu: training multimodal large language models on Ascend NPUs ☆95 · Aug 29, 2024 · Updated last year
- ☆14 · Nov 3, 2025 · Updated 6 months ago
- A PyTorch native platform for training generative AI models ☆17 · Apr 21, 2026 · Updated 2 weeks ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆15 · Jan 16, 2026 · Updated 3 months ago
- Fully open reproduction of DeepSeek-R1 ☆11 · Mar 24, 2025 · Updated last year
- DLBlas: clean and efficient kernels ☆39 · Apr 28, 2026 · Updated last week
- Efficient kernel for RMS normalization with fused operations; includes both forward and backward passes and is compatible with PyTorch. ☆13 · Jun 5, 2024 · Updated last year
- A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators. ☆1,254 · Apr 29, 2026 · Updated last week
- Official repository for Flash Local Linear Attention ☆23 · Apr 23, 2026 · Updated last week
- ☆22 · Dec 18, 2024 · Updated last year
- Efficient and easy multi-instance LLM serving ☆547 · Mar 12, 2026 · Updated last month
- Cataloging released Triton kernels. ☆302 · Sep 9, 2025 · Updated 7 months ago
- ☆12 · Oct 19, 2014 · Updated 11 years ago
- A Triton JIT runtime and ffi provider in C++ ☆33 · Apr 28, 2026 · Updated last week
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆39 · Aug 29, 2025 · Updated 8 months ago
- ☆11 · Nov 13, 2020 · Updated 5 years ago
- A benchmark framework for LLM serving performance, based on API call ☆14 · Apr 15, 2024 · Updated 2 years ago
- An efficient video loader for deep learning with smart shuffling that's super easy to digest ☆55 · Sep 29, 2023 · Updated 2 years ago
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json. ☆28 · Jan 4, 2026 · Updated 4 months ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) ☆16 · Jan 6, 2026 · Updated 4 months ago
- sgl-mindspore ☆17 · Mar 23, 2026 · Updated last month
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆105 · Dec 17, 2025 · Updated 4 months ago
- A lightweight Triton-based General Matrix Multiplication (GEMM) library. ☆61 · Apr 22, 2026 · Updated 2 weeks ago
- Fine-Tune LLM Synthetic-Data application and "From Data to AGI: Unlocking the Secrets of Large Language Model" ☆16 · Jul 5, 2024 · Updated last year
- FlashKDA: high-performance Kimi Delta Attention kernels ☆403 · Apr 22, 2026 · Updated 2 weeks ago
- ☆52 · May 19, 2025 · Updated 11 months ago
- SRL_smp is a Sampling-Based Motion Planner library implemented in ROS, designed for a differential drive robot. The package contains also a… ☆10 · Jul 24, 2014 · Updated 11 years ago
- Minimal TPU implementation with 8x8 systolic array and PyTorch integration ☆60 · Jan 26, 2026 · Updated 3 months ago