A domain-specific language (DSL) based on Triton but providing higher-level abstractions.
☆41Mar 5, 2026Updated 2 weeks ago
Alternatives and similar repositories for ninetoothed
Users who are interested in ninetoothed are comparing it to the libraries listed below
- ☆52Updated this week
- ☆125Jan 22, 2026Updated 2 months ago
- Operator library☆17Jul 9, 2025Updated 8 months ago
- Operator library (Rust)☆14Jul 24, 2025Updated 7 months ago
- ☆18Mar 4, 2025Updated last year
- Experiment: Llama 2 inference implemented in Rust☆17Feb 23, 2024Updated 2 years ago
- The Fundot programming language.☆14Dec 31, 2021Updated 4 years ago
- Project for the training track of the training camp☆26Jan 28, 2026Updated last month
- InfiniTensor is a high-performance inference engine tailored for GPUs and AI accelerators. Its design focuses on effective deployment and…☆313Updated this week
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated last year
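- A layered, decoupled deep learning inference engine☆78Feb 17, 2025Updated last year
- General education course on open source software (Introduction to Open Source Software); the course is tentatively designed for lower-year students in information-related majors☆46Feb 4, 2026Updated last month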
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆84Updated this week
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators☆122Jun 14, 2025Updated 9 months ago
- Hypervisor written in Rust for the RISC-V 1.0 hypervisor extension☆16Oct 21, 2024Updated last year
- handle gguf files☆13Aug 14, 2025Updated 7 months ago
- ☆34Jun 20, 2023Updated 2 years ago
- Ship correct and fast LLM kernels to PyTorch☆145Jan 14, 2026Updated 2 months ago
- A CUDA runtime environment based on the CUDA Driver API☆15Jul 30, 2025Updated 7 months ago
- ☆43Jan 8, 2025Updated last year
- Automated bottleneck detection and solution orchestration☆19Feb 24, 2026Updated 3 weeks ago
- ☆32Jul 2, 2025Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆97Sep 19, 2025Updated 6 months ago
- FlagCX is a scalable and adaptive cross-chip communication library.☆179Updated this week
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 7 months ago
- ☆20Sep 28, 2024Updated last year
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels☆49Feb 28, 2026Updated 3 weeks ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 8 months ago
- ☆14Sep 25, 2025Updated 5 months ago
- OSDI 2023 Welder, deep learning compiler☆33Nov 24, 2023Updated 2 years ago
- ☆33Jul 17, 2024Updated last year
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation☆32Nov 16, 2024Updated last year
- A lightweight Triton-based General Matrix Multiplication (GEMM) library.☆55Updated this week
- ☆13Jan 7, 2025Updated last year
- Graph model execution API for Candle☆17Jul 27, 2025Updated 7 months ago
- Example of OpenCL on Android. Edited from this blog: http://developer.sonymobile.com/2013/10/29/boost-the-performance-of-your-android-app-w…☆10Mar 31, 2014Updated 11 years ago
- A Triton-only attention backend for vLLM☆24Feb 11, 2026Updated last month
- B-tree range map implementation for Rust☆13Oct 5, 2023Updated 2 years ago
- A reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated last year