Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs
☆29Dec 17, 2024Updated last year
Alternatives and similar repositories for nitro
Users who are interested in nitro are comparing it to the libraries listed below.
- A forked version of flux-fast that makes flux-fast even faster with cache-dit, 3.3x speedup on NVIDIA L20.☆24Jul 18, 2025Updated 9 months ago
- ☆23Aug 14, 2024Updated last year
- ☆52May 19, 2025Updated 11 months ago
- Multi-Layer Key-Value sharing experiments on Pythia models☆34Jun 14, 2024Updated last year
- FP8 flash attention implemented on the Ada architecture using the cutlass repository☆81Aug 12, 2024Updated last year
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- Cluster doctor skills☆15Feb 20, 2026Updated 2 months ago
- triton ver of gqa flash attn, based on the tutorial☆12Aug 4, 2024Updated last year
- Fast low-bit matmul kernels in Triton☆446Apr 27, 2026Updated last week
- ☆16Aug 19, 2024Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters☆58Jul 23, 2024Updated last year
- ☆105Sep 9, 2024Updated last year
- Files used for the evaluation of uiCA☆18Dec 14, 2022Updated 3 years ago
- Community maintained hardware plugin for vLLM on AWS Neuron☆29Mar 20, 2026Updated last month
- ☆30Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆118Updated this week
- Official repository for Flash Local Linear Attention☆23Apr 23, 2026Updated last week
- ☆24Apr 7, 2026Updated 3 weeks ago
- A side project that follows all the acceleration tricks in tinyllama, with the minimal modification to the huggingface transformers code.☆13Sep 2, 2024Updated last year
- ☆12Jan 18, 2023Updated 3 years ago
- Lightweight framework for 3D rendering.☆11Jun 5, 2023Updated 2 years ago
- An easy-to-use tnn classify demo☆11Mar 24, 2021Updated 5 years ago
- MegaStyle, a scalable style data generation framework aimed at consistency and diversity☆94Apr 23, 2026Updated last week
- A script to reorganize 'Want to go' Saved places in Google Maps into separate lists by category.☆11May 14, 2024Updated last year
- Codeplay's tutorial LLDB-MSP430 - as presented at the 2016 EuroLLVM Developers' Meeting in Barcelona.☆11Mar 15, 2016Updated 10 years ago
- ☆19Jun 8, 2021Updated 4 years ago
- ☆14Mar 5, 2024Updated 2 years ago
- Automatic Differentiation for Gradient Boosted Decision Trees.☆13May 17, 2022Updated 3 years ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving☆338Jul 2, 2024Updated last year
- ☆20Apr 25, 2026Updated last week
- SGLang kernel library for NPU☆128Updated this week
- Tokenflood is a load testing framework for simulating arbitrary loads on instruction-tuned LLMs☆45Apr 26, 2026Updated last week
- Minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig☆14Apr 7, 2025Updated last year
- Yad2 smart scraper with a minimal setup☆20Jun 18, 2023Updated 2 years ago
- ☆14May 13, 2024Updated last year
- Low-Rank Llama Custom Training☆23Mar 27, 2024Updated 2 years ago
- Lightweight Chisel template☆13May 30, 2020Updated 5 years ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆36Oct 3, 2025Updated 7 months ago
- IntelliJ platform plugin for Wavefront OBJ format☆15Apr 20, 2026Updated 2 weeks ago