Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200
☆83 Feb 28, 2026 Updated 3 weeks ago
Alternatives and similar repositories for qwen3.5-triton
Users that are interested in qwen3.5-triton are comparing it to the libraries listed below.
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 Aug 9, 2025 Updated 7 months ago
- ☆23 Jul 11, 2025 Updated 8 months ago
- ☆16 Feb 24, 2026 Updated last month
- ☆13 Mar 5, 2025 Updated last year
- Read audio with FFmpeg into NumPy/PyTorch via ctypes (standard library module) ☆11 Aug 12, 2020 Updated 5 years ago
- YouTube Assistant ☆12 May 15, 2023 Updated 2 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆18 Feb 9, 2026 Updated last month
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox. ☆25 Sep 29, 2024 Updated last year
- ☆53 Feb 24, 2026 Updated last month
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆87 Dec 14, 2023 Updated 2 years ago
- Google closure-library wrapper for node.js ☆23 Mar 17, 2013 Updated 13 years ago
- ☆22 Sep 29, 2025 Updated 5 months ago
- Ship correct and fast LLM kernels to PyTorch ☆147 Jan 14, 2026 Updated 2 months ago
- ☆49 Feb 27, 2026 Updated 3 weeks ago
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆87 Updated this week
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆26 Updated this week
- ☆21 Oct 2, 2025 Updated 5 months ago
- ☆32 Apr 19, 2025 Updated 11 months ago
- API Server for storing and graphing real-time time-series data in MongoDB ☆18 Nov 3, 2014 Updated 11 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 Jan 24, 2025 Updated last year
- Latent Large Language Models ☆19 Aug 24, 2024 Updated last year
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 May 8, 2024 Updated last year
- ☆14 Mar 8, 2025 Updated last year
- ☆43 Jan 30, 2026 Updated last month
- ☆13 Jun 3, 2023 Updated 2 years ago
- A collection of reproducible inference engine benchmarks ☆38 Apr 22, 2025 Updated 11 months ago
- Emacs minor mode that automatically demangles C++, D, and Rust symbols ☆23 Aug 22, 2021 Updated 4 years ago
- ☆13 Dec 15, 2025 Updated 3 months ago
- TypeScript port of Google's Agent Development Kit (ADK): An open-source, code-first toolkit for building, evaluating, and deploying AI ag… ☆38 Nov 4, 2025 Updated 4 months ago
- pichuang personal website ☆19 Jun 10, 2025 Updated 9 months ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 Jan 25, 2025 Updated last year
- Houdini Python Wiki ☆18 Mar 18, 2024 Updated 2 years ago
- ☆19 Mar 3, 2025 Updated last year
- ☆64 Updated this week
- The official implementation of Bi-Mamba ☆14 Oct 22, 2025 Updated 5 months ago
- Reimplementation of https://github.com/montemac/algebraic_value_editing in pure PyTorch for efficiency on large models ☆11 Jun 28, 2023 Updated 2 years ago
- Triangle in practice! Triton ☆16 Feb 15, 2024 Updated 2 years ago
- Manage ML configuration with pydantic ☆16 Mar 18, 2026 Updated last week
- ☆28 Sep 15, 2025 Updated 6 months ago