☆96 · Feb 18, 2026 · Updated last week
Alternatives and similar repositories for popcorn-cli
Users who are interested in popcorn-cli are comparing it to the repositories listed below:
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆208 · Feb 18, 2026 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆74 · Feb 18, 2026 · Updated last week
- ☆14 · Jul 5, 2025 · Updated 7 months ago
- ☆21 · Mar 3, 2025 · Updated 11 months ago
- ☆12 · Aug 26, 2025 · Updated 6 months ago
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year
- PolyLib official git. ☆11 · Jan 27, 2026 · Updated last month
- Musings in GEMM (General Matrix Multiplication) ☆14 · Dec 14, 2025 · Updated 2 months ago
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention ☆45 · Oct 16, 2025 · Updated 4 months ago
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- ☆16 · May 11, 2017 · Updated 8 years ago
- ☆16 · Sep 24, 2024 · Updated last year
- AI Tensor Engine for ROCm ☆356 · Feb 21, 2026 · Updated last week
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Jan 24, 2025 · Updated last year
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆521 · Updated this week
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆131 · Feb 21, 2026 · Updated last week
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆75 · Feb 11, 2026 · Updated 2 weeks ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆61 · Feb 23, 2025 · Updated last year
- extensible collectives library in triton ☆95 · Mar 31, 2025 · Updated 11 months ago
- ☆60 · Updated this week
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆52 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆113 · Feb 20, 2026 · Updated last week
- Microprocessor 2 Lab Template ☆11 · Apr 29, 2024 · Updated last year
- a website for accessing many models through api(deepseek、Qwen、Hunyuan etc.) ☆17 · Jul 12, 2025 · Updated 7 months ago
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆30 · Mar 20, 2025 · Updated 11 months ago
- CLI utility to inspect and explore .safetensors and .gguf files ☆47 · Oct 28, 2025 · Updated 3 months ago
- super repo for rocm libraries ☆259 · Updated this week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Apr 2, 2025 · Updated 10 months ago
- Collection of scripts used for BlueField SoC system management. ☆31 · Feb 19, 2026 · Updated last week
- a minimal cache manager for PagedAttention, on top of llama3. ☆136 · Aug 26, 2024 · Updated last year
- Coursera Week 2: Python scripting and SQL ☆12 · Feb 21, 2022 · Updated 4 years ago
- Resize icon for STM32Cube IDE ( toolbar ) ☆12 · Sep 23, 2021 · Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆84 · Feb 11, 2026 · Updated 2 weeks ago
- It is an LLM-based AI agent, which can write correct and efficient gpu kernels automatically. ☆68 · Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆466 · Dec 31, 2025 · Updated last month
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆103 · Jan 8, 2026 · Updated last month
- making the official triton tutorials actually comprehensible ☆119 · Aug 25, 2025 · Updated 6 months ago
- ☆53 · Updated this week