🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench
⭐ 127 · Nov 10, 2025 · Updated 3 months ago
Alternatives and similar repositories for TritonForge
Users who are interested in TritonForge are comparing it to the repositories listed below
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… · ⭐ 51 · Oct 11, 2025 · Updated 4 months ago
- PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References · ⭐ 20 · Jun 13, 2024 · Updated last year
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning · ⭐ 67 · Oct 31, 2025 · Updated 4 months ago
- ⭐ 20 · Jun 3, 2023 · Updated 2 years ago
- ⭐ 74 · Sep 15, 2025 · Updated 5 months ago
- Implementation of FP8/INT8 rollout for RL training without a performance drop. · ⭐ 293 · Nov 7, 2025 · Updated 3 months ago
- Self-contained Python library with zero dependencies that gives you unified device properties for GPU, CPU, and NPU. No more calling separat… · ⭐ 15 · Dec 12, 2025 · Updated 2 months ago
- Quantized Attention on GPU · ⭐ 44 · Nov 22, 2024 · Updated last year
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning · ⭐ 193 · Feb 24, 2026 · Updated last week
- Estimate MFU for DeepSeekV3 · ⭐ 26 · Jan 5, 2025 · Updated last year
- ⭐ 15 · Jul 13, 2025 · Updated 7 months ago
- ⭐ 15 · Mar 2, 2025 · Updated last year
- OpenAI gym environments for goal-conditioned and language-conditioned reinforcement learning · ⭐ 14 · Jan 27, 2026 · Updated last month
- ⭐ 52 · May 19, 2025 · Updated 9 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter · ⭐ 138 · Dec 5, 2025 · Updated 2 months ago
- From-scratch C implementation of the multi-head latent attention (MLA) used in the DeepSeek-V3 technical paper. · ⭐ 18 · Jan 15, 2025 · Updated last year
- ⭐ 28 · Jul 29, 2025 · Updated 7 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ⭐ 139 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training · ⭐ 260 · Aug 9, 2025 · Updated 6 months ago
- Research prototype of PRISM: a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. · ⭐ 58 · Aug 15, 2025 · Updated 6 months ago
- An Attention Superoptimizer · ⭐ 22 · Jan 20, 2025 · Updated last year
- Measuring the Signal to Noise Ratio in Language Model Evaluation · ⭐ 28 · Aug 19, 2025 · Updated 6 months ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) · ⭐ 37 · Feb 10, 2026 · Updated 3 weeks ago
- DVL (Dual-V Learning): a dual-RL method for offline and online reinforcement learning · ⭐ 15 · Oct 22, 2023 · Updated 2 years ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. · ⭐ 766 · Updated this week
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… · ⭐ 14 · Apr 9, 2025 · Updated 10 months ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration · ⭐ 19 · Jun 26, 2025 · Updated 8 months ago
- Unit Scaling demo and experimentation code · ⭐ 16 · Mar 12, 2024 · Updated last year
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference · ⭐ 49 · Jun 17, 2025 · Updated 8 months ago
- Tile-based language built for AI computation across all scales · ⭐ 138 · Updated this week
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity · ⭐ 71 · Jul 5, 2025 · Updated 7 months ago
- Collection of kernels written in Triton language · ⭐ 178 · Jan 27, 2026 · Updated last month
- slime is an LLM post-training framework for RL Scaling. · ⭐ 4,381 · Updated this week
- Wave: Python Domain-Specific Language for High Performance Machine Learning · ⭐ 45 · Updated this week
- Code for the paper "Manipulating Embeddings of Stable Diffusion Prompts". · ⭐ 15 · Aug 8, 2024 · Updated last year
- ⭐ 22 · May 5, 2025 · Updated 9 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. · ⭐ 91 · Jul 17, 2025 · Updated 7 months ago
- Async pipelined version of verl · ⭐ 124 · Apr 8, 2025 · Updated 10 months ago
- ⭐ 79 · Feb 10, 2026 · Updated 3 weeks ago