🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench
☆ 134 · Nov 10, 2025 · Updated 5 months ago
Alternatives and similar repositories for TritonForge
Users who are interested in TritonForge are comparing it to the libraries listed below.
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆ 55 · Oct 11, 2025 · Updated 6 months ago
- ☆ 37 · Mar 31, 2026 · Updated last week
- ☆ 79 · Sep 15, 2025 · Updated 6 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆ 68 · Oct 31, 2025 · Updated 5 months ago
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆ 50 · Jun 17, 2025 · Updated 9 months ago
- ☆ 19 · Jun 3, 2023 · Updated 2 years ago
- OpenAI gym environments for goal-conditioned and language-conditioned reinforcement learning ☆ 14 · Jan 27, 2026 · Updated 2 months ago
- Estimate MFU for DeepSeekV3 ☆ 26 · Jan 5, 2025 · Updated last year
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆ 205 · Apr 2, 2026 · Updated last week
- Implementation for FP8/INT8 Rollout for RL training without performance drop. ☆ 299 · Nov 7, 2025 · Updated 5 months ago
- Tile-based language built for AI computation across all scales ☆ 141 · Mar 27, 2026 · Updated 2 weeks ago
- An agent for CUDA compute-communication kernel co-design ☆ 34 · Mar 24, 2026 · Updated 2 weeks ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆ 162 · Feb 27, 2026 · Updated last month
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆ 14 · Apr 9, 2025 · Updated last year
- Self-contained Python lib with zero dependencies that gives you unified device properties for GPU, CPU, and NPU. No more calling separat… ☆ 14 · Mar 30, 2026 · Updated 2 weeks ago
- Quantized Attention on GPU ☆ 44 · Nov 22, 2024 · Updated last year
- ☆ 37 · Feb 12, 2025 · Updated last year
- PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References ☆ 21 · Jun 13, 2024 · Updated last year
- Collection of kernels written in Triton language ☆ 187 · Jan 27, 2026 · Updated 2 months ago
- ☆ 14 · Jul 13, 2025 · Updated 8 months ago
- A Dual-RL method DVL: Dual-V Learning for offline and online reinforcement learning ☆ 15 · Oct 22, 2023 · Updated 2 years ago
- slime is an LLM post-training framework for RL Scaling. ☆ 5,139 · Apr 5, 2026 · Updated last week
- ☆ 51 · May 19, 2025 · Updated 10 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆ 139 · Updated this week
- DeeperGEMM: crazy optimized version ☆ 86 · May 5, 2025 · Updated 11 months ago
- Materials for learning SGLang ☆ 792 · Jan 5, 2026 · Updated 3 months ago
- a fully learned index for larger-than-memory databases ☆ 15 · Sep 17, 2022 · Updated 3 years ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆ 72 · Mar 10, 2026 · Updated last month
- Reproducing R1 for Code with Reliable Rewards ☆ 302 · May 5, 2025 · Updated 11 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆ 261 · Aug 9, 2025 · Updated 8 months ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio… ☆ 99 · Sep 11, 2025 · Updated 7 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆ 183 · Updated this week
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆ 103 · Dec 17, 2025 · Updated 3 months ago
- ☆ 22 · May 5, 2025 · Updated 11 months ago
- Flash-Linear-Attention models beyond language ☆ 21 · Aug 28, 2025 · Updated 7 months ago
- Research prototype of PRISM: a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. ☆ 59 · Mar 17, 2026 · Updated 3 weeks ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆ 29 · Aug 19, 2025 · Updated 7 months ago
- A simple JS script to register desired course when slots are available, for UM-SJTU JI students. ☆ 12 · May 9, 2022 · Updated 3 years ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) ☆ 44 · Feb 10, 2026 · Updated 2 months ago