Aleph-Alpha / Alpha-MoE
☆47Updated last month
Alternatives and similar repositories for Alpha-MoE
Users who are interested in Alpha-MoE are comparing it to the libraries listed below.
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆319Updated last week
- Applied AI experiments and examples for PyTorch☆315Updated 5 months ago
- ☆102Updated last year
- ☆159Updated last year
- Fast low-bit matmul kernels in Triton☆424Updated this week
- GitHub mirror of the triton-lang/triton repo.☆129Updated this week
- extensible collectives library in triton☆93Updated 10 months ago
- ☆277Updated last week
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).☆276Updated 6 months ago
- ☆258Updated last year
- Cataloging released Triton kernels.☆291Updated 4 months ago
- Collection of kernels written in Triton language☆178Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming☆164Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…☆94Updated 3 weeks ago
- FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels☆60Updated last week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS☆249Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆95Updated 4 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning☆165Updated 2 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆36Updated 5 months ago
- Accelerating MoE with IO and Tile-aware Optimizations☆563Updated 2 weeks ago
- TPU inference for vLLM, with unified JAX and PyTorch support.☆223Updated this week
- ☆60Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.☆732Updated this week
- torchcomms: a modern PyTorch communications API☆323Updated last week
- Autonomous GPU Kernel Generation via Deep Agents☆223Updated last week
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".☆97Updated 4 months ago
- An experimental CPU backend for Triton☆173Updated 2 months ago
- PyTorch bindings for CUTLASS grouped GEMM.☆141Updated 8 months ago
- kernels, of the mega variety☆657Updated 4 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆105Updated 7 years ago