Infatoshi / MegaQwen
Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)
☆70 Updated this week
Alternatives and similar repositories for MegaQwen
Users who are interested in MegaQwen are comparing it to the libraries listed below.
- Make triton easier ☆50 Updated last year
- [WIP] Better (FP8) attention for Hopper ☆32 Updated 11 months ago
- Simple high-throughput inference library ☆155 Updated 8 months ago
- ☆63 Updated 7 months ago
- Samples of good AI generated CUDA kernels ☆99 Updated 8 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆28 Updated 2 years ago
- ☆18 Updated last year
- Standalone commandline CLI tool for compiling Triton kernels ☆20 Updated last year
- ☆67 Updated 10 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆110 Updated 11 months ago
- Because it's there. ☆16 Updated last year
- ☆119 Updated last month
- [WIP] Transformer to embed Danbooru labelsets ☆13 Updated last year
- Efficient non-uniform quantization with GPTQ for GGUF ☆58 Updated 4 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 Updated 9 months ago
- ☆52 Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 Updated last year
- Gpu benchmark ☆74 Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Updated 10 months ago
- mHC kernels implemented in CUDA ☆249 Updated 3 weeks ago
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 Updated 2 years ago
- RWKV-7: Surpassing GPT ☆104 Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think ☆89 Updated this week
- QuIP quantization ☆62 Updated last year
- ☆27 Updated last year
- ☆41 Updated 9 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆112 Updated 8 months ago
- ☆50 Updated last year
- A collection of reproducible inference engine benchmarks ☆38 Updated 9 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 Updated 4 months ago