Experiments on Multi-Head Latent Attention
☆101 · Aug 19, 2024 · Updated last year
Alternatives and similar repositories for mla-experiments
Users that are interested in mla-experiments are comparing it to the libraries listed below.
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- Cute layout visualization ☆37 · Jan 18, 2026 · Updated 3 months ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 11 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆22 · Sep 19, 2024 · Updated last year
- Triton for OpenCL backend, and use mlir-translate to get source OpenCL code ☆26 · Aug 27, 2025 · Updated 7 months ago
- ☆51 · May 19, 2025 · Updated 11 months ago
- Design hardware-friendly model architectures and migrate existing LLMs with minimal performance loss ☆457 · Apr 6, 2026 · Updated last week
- All-in-one benchmarking platform for evaluating LLM. ☆15 · Nov 12, 2025 · Updated 5 months ago
- This is a simple torch implementation of the high performance Multi-Query Attention