Experiments on Multi-Head Latent Attention
☆101 · Aug 19, 2024 · Updated last year
Alternatives and similar repositories for mla-experiments
Users that are interested in mla-experiments are comparing it to the libraries listed below.
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated last year
- Cute layout visualization ☆40 · Jan 18, 2026 · Updated 5 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆22 · Sep 19, 2024 · Updated last year
- Triton for OpenCL backend, and use mlir-translate to get source OpenCL code ☆27 · Aug 27, 2025 · Updated 9 months ago
- ☆52 · May 19, 2025 · Updated last year
- Design hardware-friendly model architectures and migrate existing LLMs with minimal performance loss ☆484 · Jun 9, 2026 · Updated last week
- All-in-one benchmarking platform for evaluating LLM. ☆15 · Nov 12, 2025 · Updated 7 months ago
- This is a simple torch implementation of the high performance Multi-Query Attention ☆16 · Aug 23, 2023 · Updated 2 years ago
- My tests and experiments with some popular dl frameworks.