AuleTechnologies / Aule-Attention
High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, RDNA3), Vulkan backend for consumer GPUs. No CUDA required.
☆134 · Updated 2 weeks ago
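A minimal sketch of what "drop-in replacement for PyTorch SDPA" typically means in practice. The `aule_attention` import and its `scaled_dot_product_attention` entry point are assumptions for illustration, not the library's confirmed API; only the stock PyTorch calls are guaranteed.

```python
import torch
import torch.nn.functional as F

# Baseline: stock PyTorch scaled dot-product attention.
q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out_ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Hypothetical drop-in usage: a library like this would expose an
# SDPA-compatible function (name assumed here) so existing model code
# only changes its import, not its call sites.
try:
    from aule_attention import scaled_dot_product_attention as aule_sdpa  # assumed entry point
    out = aule_sdpa(q, k, v, is_causal=True)
    # Loose tolerance: fused attention kernels differ in floating-point rounding.
    torch.testing.assert_close(out, out_ref, rtol=2e-2, atol=2e-2)
except ImportError:
    out = out_ref  # fall back to stock SDPA when the package isn't installed
```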
Alternatives and similar repositories for Aule-Attention
Users interested in Aule-Attention are comparing it to the libraries listed below.
- ☆62 · Updated 6 months ago
- Efficient non-uniform quantization with GPTQ for GGUF ☆57 · Updated 4 months ago
- Sparse inferencing for transformer-based LLMs ☆217 · Updated 5 months ago
- High-throughput tensor loading for PyTorch ☆219 · Updated last month
- Run multiple resource-heavy large models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆87 · Updated this week
- AnyModal is a flexible multimodal language model framework for PyTorch ☆103 · Updated last year
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆106 · Updated 7 months ago
- Liquid Audio - speech-to-speech audio models by Liquid AI ☆356 · Updated last week
- InferX: Inference-as-a-Service platform ☆146 · Updated this week
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 · Updated last month
- Thin wrapper around GGML to make life easier ☆42 · Updated 2 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining ☆47 · Updated 2 months ago
- AirLLM 70B inference with a single 4GB GPU ☆14 · Updated 6 months ago
- ☆15 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance ☆44 · Updated this week
- Kyutai with an "eye" ☆233 · Updated 9 months ago
- A self-hosted HuggingFace alternative ☆151 · Updated 2 months ago
- Automated LLM coding tournaments. There can be only one (winning code solution from the competing AIs) ☆44 · Updated 9 months ago
- ☆69 · Updated 6 months ago
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 7 months ago
- ☆101 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX ☆99 · Updated 6 months ago
- The heart of the Pulsar app: fast, secure, and shared inference with a modern UI ☆59 · Updated last year
- Automatically quantize GGUF models ☆220 · Updated 3 weeks ago
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆29 · Updated 5 months ago
- Montelimar - extract text from anywhere ☆87 · Updated 3 months ago
- Distributed inference for MLX LLMs ☆100 · Updated last year
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 · Updated 2 months ago
- Lightweight package that tracks and summarizes code changes using large language models (LLMs) ☆34 · Updated 10 months ago
- Simple high-throughput inference library ☆155 · Updated 8 months ago