philipturner / metal-flash-attention
FlashAttention (Metal Port)
☆389 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for metal-flash-attention
- Inference of Mamba models in pure C ☆178 · Updated 8 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆262 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆704 · Updated this week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆350 · Updated 8 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆148 · Updated 6 months ago
- Efficient framework-agnostic data loading ☆380 · Updated 2 months ago
- An implementation of bucketMul LLM inference ☆214 · Updated 4 months ago
- C API for MLX ☆79 · Updated this week
- Python bindings for ggml ☆132 · Updated 2 months ago
- Run transformers (incl. LLMs) on the Apple Neural Engine. ☆53 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last month
- A scalable and robust tree-based speculative decoding algorithm ☆318 · Updated 3 months ago
- Large Language Model (LLM) applications and tools running in real time on Apple Silicon with Apple MLX. ☆351 · Updated 2 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 · Updated 3 weeks ago
- GGUF implementation in C as a library and a CLI tool ☆244 · Updated 4 months ago
- Inference of Vision Transformer (ViT) models in plain C/C++ with ggml ☆233 · Updated 7 months ago
- Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024) ☆173 · Updated 7 months ago
- On-device Inference of Diffusion Models for Apple Silicon ☆510 · Updated 3 weeks ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆631 · Updated 7 months ago
- CLIP inference in plain C/C++ with no extra dependencies ☆460 · Updated 3 months ago
- GPTQ inference Triton kernel ☆284 · Updated last year
- LLM-based code completion engine ☆175 · Updated last year
- Fast parallel LLM inference for MLX ☆149 · Updated 4 months ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆649 · Updated 3 months ago
- Apple GPU microarchitecture ☆474 · Updated 2 months ago
- Experimental BitNet Implementation ☆61 · Updated 8 months ago
- SiLLM simplifies training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. ☆228 · Updated this week