Dao-AILab / flash-attention

Fast and memory-efficient exact attention

☆20,804

Alternatives and similar repositories for flash-attention

Users interested in flash-attention are comparing it to the libraries listed below.


facebookresearch / xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
☆10,131 · Updated 2 weeks ago
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,790 · Updated last week
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆20,157 · Updated last week
huggingface / trl
Train transformer language models with reinforcement learning.
☆16,473 · Updated this week
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆14,389 · Updated this week
huggingface / accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…
☆9,329 · Updated this week
NVIDIA / FasterTransformer
Transformer-related optimization, including BERT and GPT.
☆6,355 · Updated last year
triton-lang / triton
Development repository for the Triton language and compiler
☆17,730 · Updated this week
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆7,323 · Updated this week
volcengine / verl
verl: Volcano Engine Reinforcement Learning for LLMs
☆16,726 · Updated last week
microsoft / LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
☆13,010 · Updated 11 months ago
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆12,258 · Updated this week
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆20,874 · Updated this week
artidoro / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆10,778 · Updated last year
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆10,776 · Updated last week
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆40,890 · Updated last week
AutoGPTQ / AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
☆4,992 · Updated 7 months ago
deepspeedai / DeepSpeedExamples
Example models using DeepSpeed
☆6,743 · Updated last month
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,362 · Updated 4 months ago
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆64,235 · Updated this week
BlinkDL / RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)…
☆14,177 · Updated 2 weeks ago
QwenLM / Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,392 · Updated last year
mlfoundations / open_clip
An open source implementation of CLIP.
☆13,051 · Updated last month
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Asy…
☆8,476 · Updated 3 weeks ago
state-spaces / mamba
Mamba SSM architecture
☆16,573 · Updated 3 weeks ago
meta-pytorch / gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
☆6,162 · Updated 3 months ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (…
☆11,418 · Updated this week
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,135 · Updated last year
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆6,494 · Updated last week
meta-pytorch / torchtune
PyTorch-native post-training library
☆5,604 · Updated last week