kailums / flash-attention-rocm

Fast and memory-efficient exact attention, ported to ROCm

☆11

Alternatives and similar repositories for flash-attention-rocm

Users who are interested in flash-attention-rocm are comparing it to the libraries listed below.

yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25Updated 2 months ago
eric-haibin-lin / verl-data
☆11Updated 4 months ago
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆33Updated 5 months ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87Updated last year
NolanoOrg / llama-int4-quant
☆26Updated 2 years ago
hpcaitech / CachedEmbedding
A memory efficient DLRM training solution using ColossalAI
☆106Updated 2 years ago
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Updated 2 years ago
casper-hansen / AutoAWQ_kernels
☆78Updated 10 months ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73Updated last year
UmerHA / triton_util
Make Triton easier
☆47Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40Updated last year
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆39Updated 10 months ago
EQ-bench / eqbench3
☆25Updated last month
iantbutler01 / ditty
A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem.
☆16Updated 11 months ago
geronimi73 / 3090_shorts
minimal scripts for 24GB VRAM GPUs. training, inference, whatever
☆42Updated 2 weeks ago
anyscale / long-context-fine-tuning-blogpost
☆17Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56Updated 2 weeks ago
official-elinas / zeus-llm-trainer
Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models
☆69Updated 2 years ago
tridao / flash-attention-wheels
☆57Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆28Updated this week
Zyphra / Zyda_processing
☆39Updated last year
FreedomIntelligence / FastLLM
Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆41Updated last year
nod-ai / transformer-benchmarks
benchmarking some transformer deployments
☆26Updated 2 years ago
qdrant / bm42_eval
Evaluation of bm42 sparse indexing algorithm
☆68Updated last year
kemingy / vllm-env
Set up the env for vLLM users
☆16Updated last year
IPRC-DIP / ANPL
☆21Updated last year
juvi21 / CoPE-cuda
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
☆22Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆45Updated last year
tile-ai / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆19Updated 2 weeks ago
kyegomez / MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
☆24Updated 2 weeks ago