huggingface / candle-paged-attentionLinks

☆12

Alternatives and similar repositories for candle-paged-attention

Users that are interested in candle-paged-attention are comparing it to the libraries listed below

Sorting:

LaurentMazare / glim
☆19Updated last year
chenwanqq / candle-llava
implement llava using candle
☆15Updated last year
KGrewal1 / candle-optimisers
A collection of optimisers for use with candle
☆44Updated this week
LaurentMazare / tch-ext
Sample Python extension using Rust/PyO3/tch to interact with PyTorch
☆39Updated last year
kuterd / opal_ptx
Experimental GPU language with meta-programming
☆24Updated last year
EricLBuehler / safetensors_explorer
CLI utility to inspect and explore .safetensors and .gguf files
☆35Updated last month
kyutai-labs / kaudio
Rust crate for some audio utilities
☆25Updated 9 months ago
LaurentMazare / tboard-rs
Read and write tensorboard data using Rust
☆24Updated last year
lianakoleva / no-libtorch-compile
☆21Updated 9 months ago
LaurentMazare / mamba.rs
☆135Updated last year
michaelfeil / candle-flash-attn-v3
☆13Updated 10 months ago
EricLBuehler / candle_graphs
Graph model execution API for Candle
☆16Updated 4 months ago
LaurentMazare / gemm-metal
☆17Updated last year
FL33TW00D / steelix
Your one stop CLI for ONNX model analysis.
☆47Updated 3 years ago
LaurentMazare / ug
Experimental compiler for deep learning models
☆71Updated 2 months ago
Narsil / bindgen_cuda
☆26Updated 7 months ago
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆63Updated 9 months ago
Chillee / lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
☆10Updated 2 years ago
huggingface / kernel-builder
👷 Build compute kernels
☆192Updated this week
cchan / tccl
extensible collectives library in triton
☆91Updated 8 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
huggingface / pyo3-special-method-derive
Automatically derive Python dunder methods for your Rust code
☆20Updated 7 months ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21Updated 6 months ago
sarah-quinones / gemm
☆96Updated 3 weeks ago
facebookresearch / fastgen
Simple high-throughput inference library
☆150Updated 6 months ago
eugenehp / gpu-fft
GPU based FFT written in Rust and CubeCL
☆24Updated 5 months ago
Dan-wanna-M / kbnf
A high-performance constrained decoding engine based on context free grammar in Rust
☆56Updated 6 months ago
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆65Updated last week
EricLBuehler / float8
8-bit floating point types for Rust
☆61Updated last week
EricLBuehler / candle-lora
Low rank adaptation (LoRA) for Candle.
☆168Updated 7 months ago