kyutai-labs / jax-flash-attn3
JAX bindings for the flash-attention3 kernels
★11 · Updated 10 months ago
Alternatives and similar repositories for jax-flash-attn3
Users who are interested in jax-flash-attn3 are comparing it to the libraries listed below.
- Interactively explore `onnx` networks in your CLI. ★24 · Updated last year
- FlexAttention w/ FlashAttention3 Support ★26 · Updated 8 months ago
- A small Python library to run iterators in a separate process ★10 · Updated last year
- ★14 · Updated 6 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ★60 · Updated last month
- Open deep learning compiler stack for CPU, GPU and specialized accelerators ★18 · Updated last week
- ★12 · Updated last year
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ★15 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ★23 · Updated this week
- Code and data for paper "(How) do Language Models Track State?" ★14 · Updated 2 months ago
- Make triton easier ★47 · Updated 11 months ago
- Rust crate for some audio utilities ★23 · Updated 2 months ago
- Exploration into the Firefly algorithm in PyTorch ★39 · Updated 3 months ago
- ★13 · Updated last year
- Build compute kernels ★44 · Updated this week
- ★19 · Updated 8 months ago
- TensorRT LLM Benchmark Configuration ★13 · Updated 10 months ago
- Repository for CPU Kernel Generation for LLM Inference ★26 · Updated last year
- Training hybrid models for dummies. ★21 · Updated 4 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ★15 · Updated 3 weeks ago
- Standalone Flash Attention v2 kernel without libtorch dependency ★110 · Updated 8 months ago
- ★74 · Updated 6 months ago
- GoldFinch and other hybrid transformer components ★10 · Updated 3 weeks ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ★41 · Updated last month
- ★31 · Updated last year
- Read and write tensorboard data using Rust ★21 · Updated last year
- PyTorch implementation of the Flash Spectral Transform Unit. ★17 · Updated 8 months ago
- Experimental scripts for researching data adaptive learning rate scheduling. ★23 · Updated last year
- Implement LLaVA using candle ★15 · Updated 11 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ★29 · Updated last week