kyutai-labs / jax-flash-attn3
JAX bindings for the flash-attention3 kernels
☆11 · Updated 10 months ago
Alternatives and similar repositories for jax-flash-attn3
Users who are interested in jax-flash-attn3 compare it to the libraries listed below.
- Read and write tensorboard data using Rust ☆21 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 8 months ago
- A small python library to run iterators in a separate process ☆10 · Updated last year
- 🔭 interactively explore `onnx` networks in your CLI. ☆25 · Updated last year
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 · Updated this week
- ☆20 · Updated 8 months ago
- ☆12 · Updated last year
- Rust crate for some audio utilities ☆24 · Updated 3 months ago
- Make triton easier ☆46 · Updated last year
- ☆15 · Updated 7 months ago
- ☆31 · Updated last year
- CLI utility to inspect and explore .safetensors and .gguf files ☆20 · Updated 2 weeks ago
- TensorRT LLM Benchmark Configuration ☆13 · Updated 11 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- PyTorch implementation of the Flash Spectral Transform Unit. ☆17 · Updated 9 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated this week
- 👷 Build compute kernels ☆68 · Updated this week
- Here we will test various linear attention designs. ☆59 · Updated last year
- [WIP] Better (FP8) attention for Hopper ☆30 · Updated 4 months ago
- Code and data for paper "(How) do Language Models Track State?" ☆14 · Updated 2 months ago
- A CUDA runtime environment based on the CUDA Driver API ☆16 · Updated last week
- Effort to open-source 10.5 trillion parameter Gemini model. ☆17 · Updated last year
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 · Updated last year
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆61 · Updated 2 months ago
- Graph model execution API for Candle ☆13 · Updated 7 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆17 · Updated 2 weeks ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated last month
- "PyTorch in Rust" ☆16 · Updated last year
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆21 · Updated 2 years ago
- Training hybrid models for dummies. ☆23 · Updated 5 months ago