kyutai-labs / jax-flash-attn3
JAX bindings for the flash-attention3 kernels
☆11 Updated 8 months ago
Alternatives and similar repositories for jax-flash-attn3:
Users who are interested in jax-flash-attn3 are comparing it to the libraries listed below.
- ☆13 Updated 5 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 Updated this week
- A small python library to run iterators in a separate process ☆10 Updated last year
- Read and write tensorboard data using Rust ☆21 Updated last year
- FlexAttention w/ FlashAttention3 Support ☆26 Updated 6 months ago
- 🔭 interactively explore `onnx` networks in your CLI. ☆23 Updated 10 months ago
- "PyTorch in Rust" ☆16 Updated last year
- TensorRT LLM Benchmark Configuration ☆13 Updated 9 months ago
- Rust crate for some audio utilities ☆22 Updated last month
- Loop Nest - Linear algebra compiler and code generator. ☆22 Updated 2 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆57 Updated last week
- ☆19 Updated 6 months ago
- ☆12 Updated last year
- Make triton easier ☆47 Updated 10 months ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆16 Updated 5 months ago
- implement llava using candle ☆14 Updated 10 months ago
- Exploration into the Firefly algorithm in Pytorch ☆38 Updated 2 months ago
- Sample Python extension using Rust/PyO3/tch to interact with PyTorch ☆34 Updated last year
- Training hybrid models for dummies. ☆20 Updated 3 months ago
- Tutorial on how to convert machine learned models into ONNX ☆16 Updated 2 years ago
- Personal solutions to the Triton Puzzles ☆18 Updated 9 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 Updated this week
- Multi-Layer Key-Value sharing experiments on Pythia models ☆32 Updated 10 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 Updated last year
- ☆13 Updated last month
- Awesome code, projects, books, etc. related to CUDA ☆16 Updated last week
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 Updated last year
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆18 Updated 5 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆54 Updated 10 months ago