kyutai-labs / jax-flash-attn3
JAX bindings for the flash-attention3 kernels
☆11 · Updated 7 months ago
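For context on what these bindings expose: flash-attention kernels compute ordinary scaled dot-product attention, softmax(QKᵀ/√d)·V, just tiled so the full score matrix is never materialized in memory. The bindings' actual API is not shown here; the following is only a minimal NumPy sketch of the math the kernels implement.

```python
import numpy as np

def attention_reference(q, k, v):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Flash-attention kernels produce the same result, but compute it in
    tiles so the (L, L) score matrix is never fully materialized.
    """
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)   # (L, L) attention scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows now sum to 1
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = attention_reference(q, k, v)
print(out.shape)  # (4, 8)
```

A fused kernel avoids both the O(L²) memory for `scores`/`weights` and the extra passes over memory, which is the whole point of the flash-attention family.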
Alternatives and similar repositories for jax-flash-attn3:
Users interested in jax-flash-attn3 are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 5 months ago
- A small Python library to run iterators in a separate process ☆10 · Updated last year
- Rust crate for some audio utilities ☆22 · Updated 3 weeks ago
- Open deep learning compiler stack for CPU, GPU and specialized accelerators ☆18 · Updated last week
- Read and write TensorBoard data using Rust ☆20 · Updated last year
- ☆30 · Updated 10 months ago
- Awesome Triton Resources ☆23 · Updated this week
- ☆23 · Updated last month
- ☆12 · Updated last year
- ☆12 · Updated 4 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆53 · Updated 3 weeks ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated last month
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆28 · Updated 3 weeks ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆15 · Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated last month
- Exploration into the Firefly algorithm in PyTorch ☆35 · Updated last month
- Here we will test various linear attention designs. ☆60 · Updated 11 months ago
- 🔭 interactively explore `onnx` networks in your CLI. ☆23 · Updated 9 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated 8 months ago
- ☆19 · Updated 5 months ago
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 4 months ago
- TensorRT LLM Benchmark Configuration ☆13 · Updated 8 months ago
- GoldFinch and other hybrid transformer components ☆10 · Updated 2 weeks ago
- Training hybrid models for dummies. ☆20 · Updated 2 months ago
- ☆14 · Updated 8 months ago
- Code for the paper "Accessing higher dimensions for unsupervised word translation" ☆21 · Updated last year
- Experimental scripts for researching data-adaptive learning rate scheduling. ☆23 · Updated last year
- Proof of concept for running moshi/hibiki using WebRTC ☆18 · Updated last month
- [WIP] Better (FP8) attention for Hopper ☆26 · Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆42 · Updated 6 months ago