amorehead / jvp_flash_attention
Flash Attention Triton kernel with support for second-order derivatives
☆144Feb 4, 2026Updated last week
Alternatives and similar repositories for jvp_flash_attention
Users that are interested in jvp_flash_attention are comparing it to the libraries listed below
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs☆28Oct 21, 2025Updated 3 months ago
- coded with and corrected by Google Anti-Gravity☆13Nov 23, 2025Updated 2 months ago
- This is the implementation of the 4th place solution (yu4u's part) for RSNA 2024 Lumbar Spine Degenerative Classification at Kaggle.☆10Oct 11, 2024Updated last year
- Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"☆11Jun 25, 2024Updated last year
- Official codebase for our paper "Do Language Models Use Their Depth Efficiently?"☆29Jun 25, 2025Updated 7 months ago
- Enhanced Reverberation As Supervision (ERAS) for unsupervised reverberant speech separation☆15Aug 1, 2024Updated last year
- JAX Scalify: end-to-end scaled arithmetics☆18Oct 30, 2024Updated last year
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset☆12Sep 29, 2025Updated 4 months ago
- Checkpointable dataset utilities for foundation model training☆32Jan 29, 2024Updated 2 years ago
- ☆55Nov 5, 2024Updated last year
- ☆10Oct 28, 2024Updated last year
- Working examples in the Vale programming language☆14Mar 21, 2022Updated 3 years ago
- Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction☆13Jul 22, 2024Updated last year
- ☆20Oct 22, 2025Updated 3 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness☆174Oct 20, 2025Updated 3 months ago
- PyTorch re-implementation for MeanFlow☆116Jul 17, 2025Updated 6 months ago
- [NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data.☆30Jan 21, 2026Updated 3 weeks ago
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆14Mar 18, 2025Updated 10 months ago
- Implementation of the new SOTA for model based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in Pytorch☆153May 2, 2025Updated 9 months ago