amorehead/jvp_flash_attention

Flash Attention Triton kernel with support for second-order derivatives

☆144

Alternatives and similar repositories for jvp_flash_attention

Users who are interested in jvp_flash_attention are comparing it to the libraries listed below.

MikaStars39 / StableMask
PyTorch implementation of StableMask (ICML'24)
☆15 · Jun 27, 2024 · Updated last year
ThomAS122102RAY / PanNuke-cell-core-region-identification-with-DINO
coded with and corrected by Google Anti-Gravity
☆13 · Nov 23, 2025 · Updated 3 months ago
lindermanlab / elk
Scalable and Stable Parallelization of Nonlinear RNNs
☆29 · Oct 21, 2025 · Updated 4 months ago
zichongli5 / NorMuon
Official Implementation for NorMuon paper
☆57 · Feb 9, 2026 · Updated 3 weeks ago
yu4u / kaggle-rsna2024-4th
This is the implementation of the 4th place solution (yu4u's part) for RSNA 2024 Lumbar Spine Degenerative Classification at Kaggle.
☆10 · Oct 11, 2024 · Updated last year
kaistmm / voxsim_trainer
[INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset
☆12 · Sep 29, 2025 · Updated 5 months ago
graphcore-research / jax-scalify
JAX Scalify: end-to-end scaled arithmetics
☆18 · Oct 30, 2024 · Updated last year
RobertCsordas / llm_effective_depth
Official codebase for our paper "Do Language Models Use Their Depth Efficiently?"
☆29 · Jun 25, 2025 · Updated 8 months ago
merlresearch / reverberation-as-supervision
Enhanced Reverberation As Supervision (ERAS) for unsupervised reverberant speech separation
☆15 · Aug 1, 2024 · Updated last year
iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 · Jan 29, 2024 · Updated 2 years ago
sony / diffusion-timbre-transfer
☆55 · Nov 5, 2024 · Updated last year
Lizn-zn / Nesy-Programming
☆10 · Oct 28, 2024 · Updated last year
ysngki / UMoE
☆21 · Oct 22, 2025 · Updated 4 months ago
Ivo-Balbaert / Vale_Examples
Working examples in the Vale programming language
☆14 · Mar 21, 2022 · Updated 3 years ago
theAdamColton / ijepa-enhanced
recipe for training fully-featured self supervised image jepa models
☆12 · Jun 4, 2025 · Updated 9 months ago
amazon-science / mezo_svrg
Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"
☆12 · Jun 25, 2024 · Updated last year
xinan-chen / AP_BWE
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
☆13 · Jul 22, 2024 · Updated last year
HaoyiZhu / MeanFlow-PyTorch
PyTorch re-implementation for MeanFlow
☆117 · Jul 17, 2025 · Updated 7 months ago
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆175 · Oct 20, 2025 · Updated 4 months ago
bentherien / mu_learned_optimization
[Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆15 · Feb 12, 2026 · Updated 3 weeks ago
giannisdaras / ambient-omni
[NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data.
☆31 · Jan 21, 2026 · Updated last month
Aleph-Alpha-Research / trigrams
☆59 · Nov 18, 2025 · Updated 3 months ago
peterant330 / KUEA
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
☆21 · Sep 7, 2025 · Updated 6 months ago
dykestra / Yang_Ramanan
code for Articulated Human Detection with Flexible Mixtures-of-Parts
☆15 · May 7, 2016 · Updated 9 years ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆51 · Feb 24, 2026 · Updated last week
Raincleared-Song / DejaVu_predictor
Code for training a sparsity predictor on LLaMA.
☆18 · May 12, 2024 · Updated last year
7Xin / DPI-TTS
☆13 · Sep 12, 2024 · Updated last year
RobertCsordas / switchhead
☆17 · Jun 11, 2025 · Updated 8 months ago
apple / ml-dataset-decomposition
Official repo of dataset-decomposition paper [NeurIPS 2024]
☆21 · Jan 8, 2025 · Updated last year
LINs-lab / GMem
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆43 · Mar 11, 2025 · Updated 11 months ago
yifanzhang-pro / HLA
Official Project Page for HLA: Higher-order Linear Attention (https://arxiv.org/abs/2510.27258)
☆45 · Jan 6, 2026 · Updated 2 months ago
NVlabs / PerAda
Repo for the paper: PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees (CVPR 2024)
☆23 · Aug 14, 2024 · Updated last year
sigaloid / mutter
Easy-to-use Rust bindings to the Whisper.cpp machine learning transcription library!
☆25 · Oct 31, 2025 · Updated 4 months ago
yikangshen / megablocks
☆20 · May 30, 2024 · Updated last year
Adibian / ResGrad
Unofficial implementation of ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
☆19 · Feb 9, 2025 · Updated last year
yxlllc / vocal-remover
Vocal Remover using Deep Neural Networks
☆19 · Dec 31, 2024 · Updated last year
IBM / activated-lora
Source code for Activated LoRA
☆24 · Nov 22, 2025 · Updated 3 months ago
thu-ml / GFT
☆52 · Jun 13, 2025 · Updated 8 months ago
keshik6 / grafting
[NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting
☆72 · Jan 9, 2026 · Updated last month