Flash Attention Triton kernel with support for second-order derivatives
☆177May 14, 2026Updated last month
Alternatives and similar repositories for jvp_flash_attention
Users that are interested in jvp_flash_attention are comparing it to the libraries listed below.
- Implementation of AlphaGenome, Deepmind's updated genomic attention model☆101Mar 25, 2026Updated 3 months ago
- [NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data.☆35Apr 6, 2026Updated 2 months ago
- Training framework for Large Behavioral Models☆28Sep 17, 2025Updated 9 months ago
- ☆10Oct 28, 2024Updated last year
- PyTorch re-implementation for MeanFlow☆125Jul 17, 2025Updated 11 months ago
- ☆75Updated this week
- ☆23Oct 22, 2025Updated 8 months ago
- Working examples in the Vale programming language☆14Mar 21, 2022Updated 4 years ago
- Scalable and Stable Parallelization of Nonlinear RNNs☆31Mar 6, 2026Updated 3 months ago
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated 2 years ago
- Scalable Minecraft multiplayer data collection engine☆137Apr 23, 2026Updated 2 months ago
- Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"☆12Jun 25, 2024Updated 2 years ago
- FlashRNN - Fast RNN Kernels with I/O Awareness☆185Oct 20, 2025Updated 8 months ago
- The official codebase for Reflected Flow Matching (ICML 2024)☆24Jun 19, 2024Updated 2 years ago
- ☆59Nov 18, 2025Updated 7 months ago
- ☆28Apr 23, 2026Updated 2 months ago
- Official Implementation of iMF https://arxiv.org/abs/2512.02012☆312Feb 27, 2026Updated 4 months ago
- [ACMMM 2025] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies☆22Jun 20, 2025Updated last year
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…☆49Sep 2, 2025Updated 9 months ago
- Contrastive Reinforcement Learning☆65Apr 4, 2026Updated 2 months ago
- Code for training a sparsity predictor on LLaMA.☆18May 12, 2024Updated 2 years ago
- ☆206Oct 9, 2025Updated 8 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆25Jun 6, 2024Updated 2 years ago
- Mixture of Lora Experts☆11Apr 7, 2024Updated 2 years ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch☆78Apr 3, 2026Updated 2 months ago
- This is the implementation of the 4th place solution (yu4u's part) for RSNA 2024 Lumbar Spine Degenerative Classification at Kaggle.☆10Oct 11, 2024Updated last year
- One stop shop for all things carp☆58Sep 9, 2022Updated 3 years ago
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting☆74Jan 9, 2026Updated 5 months ago
- High-performance PyTorch modules☆18Jan 14, 2023Updated 3 years ago
- Official PyTorch Implementation of "Flow Map Distillation Without Data"☆128Nov 25, 2025Updated 7 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"☆164Mar 3, 2026Updated 3 months ago
- PyTorch implementation for "Parallel Sampling of Diffusion Models", NeurIPS 2023 Spotlight☆157Oct 13, 2023Updated 2 years ago
- Portfolio REgret for Confidence SEquences☆21Jan 6, 2026Updated 5 months ago
- JAX Scalify: end-to-end scaled arithmetics☆18Oct 30, 2024Updated last year
- Official code for the paper "Attention as a Hypernetwork"☆57Feb 24, 2026Updated 4 months ago
- ☆56Nov 5, 2024Updated last year
- [ICML 2026] Elastic Diffusion Transformer: Accelerating SOTA generation models (e.g., Qwen-Image, Hunyuan3d) through adaptive computatio…☆44May 1, 2026Updated 2 months ago
- Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch☆545Jan 18, 2026Updated 5 months ago
- EDM2 and Autoguidance -- Official PyTorch implementation☆844Dec 9, 2024Updated last year