nate-gillman / fourier-head
Official implementation of "Fourier Head: Helping Large Language Models Learn Complex Probability Distributions" (ICLR 2025)
☆60 Updated 3 weeks ago
Alternatives and similar repositories for fourier-head:
Users who are interested in fourier-head are comparing it to the libraries listed below:
- A State-Space Model with Rational Transfer Function Representation. ☆78 Updated 11 months ago
- ☆31 Updated last year
- ☆94 Updated 3 months ago
- ☆52 Updated 6 months ago
- ☆25 Updated last year
- ☆58 Updated 3 weeks ago
- ☆79 Updated last year
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch ☆58 Updated 2 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆156 Updated last month
- ☆31 Updated 11 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆83 Updated last year
- ☆31 Updated 11 months ago
- Gradient Boosting Reinforcement Learning (GBRL) ☆108 Updated last month
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆36 Updated 2 weeks ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆54 Updated 2 weeks ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆97 Updated 8 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆98 Updated 4 months ago
- Kolmogorov–Arnold Networks with modified activation (using MLP to represent the activation) ☆103 Updated 5 months ago
- A simple example of VAEs with KANs ☆12 Updated 11 months ago
- Code repository for Trajectory Flow Matching ☆60 Updated 5 months ago
- Code for ☆27 Updated 4 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 Updated 8 months ago
- ☆175 Updated 4 months ago
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆61 Updated 11 months ago
- ☆53 Updated last year
- High order and sparse layers in pytorch. Lagrange Polynomial, Piecewise Lagrange Polynomial, Piecewise Discontinuous Lagrange Polynomial… ☆44 Updated 9 months ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆28 Updated 4 years ago
- ☆88 Updated 10 months ago
- ☆27 Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆111 Updated 4 months ago