nate-gillman / fourier-head
Official implementation of "Fourier Head: Helping Large Language Models Learn Complex Probability Distributions" (ICLR 2025)
☆56 · Updated this week
Alternatives and similar repositories for fourier-head:
Users who are interested in fourier-head are comparing it to the repositories listed below.
- A State-Space Model with Rational Transfer Function Representation. ☆77 · Updated 8 months ago
- Flow-matching algorithms in JAX ☆83 · Updated 6 months ago
- ☆78 · Updated 10 months ago
- ☆31 · Updated 10 months ago
- ☆25 · Updated last year
- ☆53 · Updated last year
- ☆29 · Updated 9 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆72 · Updated 2 weeks ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆95 · Updated last month
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4. ☆76 · Updated 11 months ago
- Cellular Automata Accelerated in JAX (Oral at ICLR 2025). ☆81 · Updated 2 months ago
- ☆157 · Updated 2 months ago
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆108 · Updated 3 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆85 · Updated 3 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆82 · Updated last year
- Graph neural networks in JAX. ☆67 · Updated 7 months ago
- ☆31 · Updated 9 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆80 · Updated this week
- ☆14 · Updated 8 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆61 · Updated 6 months ago
- Code repository for Trajectory Flow Matching ☆51 · Updated 3 months ago
- Evaluating the Mamba architecture on the Othello game ☆44 · Updated 9 months ago
- ☆149 · Updated 6 months ago
- Visualizations of the theory behind diffusion models. ☆77 · Updated 9 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆123 · Updated this week
- Code for https://arxiv.org/abs/2406.04329 ☆56 · Updated 2 months ago
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" ☆14 · Updated 2 weeks ago
- Gradient Boosting Reinforcement Learning (GBRL) ☆99 · Updated last week