nate-gillman / fourier-head
Official implementation of "Fourier Head: Helping Large Language Models Learn Complex Probability Distributions" (ICLR 2025)
☆54 Updated this week
Alternatives and similar repositories for fourier-head:
Users who are interested in fourier-head are comparing it to the libraries listed below.
- A State-Space Model with Rational Transfer Function Representation. ☆77 Updated 8 months ago
- ☆31 Updated 9 months ago
- ☆14 Updated 8 months ago
- ☆150 Updated last month
- Code repository for Trajectory Flow Matching ☆50 Updated 2 months ago
- Graph neural networks in JAX. ☆67 Updated 7 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆95 Updated last month
- σ-GPT: A New Approach to Autoregressive Models ☆61 Updated 5 months ago
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4. ☆73 Updated 10 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆66 Updated last week
- Flow-matching algorithms in JAX ☆83 Updated 5 months ago
- ☆78 Updated 9 months ago
- ☆53 Updated last year
- Code for https://arxiv.org/abs/2406.04329 ☆56 Updated last month
- Visualizations of the theory behind diffusion models. ☆77 Updated 9 months ago
- Lightning-like training API for JAX with Flax ☆38 Updated last month
- ☆30 Updated 2 months ago
- Neural Optimal Transport with Lagrangian Costs ☆50 Updated 6 months ago
- ☆30 Updated 3 weeks ago
- ☆50 Updated 3 months ago
- A simple example of VAEs with KANs ☆12 Updated 8 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆85 Updated 2 months ago
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆59 Updated 8 months ago
- ☆46 Updated 2 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆82 Updated last year
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆71 Updated 2 weeks ago
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆106 Updated 3 months ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆36 Updated 4 months ago
- Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention. ☆24 Updated 2 months ago
- Implementation of Denoising Diffusion Probabilistic Models (DDPM) in JAX and Flax. ☆17 Updated last year