The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers"
☆20Jul 24, 2024Updated last year
Alternatives and similar repositories for StructuredFFN
Users that are interested in StructuredFFN are comparing it to the libraries listed below.
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok …☆30Dec 8, 2025Updated 6 months ago
- Official code for the NeurIPS25 paper "RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling" (https://arxiv.org/abs/25…☆26Dec 10, 2025Updated 6 months ago
- Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (M…☆32Nov 20, 2024Updated last year
- Grokking on modular arithmetic in less than 150 epochs in MLX☆15Oct 24, 2024Updated last year
- ☆13Jul 3, 2024Updated last year
- Simple GRPO scripts and configurations.☆58Feb 6, 2025Updated last year
- [ICML-2025] We introduce Lie group Relative position Encodings (LieRE) that goes beyond RoPE in supporting n-dimensional inputs.☆14Aug 8, 2025Updated 10 months ago
- ☆17Oct 27, 2024Updated last year
- ☆25May 25, 2024Updated 2 years ago
- Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha…☆41Aug 7, 2024Updated last year
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"☆16Apr 30, 2025Updated last year
- Reasoning-based Evaluation and Ranking of Translations.☆20Jun 2, 2026Updated 3 weeks ago
- gRPC server over a FAISS index☆19Aug 19, 2021Updated 4 years ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens)☆12Oct 11, 2024Updated last year
- Scratchpad/Chain-of-Thought Prompts☆12Jun 6, 2022Updated 4 years ago
- Mastodon server running for the Doubanius Tertius project☆10Apr 4, 2022Updated 4 years ago
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models☆51Feb 26, 2026Updated 4 months ago
- Combining SOAP and MUON☆22Feb 11, 2025Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"☆13Jul 18, 2024Updated last year
- ☆34Jun 4, 2025Updated last year
- Code to generate visual metamers via foveated feed-forward style transfer (ICLR 2019)☆19Apr 13, 2021Updated 5 years ago
- ☆14Mar 2, 2025Updated last year
- Fast and differentiable hidden Markov model in C++☆19Jan 20, 2023Updated 3 years ago
- ☆15Sep 6, 2021Updated 4 years ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"☆14May 26, 2025Updated last year
- Uses Processing and Perlin Noise to generate a procedural 2D rendering of different landscapes, which are then rendered into 3D☆16Aug 14, 2018Updated 7 years ago
- Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification☆13Feb 5, 2022Updated 4 years ago
- This is the code which powers the Twitter Bot https://twitter.com/RGB_Colours☆15Apr 14, 2017Updated 9 years ago
- ☆63Oct 3, 2024Updated last year
- Schedule free optimiser implemented in JAX using Optimistix☆15May 29, 2024Updated 2 years ago
- I can haz planetz?☆12Jun 12, 2020Updated 6 years ago
- Repository for code and dataset for our EMNLP 2021 paper - “So You Think You’re Funny?”: Rating the Humour Quotient in Standup Comedy.☆15Sep 26, 2022Updated 3 years ago
- Parallel Associative Scan for Language Models☆18Jan 8, 2024Updated 2 years ago
- KGML for EMNLP 2021☆10Feb 2, 2022Updated 4 years ago
- ☆19Dec 4, 2025Updated 6 months ago
- A simulator of Michelson interferometer.☆13Nov 23, 2020Updated 5 years ago
- ☆54May 20, 2024Updated 2 years ago
- Simulating a 2D Hovering SpaceX Grasshopper with a Thrust Vector Control engine.☆12Dec 28, 2015Updated 10 years ago
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches☆66Mar 4, 2025Updated last year