The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers"
☆19 · Jul 24, 2024 · Updated last year
Alternatives and similar repositories for StructuredFFN
Users who are interested in StructuredFFN are comparing it to the libraries listed below.
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … ☆30 · Dec 8, 2025 · Updated 4 months ago
- Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (M… ☆31 · Nov 20, 2024 · Updated last year
- Grokking on modular arithmetic in less than 150 epochs in MLX ☆16 · Oct 24, 2024 · Updated last year
- ☆70 · Nov 15, 2024 · Updated last year
- ☆13 · Jul 3, 2024 · Updated last year
- Automatically exported from code.google.com/p/transducersaurus ☆11 · Apr 1, 2015 · Updated 11 years ago
- Code for the paper Task Agnostic Morphology Evolution. ☆20 · May 25, 2021 · Updated 4 years ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- [ICML-2025] We introduce Lie group Relative position Encodings (LieRE) that goes beyond RoPE in supporting n-dimensional inputs. ☆14 · Aug 8, 2025 · Updated 8 months ago
- Source code accompanying the NeurIPS 2022 paper "Learning Partial Equivariances From Data" ☆10 · Nov 18, 2022 · Updated 3 years ago
- Training, optimization and deployment of Object Detection model with dinov2 backbone for efficient inference on NVIDIA Jetson ☆13 · Jul 26, 2025 · Updated 8 months ago
- Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha… ☆38 · Aug 7, 2024 · Updated last year
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆15 · Apr 30, 2025 · Updated 11 months ago
- CLI tool for submitting GPU kernels ☆13 · Apr 1, 2026 · Updated last week
- Reasoning-based Evaluation and Ranking of Translations. ☆20 · Jul 18, 2025 · Updated 8 months ago
- Structured Neuron Level Pruning to compress Transformer-based models [ECCV'24] ☆17 · Aug 7, 2024 · Updated last year
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices ☆11 · Jan 12, 2021 · Updated 5 years ago
- Continuous regular group convolutions for Pytorch ☆12 · Jun 9, 2024 · Updated last year
- Official Implementation of paper "Distilling Long-tailed Datasets" [CVPR 2025] ☆21 · Aug 13, 2025 · Updated 7 months ago
- ☆21 · Oct 1, 2024 · Updated last year
- Mastodon server running for the Doubanius Tertius project ☆10 · Apr 4, 2022 · Updated 4 years ago
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models ☆48 · Feb 26, 2026 · Updated last month
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- C# Specification for https://github.com/vega/vega-lite ☆10 · Aug 12, 2021 · Updated 4 years ago
- Code to generate visual metamers via foveated feed-forward style transfer (ICLR 2019) ☆19 · Apr 13, 2021 · Updated 4 years ago
- ☆15 · Mar 2, 2025 · Updated last year
- Fast and differentiable hidden Markov model in C++ ☆19 · Jan 20, 2023 · Updated 3 years ago
- An example application that uses SkiaSharp with Wpf ☆15 · Apr 1, 2016 · Updated 10 years ago
- Elucidated Dataset Condensation (NeurIPS 2024) ☆20 · Oct 5, 2024 · Updated last year
- An approximate implementation of the OpenAI paper - An Empirical Model of Large-Batch Training for MNIST ☆11 · Nov 19, 2022 · Updated 3 years ago
- a compact audio-to-phoneme aligner for singing voice ☆12 · Jan 17, 2024 · Updated 2 years ago
- A Statistical Arbitrage Strategy to trade Cryptocurrency Pairs ☆14 · Nov 6, 2020 · Updated 5 years ago
- Uses Processing and Perlin Noise to generate a procedural 2D rendering of different landscapes, which are then rendered into 3D ☆16 · Aug 14, 2018 · Updated 7 years ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · May 26, 2025 · Updated 10 months ago
- This repo contains the code for studying the interplay between quantization and sparsity methods ☆26 · Feb 26, 2025 · Updated last year
- This is the code which powers the Twitter Bot https://twitter.com/RGB_Colours ☆15 · Apr 14, 2017 · Updated 8 years ago
- Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification ☆13 · Feb 5, 2022 · Updated 4 years ago
- repository for paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis" ☆18 · Jun 17, 2022 · Updated 3 years ago
- Minimal implementation of TokenFormer for inference and learning ☆13 · Nov 6, 2024 · Updated last year