The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers"
☆19 · Jul 24, 2024 · Updated last year
Alternatives and similar repositories for StructuredFFN
Users that are interested in StructuredFFN are comparing it to the libraries listed below
- Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (M… ☆31 · Nov 20, 2024 · Updated last year
- A template for starting reproducible Python machine-learning projects with hardware acceleration. Find an example at https://github.com/C… ☆114 · Jun 6, 2025 · Updated 9 months ago
- Grokking on modular arithmetic in less than 150 epochs in MLX ☆16 · Oct 24, 2024 · Updated last year
- Automatically exported from code.google.com/p/transducersaurus ☆11 · Apr 1, 2015 · Updated 10 years ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- Source code accompanying the NeurIPS 2022 paper "Learning Partial Equivariances From Data" ☆10 · Nov 18, 2022 · Updated 3 years ago
- Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha… ☆38 · Aug 7, 2024 · Updated last year
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆14 · Apr 30, 2025 · Updated 10 months ago
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices ☆12 · Jan 12, 2021 · Updated 5 years ago
- torchvision-based transforms that provide access to parameterization ☆16 · Dec 4, 2025 · Updated 3 months ago
- Code for [Re] On the Reproducibility of Post-Hoc Concept Bottleneck Models. ☆13 · Nov 27, 2024 · Updated last year
- [NeurIPS 2024] "AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment" by Yonggan Fu, Zhongzhi Yu,… ☆19 · Dec 13, 2024 · Updated last year
- Continuous regular group convolutions for Pytorch ☆12 · Jun 9, 2024 · Updated last year
- 🧞‍♀️ Discover AI-generated programming project ideas ☆10 · Sep 2, 2022 · Updated 3 years ago
- gRPC server over a FAISS index ☆19 · Aug 19, 2021 · Updated 4 years ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens) ☆12 · Oct 11, 2024 · Updated last year
- A few models converted from caffe to CoreMLs format. ☆15 · Jun 6, 2017 · Updated 8 years ago
- Scratchpad/Chain-of-Thought Prompts ☆12 · Jun 6, 2022 · Updated 3 years ago
- Some benchmarks ☆12 · Sep 19, 2019 · Updated 6 years ago
- Mastodon server running for the Doubanius Tertius project ☆10 · Apr 4, 2022 · Updated 3 years ago
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models ☆47 · Feb 26, 2026 · Updated 3 weeks ago
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- some papers about Kalman Filter ☆14 · Sep 4, 2019 · Updated 6 years ago
- A Visual Studio debugger extension for viewing SkiaSharp bitmaps and images. ☆14 · Apr 19, 2024 · Updated last year
- A Python interface to OpenFst (fix FstDrawer interface issue for 1.6 version) ☆17 · Apr 2, 2018 · Updated 7 years ago
- [CVPR 2025] Official repository for GETA ☆39 · Nov 5, 2025 · Updated 4 months ago
- Fast and differentiable hidden Markov model in C++ ☆19 · Jan 20, 2023 · Updated 3 years ago
- Elucidated Dataset Condensation (NeurIPS 2024) ☆20 · Oct 5, 2024 · Updated last year
- An approximate implementation of the OpenAI paper - An Empirical Model of Large-Batch Training for MNIST ☆11 · Nov 19, 2022 · Updated 3 years ago
- A Statistical Arbitrage Strategy to trade Cryptocurrency Pairs ☆13 · Nov 6, 2020 · Updated 5 years ago
- Uses Processing and Perlin Noise to generate a procedural 2D rendering of different landscapes, which are then rendered into 3D ☆16 · Aug 14, 2018 · Updated 7 years ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · May 26, 2025 · Updated 9 months ago