sap-ient-ai / FFF

FastFeedForward Networks

☆19

Alternatives and similar repositories for FFF

Users who are interested in FFF are comparing it to the libraries listed below.

epfml / DenseFormer
☆81 Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆103 Updated 3 months ago
dvruette / barrel-rec-pytorch
☆53 Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated 4 months ago
euclaise / supertrainer2000
☆50 Updated last year
joey00072 / microjax
JAX-like function transformation engine, but micro: microjax
☆33 Updated last year
okarthikb / state-space-models
☆28 Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127 Updated last year
Aleph-Alpha-Research / trigrams
☆57 Updated last month
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32 Updated last year
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45 Updated last year
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80 Updated last month
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98 Updated 11 months ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 Updated last year
EleutherAI / training-jacobian
☆22 Updated 10 months ago
google-deepmind / spectral_ssm
☆34 Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 6 months ago
cosmoquester / memoria
Memoria is a human-inspired memory architecture for neural networks.
☆76 Updated last year
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 10 months ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45 Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 11 months ago
Z-T-WANG / LaProp-Optimizer
Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient"
☆29 Updated 5 years ago
TRI-ML / linear_open_lm
A repository for research on medium-sized language models.
☆78 Updated last year
ethansmith2000 / TransformerExperiments
☆19 Updated 5 months ago
nisten / grokadamw
new optimizer
☆20 Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆18 Updated 3 months ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU training
☆51 Updated last year
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆80 Updated last year
glassroom / heinsen_sequence
Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)
☆95 Updated 10 months ago