abietti / transformer-birthLinks

☆19

Alternatives and similar repositories for transformer-birth

Users that are interested in transformer-birth are comparing it to the libraries listed below

Sorting:

sjelassi / transformers_ssm_copy
☆35Updated last year
berlino / seq_icl
☆53Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80Updated 2 years ago
locuslab / edge-of-stability
☆72Updated 11 months ago
Ping-C / optimizer
This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…
☆40Updated 2 years ago
sustcsonglin / flash-linear-rnn
Implementations of various linear RNN layers using pytorch and triton
☆54Updated 2 years ago
shauli-ravfogel / rlace-icml
☆36Updated 3 years ago
DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode…
☆19Updated last year
machine-discovery / deer
Parallelizing non-linear sequential models over the sequence length
☆55Updated 4 months ago
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆78Updated 2 years ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 4 months ago
p-lambda / incontext-learning
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…
☆106Updated 2 years ago
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆37Updated 3 weeks ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆34Updated 5 months ago
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59Updated 2 years ago
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆68Updated last year
igul222 / plaid
☆108Updated 2 years ago
KindXiaoming / Omnigrok
Omnigrok: Grokking Beyond Algorithmic Data
☆62Updated 2 years ago
KihoPark / linear_rep_geometry
☆110Updated 9 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
gregorbachmann / Next-Token-Failures
☆104Updated last year
abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated 2 years ago
srush / mamba-scans
Blog post
☆17Updated last year
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆64Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85Updated last year
RakitinDen / pytorch-recursive-gumbel-max-trick
Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021
☆13Updated 3 years ago
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆18Updated 6 months ago
TRAIS-Lab / dattri
`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
☆94Updated last week
LIONS-EPFL / scion
☆47Updated last month
sustcsonglin / gated_linear_attention_layer
☆31Updated last year