albietz / transformer-birth

☆17

Alternatives and similar repositories for transformer-birth:

Users who are interested in transformer-birth are comparing it to the repositories listed below.

r-three / mats
☆28 · Updated 8 months ago
berlino / seq_icl
☆51 · Updated 10 months ago
locuslab / edge-of-stability
☆65 · Updated 3 months ago
aw31 / empirical-ntks
Efficient empirical NTKs in PyTorch
☆18 · Updated 2 years ago
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆63 · Updated 6 months ago
MadryLab / DsDm
☆46 · Updated last year
gregorbachmann / Next-Token-Failures
☆81 · Updated last year
tml-epfl / sharpness-vs-generalization
A modern look at the relationship between sharpness and generalization [ICML 2023]
☆43 · Updated last year
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆74 · Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆79 · Updated last year
rtaori / data_feedback
Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases"
☆15 · Updated 2 years ago
UKPLab / iclr2024-model-merging
This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.
☆27 · Updated 10 months ago
EleutherAI / w2s
☆21 · Updated 6 months ago
mansheej / icl-task-diversity
Code for the paper "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression"
☆20 · Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆26 · Updated 11 months ago
mmatena / model_merging
☆65 · Updated 3 years ago
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆19 · Updated last week
jiahai-feng / binding-iclr
☆13 · Updated last year
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆25 · Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆59 · Updated last month
formll / resolving-scaling-law-discrepancies
☆18 · Updated 8 months ago
KihoPark / linear_rep_geometry
☆90 · Updated last month
shauli-ravfogel / rlace-icml
☆35 · Updated 2 years ago
sjelassi / transformers_ssm_copy
☆30 · Updated last year
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆33 · Updated 3 years ago
Liuhong99 / implicitbiasmlmcode
☆11 · Updated 2 years ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆49 · Updated 2 weeks ago
sustcsonglin / gated_linear_attention_layer
☆33 · Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆65 · Updated 7 months ago
RobertCsordas / modules
The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We…
☆46 · Updated last year