languini-kitchen / languini-kitchen

The official Languini Kitchen repository

☆14

Alternatives and similar repositories for languini-kitchen

Users who are interested in languini-kitchen are comparing it to the repositories listed below.

ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59 Updated last year
shikaiqiu / compute-better-spent
☆53 Updated 9 months ago
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
stanislavfort / dissect-git-re-basin
Replicating and dissecting the git-re-basin project in one-click-replication Colabs
☆36 Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54 Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆87 Updated last year
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆16 Updated 5 months ago
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆124 Updated last year
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 Updated 2 years ago
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆66 Updated 9 months ago
edwardjhu / TP4
Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522)
☆62 Updated 4 years ago
berlino / seq_icl
☆53 Updated last year
johnryan465 / pscan
☆40 Updated last year
codekansas / rwkv
RWKV model implementation
☆38 Updated 2 years ago
arogozhnikov / adamw_bfloat16
AdamW optimizer for bfloat16 models in pytorch 🔥.
☆33 Updated last year
HazyResearch / structured-nets
Structured matrices for compressing neural networks
☆67 Updated last year
CLAIRE-Labo / StructuredFFN
The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers"
☆19 Updated 11 months ago
optimizedlearning / mechanic
☆36 Updated last year
RobertCsordas / moe_layer
sigma-MoE layer
☆20 Updated last year
sustcsonglin / mamba-triton
☆48 Updated last year
GallagherCommaJack / modulax
☆17 Updated 10 months ago
IDSIA / rtrl-elstm
Official repository for the paper "Exploring the Promise and Limits of Real-Time Recurrent Learning" (ICLR 2024)
☆11 Updated last month
lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆179 Updated last month
UW-Madison-Lee-Lab / Expressive_Power_of_LoRA
Code for "The Expressive Power of Low-Rank Adaptation".
☆20 Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80 Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated last month
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆89 Updated last year