berlino / seq_iclLinks

☆53

Alternatives and similar repositories for seq_icl

Users that are interested in seq_icl are comparing it to the libraries listed below

Sorting:

epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆77Updated 9 months ago
sustcsonglin / mamba-triton
☆49Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221Updated 3 weeks ago
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆32Updated 8 months ago
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆70Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123Updated 7 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated last month
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆102Updated 2 months ago
RobertCsordas / moeut
☆83Updated 11 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆58Updated last month
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆59Updated 3 months ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97Updated last year
abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated last year
srush / mamba-primer
☆37Updated last year
sjelassi / transformers_ssm_copy
☆33Updated last year
shikaiqiu / compute-better-spent
☆53Updated 10 months ago
HazyResearch / prefix-linear-attention
☆56Updated last year
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆136Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆82Updated 2 weeks ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated last month
insuhan / hyper-attn
☆81Updated last year
adamkarvonen / SAE_BoardGameEval
☆23Updated 6 months ago
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated last week
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆54Updated 3 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year