berlino / seq_icl
★50 · Updated 6 months ago

Related projects
Alternatives and complementary repositories for seq_icl
- A MAD laboratory to improve AI architecture designs 🧪 ★95 · Updated 6 months ago
- Language models scale reliably with over-training and on downstream tasks ★94 · Updated 7 months ago
- NanoGPT-like codebase for LLM training ★75 · Updated this week
- Minimal but scalable implementation of large language models in JAX ★26 · Updated 2 weeks ago
- Understand and test language model architectures on synthetic tasks. ★162 · Updated 6 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ★61 · Updated 7 months ago
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton. ★54 · Updated 3 months ago
- Stick-breaking attention ★34 · Updated last week
- Code for reproducing our paper "Not All Language Model Features Are Linear" ★61 · Updated last week
- Universal Neurons in GPT2 Language Models ★27 · Updated 5 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ★79 · Updated last year
- A library for efficient patching and automatic circuit discovery. ★31 · Updated last month
- JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ★12 · Updated 6 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ★49 · Updated last year
- Blog post ★16 · Updated 9 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ★61 · Updated 6 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ★48 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★84 · Updated last week