berlino / seq_icl
☆51 · Updated 10 months ago
Alternatives and similar repositories for seq_icl:
Users interested in seq_icl are comparing it to the libraries listed below.
- Stick-breaking attention ☆49 · Updated 2 weeks ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 · Updated 3 months ago
- ☆47 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 · Updated 5 months ago
- ☆74 · Updated 7 months ago
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton. ☆65 · Updated 8 months ago
- Sparse Autoencoder Training Library ☆47 · Updated 5 months ago
- ☆52 · Updated 5 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- Universal Neurons in GPT2 Language Models ☆27 · Updated 10 months ago
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 5 months ago
- ☆24 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆185 · Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆103 · Updated 4 months ago
- ☆30 · Updated last year
- ☆87 · Updated 6 months ago
- A toolkit for scaling law research ⚖️ ☆49 · Updated 2 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆26 · Updated 11 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆73 · Updated 4 months ago
- nanoGPT-like codebase for LLM training ☆91 · Updated this week
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 · Updated last year
- ☆66 · Updated 4 months ago
- ☆37 · Updated 11 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆72 · Updated last year
- Experiments on the impact of depth in transformers and SSMs. ☆23 · Updated 4 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆24 · Updated last year
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆41 · Updated last week
- ☆53 · Updated last year
- ☆81 · Updated last year