shreyansh26 / An-Empirical-Model-of-Large-Batch-Training
An approximate implementation of the OpenAI paper "An Empirical Model of Large-Batch Training", applied to MNIST
☆10, updated 2 years ago
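The paper this repo implements estimates a "simple noise scale" B_simple = tr(Σ)/|G|², which predicts the critical batch size beyond which larger batches give diminishing returns. A minimal sketch of the unbiased estimator from the paper's appendix, which compares squared gradient norms measured at two batch sizes (function and variable names here are illustrative, not from this repo):

```python
def noise_scale_estimate(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate the simple gradient noise scale B_simple = tr(Sigma) / |G|^2.

    g_small_sq, g_big_sq: measured squared gradient norms |G_B|^2 computed
    with batch sizes b_small and b_big on the same data distribution.
    Uses the identity E[|G_B|^2] = |G|^2 + tr(Sigma) / B to solve for both
    unknowns from two measurements.
    """
    # Unbiased estimate of the true squared gradient norm |G|^2
    g_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    # Unbiased estimate of the per-example gradient covariance trace tr(Sigma)
    tr_sigma = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return tr_sigma / g_sq

# Synthetic check: with true |G|^2 = 1 and tr(Sigma) = 100,
# E[|G_10|^2] = 11 and E[|G_1000|^2] = 1.1, so B_simple should be 100.
print(noise_scale_estimate(11.0, 1.1, 10, 1000))  # → 100.0
```

In practice the two squared norms are averaged over many minibatches before plugging into the estimator, since single-batch measurements of |G_B|² are themselves noisy.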
Related projects
Alternatives and complementary repositories for An-Empirical-Model-of-Large-Batch-Training
- Efficient PScan implementation in PyTorch (☆15, updated 10 months ago)
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" (☆24, updated 7 months ago)
- Simple and efficient PyTorch-native transformer training and inference (batched) (☆61, updated 7 months ago)
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers (☆11, updated last month)
- Minimal but scalable implementation of large language models in JAX (☆26, updated 3 weeks ago)
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton (☆54, updated 3 months ago)
- Official implementation of "Bootstrapping Language Models via DPO Implicit Rewards" (☆39, updated 3 months ago)
- A toolkit for scaling law research ⚖ (☆43, updated 8 months ago)
- Parallel Associative Scan for Language Models (☆18, updated 10 months ago)
- Fast and memory-efficient exact attention (☆27, updated last week)
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" (☆36, updated last year)
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… (☆49, updated last year)
- Unofficial but efficient implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX (☆79, updated 9 months ago)
- Stick-breaking attention (☆34, updated 2 weeks ago)
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" (☆41, updated 10 months ago)
- Official implementation of the paper "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" (☆20, updated 4 months ago)
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection (☆27, updated 3 weeks ago)
- This repo is based on https://github.com/jiaweizzhao/GaLore (☆19, updated 2 months ago)