JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton.
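The idea behind fusing the final linear projection with the cross-entropy loss is to avoid materializing the full `(tokens, vocab)` logit matrix in memory at once. The repo does this inside a single Triton kernel; as a rough illustration of the memory-saving principle only, here is a plain PyTorch sketch that chunks the batch dimension (the function name, signature, and chunking strategy are illustrative assumptions, not the repo's actual API):

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    """Compute cross_entropy(hidden @ weight.T, targets) without holding
    the full (N, vocab) logit matrix in memory at once.

    Note: this is a chunked reference sketch of the concept, not the
    fused Triton kernel from the repository.
    """
    total_loss = hidden.new_zeros(())
    n = hidden.shape[0]
    for start in range(0, n, chunk_size):
        h = hidden[start:start + chunk_size]
        t = targets[start:start + chunk_size]
        # Only a (chunk, vocab) slice of logits exists at any time.
        logits = h @ weight.t()
        total_loss = total_loss + F.cross_entropy(logits, t, reduction="sum")
    return total_loss / n  # mean over all N tokens
```

The chunked result matches the naive `F.cross_entropy(hidden @ weight.t(), targets)` up to floating-point tolerance; a fused kernel additionally saves the memory traffic of writing even the chunked logits back to global memory.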
Related projects:
- Language models scale reliably with over-training and on downstream tasks
- Simple and efficient pytorch-native transformer training and inference (batched)
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
- AI Logging for Interpretability and Explainability🔬
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers" (NeurIPS 2023)
- This repo is based on https://github.com/jiaweizzhao/GaLore, paper coming soon
- Official implementation of Goldfish Loss: Mitigating Memorization in Generative LLMs
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
- Understand and test language model architectures on synthetic tasks.
- Randomized Positional Encodings Boost Length Generalization of Transformers
- A framework for few-shot evaluation of autoregressive language models.
- Triton implementation of FlashAttention2 that adds Custom Masks.
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…