Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81 · Aug 30, 2023 · Updated 2 years ago
Alternatives and similar repositories for NoTrainNoGain
Users that are interested in NoTrainNoGain are comparing it to the libraries listed below.
- ☆13 · Aug 23, 2024 · Updated last year
- Official PyTorch implementation of CD-MOE ☆12 · Mar 18, 2026 · Updated 3 months ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- ☆13 · Aug 2, 2023 · Updated 2 years ago
- Pruning is all you need (hopefully) ☆12 · Sep 7, 2022 · Updated 3 years ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 · Sep 30, 2024 · Updated last year
- ☆24 · Sep 25, 2024 · Updated last year
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- ☆10 · Sep 29, 2023 · Updated 2 years ago
- ☆50 · Jan 18, 2024 · Updated 2 years ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆39 · Jun 11, 2025 · Updated last year
- Highly concurrent and fast content processing for Mighty Inference Server ☆10 · Feb 6, 2023 · Updated 3 years ago
- 🚂 Fine-tune OpenAI models for text classification, question answering, and more ☆17 · May 1, 2023 · Updated 3 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 9 months ago
- Low-bit optimizers for PyTorch ☆139 · Oct 9, 2023 · Updated 2 years ago
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆1,019 · Aug 21, 2024 · Updated last year
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,167 · Jan 11, 2024 · Updated 2 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆68 · Apr 24, 2024 · Updated 2 years ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ☆45 · Mar 6, 2024 · Updated 2 years ago
- ACL 2023 ☆39 · Jun 6, 2023 · Updated 3 years ago
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention ☆71 · Apr 7, 2026 · Updated 2 months ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆41 · Jan 5, 2022 · Updated 4 years ago
- ☆25 · Oct 4, 2024 · Updated last year
- ☆43 · Oct 13, 2023 · Updated 2 years ago
- ☆33 · Apr 12, 2021 · Updated 5 years ago
- ☆32 · Mar 1, 2024 · Updated 2 years ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆16 · Aug 11, 2023 · Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- SelectiveBackprop accelerates training by dynamically prioritizing useful examples with high loss ☆32 · Mar 12, 2020 · Updated 6 years ago
- Implementation of the dilated self attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens" ☆13 · Jul 23, 2023 · Updated 2 years ago
- ☆316 · Jun 21, 2024 · Updated last year
- A library for unit scaling in PyTorch ☆134 · Jul 11, 2025 · Updated 11 months ago
- The backend behind the LLM-Perf Leaderboard ☆11 · May 5, 2024 · Updated 2 years ago
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,367 · Jun 13, 2024 · Updated 2 years ago
- ☆47 · Oct 11, 2023 · Updated 2 years ago
- A RAG that can scale 🧑🏻💻 ☆11 · May 28, 2024 · Updated 2 years ago
- ☆54 · May 20, 2024 · Updated 2 years ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆83 · Mar 14, 2024 · Updated 2 years ago