nanoGPT-like codebase for LLM training
☆116 · Nov 7, 2025 · Updated 3 months ago
Alternatives and similar repositories for llm-baselines
Users interested in llm-baselines are comparing it to the libraries listed below:
- some mixture of experts architecture implementations ☆26 · Mar 22, 2024 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆91 · Oct 30, 2024 · Updated last year
- ☆54 · Dec 17, 2025 · Updated 2 months ago
- ☆27 · May 3, 2024 · Updated last year
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning" ☆17 · Aug 4, 2020 · Updated 5 years ago
- ☆19 · Jun 10, 2024 · Updated last year
- ☆15 · Apr 26, 2022 · Updated 3 years ago
- ☆20 · Nov 3, 2020 · Updated 5 years ago
- Robust Cross-lingual Embeddings from Parallel Sentences ☆22 · Jun 27, 2020 · Updated 5 years ago
- Source code of "Hold me tight! Influence of discriminative features on deep network boundaries" ☆21 · Dec 10, 2021 · Updated 4 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆149 · Oct 29, 2024 · Updated last year
- Code for paper "Patch-Level Training for Large Language Models" ☆96 · Nov 10, 2025 · Updated 3 months ago
- SGD with large step sizes learns sparse features [ICML 2023] ☆33 · Apr 24, 2023 · Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- ☆64 · Apr 9, 2024 · Updated last year
- Display tensors directly from GPU ☆11 · Oct 12, 2025 · Updated 4 months ago
- Official codebase for our paper "Do Language Models Use Their Depth Efficiently?" ☆29 · Jun 25, 2025 · Updated 8 months ago
- Unofficial implementation of paper : Exploring the Space of Key-Value-Query Models with Intention ☆12 · May 24, 2023 · Updated 2 years ago
- RWKV6 in native pytorch and triton:) ☆11 · Aug 4, 2024 · Updated last year
- ☆16 · Apr 26, 2023 · Updated 2 years ago
- recipe for training fully-featured self supervised image jepa models ☆12 · Jun 4, 2025 · Updated 8 months ago
- A framework for implementing equivariant DL ☆10 · May 25, 2021 · Updated 4 years ago
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments ☆13 · Jul 8, 2024 · Updated last year
- Supporting code for the blog post on modular manifolds. ☆117 · Sep 26, 2025 · Updated 5 months ago
- ☆13 · Mar 22, 2023 · Updated 2 years ago
- research impl of Native Sparse Attention (2502.11089) ☆63 · Feb 19, 2025 · Updated last year
- ☆16 · Dec 9, 2023 · Updated 2 years ago
- ☆17 · Jul 25, 2025 · Updated 7 months ago
- Rate-Adaptive Quantization: A Multi-Rate Codebook Adaptation for Vector Quantization-based Generative Models ☆15 · Sep 10, 2025 · Updated 5 months ago
- ☆35 · Jun 13, 2023 · Updated 2 years ago
- Code related to ’Beyond spectral gap: The role of the topology in decentralized learning‘. ☆13 · Jun 7, 2022 · Updated 3 years ago
- Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms ☆12 · Jan 11, 2022 · Updated 4 years ago
- A powerful white-box adversarial attack that exploits knowledge about the geometry of neural networks to find minimal adversarial perturb… ☆12 · Aug 5, 2020 · Updated 5 years ago
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- Do input gradients highlight discriminative features? [NeurIPS 2021] (https://arxiv.org/abs/2102.12781) ☆13 · Jan 10, 2023 · Updated 3 years ago
- A School for All Seasons on Trustworthy Machine Learning ☆12 · Jun 30, 2021 · Updated 4 years ago
- ☆36 · Sep 23, 2022 · Updated 3 years ago
- MLBench Framework Core Python Library ☆18 · Mar 1, 2023 · Updated 2 years ago