nanoGPT-like codebase for LLM training
☆116 · Nov 7, 2025 · Updated 4 months ago
Alternatives and similar repositories for llm-baselines
Users who are interested in llm-baselines compare it to the repositories listed below
- some mixture of experts architecture implementations ☆26 · Mar 22, 2024 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- ☆19 · Jun 10, 2024 · Updated last year
- ☆56 · Feb 24, 2026 · Updated 3 weeks ago
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning" ☆17 · Aug 4, 2020 · Updated 5 years ago
- Robust Cross-lingual Embeddings from Parallel Sentences ☆22 · Jun 27, 2020 · Updated 5 years ago
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆149 · Oct 29, 2024 · Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs ☆29 · Mar 6, 2026 · Updated 2 weeks ago
- ☆20 · Nov 3, 2020 · Updated 5 years ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- ☆27 · May 3, 2024 · Updated last year
- CoLa - Decentralized Linear Learning: https://arxiv.org/abs/1808.04883 ☆20 · Nov 30, 2021 · Updated 4 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} ☆14 · Jun 18, 2023 · Updated 2 years ago
- A framework for implementing equivariant DL ☆10 · May 25, 2021 · Updated 4 years ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆21 · May 2, 2024 · Updated last year
- ☆16 · Apr 26, 2023 · Updated 2 years ago
- ☆15 · Apr 26, 2022 · Updated 3 years ago
- trying to make WebGPU a bit easier to use ☆19 · Jan 9, 2024 · Updated 2 years ago
- ☆18 · Jun 9, 2021 · Updated 4 years ago
- Do input gradients highlight discriminative features? [NeurIPS 2021] (https://arxiv.org/abs/2102.12781) ☆13 · Jan 10, 2023 · Updated 3 years ago
- Official code for "In Search of Robust Measures of Generalization" (NeurIPS 2020) ☆28 · Dec 22, 2020 · Updated 5 years ago
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models ☆29 · Apr 17, 2025 · Updated 11 months ago
- ☆48 · Jan 18, 2024 · Updated 2 years ago
- ☆64 · Apr 9, 2024 · Updated last year
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆44 · Sep 11, 2023 · Updated 2 years ago
- research impl of Native Sparse Attention (2502.11089) ☆63 · Feb 19, 2025 · Updated last year
- Code related to "Beyond spectral gap: The role of the topology in decentralized learning". ☆14 · Jun 7, 2022 · Updated 3 years ago
- Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms ☆12 · Jan 11, 2022 · Updated 4 years ago
- ☆63 · Oct 3, 2024 · Updated last year
- ☆70 · Nov 15, 2024 · Updated last year
- ☆30 · Jan 12, 2026 · Updated 2 months ago
- Introduction to PyTorch Workshop at the AMLD 2019 ☆31 · Jun 10, 2019 · Updated 6 years ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆190 · Jan 19, 2026 · Updated 2 months ago
- LLM training in simple, raw C/CUDA ☆15 · Dec 5, 2024 · Updated last year
- ☆51 · Jan 28, 2024 · Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆71 · Sep 25, 2024 · Updated last year
- PyTorch implementation of the paper "Small Pre-trained Language Models Can be Fine-tuned as Large Models via Over-Parameterization". ☆12 · May 18, 2023 · Updated 2 years ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Jan 25, 2025 · Updated last year
- Website for the book "The Elements of Differentiable Programming". ☆14 · Jul 2, 2025 · Updated 8 months ago