nanoGPT-like codebase for LLM training
☆118 · Nov 7, 2025 · Updated 7 months ago
Alternatives and similar repositories for llm-baselines
Users that are interested in llm-baselines are comparing it to the libraries listed below.
- some mixture of experts architecture implementations ☆27 · Mar 22, 2024 · Updated 2 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- ☆65 · Apr 8, 2026 · Updated 2 months ago
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning" ☆17 · Aug 4, 2020 · Updated 5 years ago
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆151 · Oct 29, 2024 · Updated last year
- SGD with large step sizes learns sparse features [ICML 2023] ☆33 · Apr 24, 2023 · Updated 3 years ago
- Scalable and Stable Parallelization of Nonlinear RNNs ☆31 · Mar 6, 2026 · Updated 3 months ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- ☆27 · May 3, 2024 · Updated 2 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} ☆14 · Jun 18, 2023 · Updated 2 years ago
- A framework for implementing equivariant DL ☆10 · May 25, 2021 · Updated 5 years ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆21 · May 2, 2024 · Updated 2 years ago
- ☆16 · Apr 26, 2023 · Updated 3 years ago
- ☆15 · Apr 26, 2022 · Updated 4 years ago
- Repository for Deep Learning Theory papers ☆15 · Jan 24, 2024 · Updated 2 years ago
- trying to make WebGPU a bit easier to use ☆19 · Jan 9, 2024 · Updated 2 years ago
- ☆19 · Jun 9, 2021 · Updated 5 years ago
- Do input gradients highlight discriminative features? [NeurIPS 2021] (https://arxiv.org/abs/2102.12781) ☆12 · Jan 10, 2023 · Updated 3 years ago
- Official code for "In Search of Robust Measures of Generalization" (NeurIPS 2020) ☆29 · Dec 22, 2020 · Updated 5 years ago
- ☆50 · Jan 18, 2024 · Updated 2 years ago
- Implement FlashAttention v2 with minimal code to learn. ☆16 · Jun 12, 2024 · Updated last year
- ☆64 · Apr 9, 2024 · Updated 2 years ago
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆44 · Sep 11, 2023 · Updated 2 years ago
- research impl of Native Sparse Attention (2502.11089) ☆63 · Feb 19, 2025 · Updated last year
- ☆16 · Dec 9, 2023 · Updated 2 years ago
- ☆35 · Jun 13, 2023 · Updated 2 years ago
- Code related to "Beyond spectral gap: The role of the topology in decentralized learning". ☆14 · Jun 7, 2022 · Updated 4 years ago
- Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms ☆12 · Jan 11, 2022 · Updated 4 years ago
- ☆63 · Oct 3, 2024 · Updated last year
- ☆32 · Mar 18, 2026 · Updated 2 months ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆38 · Jun 14, 2022 · Updated 3 years ago
- ☆13 · Mar 22, 2023 · Updated 3 years ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆194 · Jan 19, 2026 · Updated 4 months ago
- LLM training in simple, raw C/CUDA ☆15 · Dec 5, 2024 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆73 · Sep 25, 2024 · Updated last year
- ☆52 · Jan 28, 2024 · Updated 2 years ago
- PyTorch implementation of paper: Small Pre-trained Language Models Can be Fine-tuned as Large Models via Over-Parameterization. ☆12 · May 18, 2023 · Updated 3 years ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Jan 25, 2025 · Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆33 · Jan 23, 2025 · Updated last year