nanoGPT-like codebase for LLM training
☆117 · Nov 7, 2025 · Updated 6 months ago
Alternatives and similar repositories for llm-baselines
Users who are interested in llm-baselines are comparing it to the libraries listed below.
- some mixture of experts architecture implementations ☆27 · Mar 22, 2024 · Updated 2 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- ☆19 · Jun 10, 2024 · Updated last year
- Codebase for ICML submission "DOGE: Domain Reweighting with Generalization Estimation" ☆21 · Feb 29, 2024 · Updated 2 years ago
- ☆63 · Apr 8, 2026 · Updated last month
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning" ☆17 · Aug 4, 2020 · Updated 5 years ago
- Robust Cross-lingual Embeddings from Parallel Sentences ☆22 · Jun 27, 2020 · Updated 5 years ago
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆150 · Oct 29, 2024 · Updated last year
- SGD with large step sizes learns sparse features [ICML 2023] ☆33 · Apr 24, 2023 · Updated 3 years ago
- Scalable and Stable Parallelization of Nonlinear RNNs ☆30 · Mar 6, 2026 · Updated 2 months ago
- ☆20 · Nov 3, 2020 · Updated 5 years ago
- Source code of "Hold me tight! Influence of discriminative features on deep network boundaries" ☆21 · Dec 10, 2021 · Updated 4 years ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- ☆27 · May 3, 2024 · Updated 2 years ago
- CoLa - Decentralized Linear Learning: https://arxiv.org/abs/1808.04883 ☆20 · Nov 30, 2021 · Updated 4 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} ☆14 · Jun 18, 2023 · Updated 2 years ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆21 · May 2, 2024 · Updated 2 years ago
- ☆16 · Apr 26, 2023 · Updated 3 years ago
- trying to make WebGPU a bit easier to use ☆19 · Jan 9, 2024 · Updated 2 years ago
- ☆18 · Jun 9, 2021 · Updated 4 years ago
- Do input gradients highlight discriminative features? [NeurIPS 2021] (https://arxiv.org/abs/2102.12781) ☆12 · Jan 10, 2023 · Updated 3 years ago
- Official code for "In Search of Robust Measures of Generalization" (NeurIPS 2020) ☆28 · Dec 22, 2020 · Updated 5 years ago
- ☆50 · Jan 18, 2024 · Updated 2 years ago
- ☆64 · Apr 9, 2024 · Updated 2 years ago
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆44 · Sep 11, 2023 · Updated 2 years ago
- ☆16 · Dec 9, 2023 · Updated 2 years ago
- Code related to "Beyond spectral gap: The role of the topology in decentralized learning". ☆14 · Jun 7, 2022 · Updated 3 years ago
- Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms ☆12 · Jan 11, 2022 · Updated 4 years ago
- ☆63 · Oct 3, 2024 · Updated last year
- ☆71 · Nov 15, 2024 · Updated last year
- Introduction to PyTorch Workshop at the AMLD 2019 ☆31 · Jun 10, 2019 · Updated 6 years ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆38 · Jun 14, 2022 · Updated 3 years ago
- ☆13 · Mar 22, 2023 · Updated 3 years ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆191 · Jan 19, 2026 · Updated 4 months ago
- LLM training in simple, raw C/CUDA ☆15 · Dec 5, 2024 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆72 · Sep 25, 2024 · Updated last year
- PyTorch implementation of paper: Small Pre-trained Language Models Can be Fine-tuned as Large Models via Over-Parameterization. ☆12 · May 18, 2023 · Updated 3 years ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Jan 25, 2025 · Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆32 · Jan 23, 2025 · Updated last year