Benchmarking Optimizers for LLM Pretraining
☆57Dec 30, 2025Updated 3 months ago
Alternatives and similar repositories for llm-optimizer-benchmark
Users that are interested in llm-optimizer-benchmark are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for "What really matters in matrix-whitening optimizers?"☆23Oct 31, 2025Updated 5 months ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated last month
- ☆31Nov 30, 2025Updated 4 months ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones☆65Jan 26, 2026Updated 2 months ago
- ☆26Feb 20, 2026Updated last month
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Pytorch routines for (Ker)nel (Mac)hines☆11Oct 10, 2025Updated 5 months ago
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox.☆25Sep 29, 2024Updated last year
- ☆52Mar 30, 2026Updated last week
- Flax (JAX) implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation☆12May 24, 2021Updated 4 years ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.☆71Mar 30, 2026Updated last week
- LLM4OR homepage project.☆24Aug 29, 2025Updated 7 months ago
- [ACL 2025] Official implementation of the "CoT-ICL Lab" framework☆11Oct 10, 2025Updated 6 months ago
- Repository for ACL 2020 paper: "Multi-Sentence Argument Linking"☆27Feb 11, 2023Updated 3 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- NordVPN Threat Protection Pro™ • AdTake your cybersecurity to the next level. Block phishing, malware, trackers, and ads. Lightweight app that works with all browsers.
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆27Oct 14, 2025Updated 5 months ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- ☆32Apr 19, 2025Updated 11 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆17Feb 9, 2026Updated 2 months ago
- Simple MoE - Day 17 of 365 Days of Repos☆18Jan 17, 2025Updated last year
- Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"☆27Jun 4, 2024Updated last year
- ☆11May 24, 2024Updated last year
- A project designed to build and render a full Minecraft crafting tree.☆10Aug 10, 2021Updated 4 years ago
- Exploring the minimal architecture required for coherent English language generation.☆12Mar 5, 2025Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Library that provides metrics to assess representation quality☆27Feb 5, 2025Updated last year
- ☆16Feb 6, 2024Updated 2 years ago
- hwpxlib 패키지 python에서 쉽게 사용 할수 있게 만든 github repo 입니다.☆35Mar 29, 2025Updated last year
- Offical implementation of our paper "Exploring the Potential of Diffusion Large Language Models in Code Generation".☆20Oct 29, 2025Updated 5 months ago
- 💻 Terminal-Agent with Human-in-the-Loop Learning☆39Jan 16, 2026Updated 2 months ago
- ☆11Jun 20, 2023Updated 2 years ago
- A Jupyter-style custom node for executing Python code and plotting within ComfyUI workflows.☆37Mar 18, 2026Updated 3 weeks ago
- ☆15Nov 18, 2025Updated 4 months ago
- ☆84Aug 31, 2023Updated 2 years ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Reproducing GPT on the TinyStories dataset☆19Jan 18, 2024Updated 2 years ago
- Tutorials for MATH 4432 Statistical Machine Learning, HKUST, Fall 2022☆11Sep 17, 2024Updated last year
- The implementation of paper: Empowering Visible-Infrared Person Re-Identification with Large Foundation Models, NeurIPS 2024☆28Dec 30, 2024Updated last year
- This is a repository for RM2021 Software tutorial☆11Nov 4, 2020Updated 5 years ago
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research☆24Sep 23, 2025Updated 6 months ago
- [ICLR 2025] This repository contains the code to reproduce the results from our paper From Sparse Dependence to Sparse Attention: Unveili…☆12Mar 7, 2025Updated last year
- This repository contains the implementation of the paper -- KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks☆15Sep 15, 2022Updated 3 years ago