☆48Jul 21, 2025Updated 8 months ago
Alternatives and similar repositories for assignment3-scaling
Users that are interested in assignment3-scaling are comparing it to the libraries listed below.
- Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch☆194Jul 25, 2025Updated 8 months ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated last month
- ☆31Nov 30, 2025Updated 3 months ago
- ☆25Feb 20, 2026Updated last month
- The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size☆19May 19, 2019Updated 6 years ago
- Code for "What really matters in matrix-whitening optimizers?"☆23Oct 31, 2025Updated 4 months ago
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch☆1,364Aug 29, 2025Updated 6 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)☆12Oct 31, 2024Updated last year
- Flax (JAX) implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation☆12May 24, 2021Updated 4 years ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.☆69Updated this week
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- ☆58Sep 17, 2025Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆13Nov 27, 2023Updated 2 years ago
- [ACL 2025] Official implementation of the "CoT-ICL Lab" framework☆11Oct 10, 2025Updated 5 months ago
- Create a device that accesses DeepSeek through a serial port☆20May 30, 2025Updated 9 months ago
- ☆14Dec 20, 2021Updated 4 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- Laboratory for Fluorescence Dynamics (LFD) file formats.☆11Mar 19, 2026Updated last week
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆27Oct 14, 2025Updated 5 months ago
- LLM training in simple, raw C/CUDA☆18May 6, 2024Updated last year
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones☆64Jan 26, 2026Updated 2 months ago
- Simple MoE - Day 17 of 365 Days of Repos☆18Jan 17, 2025Updated last year
- hakken is a coding agent which needs hell lot of context☆31Dec 4, 2025Updated 3 months ago
- Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"☆27Jun 4, 2024Updated last year
- A project designed to build and render a full Minecraft crafting tree.☆10Aug 10, 2021Updated 4 years ago
- Library that provides metrics to assess representation quality☆26Feb 5, 2025Updated last year
- Exploring the minimal architecture required for coherent English language generation.☆12Mar 5, 2025Updated last year
- ☆11Jun 20, 2023Updated 2 years ago
- A Jupyter-style custom node for executing Python code and plotting within ComfyUI workflows.☆36Mar 18, 2026Updated last week
- A Template Repository for a Swift Package-based Stanford Byers Center for Biodesign Digital Health Project☆18Updated this week
- ☆84Aug 31, 2023Updated 2 years ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆86Jan 12, 2025Updated last year
- Reproducing GPT on the TinyStories dataset☆19Jan 18, 2024Updated 2 years ago
- NeurIPS22 "RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection" and T-PAMI Extension☆20Feb 21, 2025Updated last year
- ☆17Feb 4, 2025Updated last year
- Tutorials for MATH 4432 Statistical Machine Learning, HKUST, Fall 2022☆11Sep 17, 2024Updated last year
- Modern utility library and typescript typings for building JSON Schema documents☆14Nov 28, 2025Updated 3 months ago
- [ICLR 2025] This repository contains the code to reproduce the results from our paper From Sparse Dependence to Sparse Attention: Unveili…☆12Mar 7, 2025Updated last year
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆19May 29, 2023Updated 2 years ago