stanford-cs336 / assignment3-scaling
☆18 · Updated last month
Alternatives and similar repositories for assignment3-scaling
Users who are interested in assignment3-scaling are comparing it to the libraries listed below.
- ☆217 · Updated 7 months ago
- ☆360 · Updated 8 months ago
- Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch ☆73 · Updated last month
- Open-source framework for the research and development of foundation models. ☆439 · Updated this week
- Dion optimizer algorithm ☆338 · Updated last week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆274 · Updated last month
- Decentralized RL Training at Scale ☆592 · Updated this week
- Physics of Language Models, Part 4 ☆242 · Updated last month
- rl from zero pretrain, can it be done? yes. ☆265 · Updated 3 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆193 · Updated 3 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆566 · Updated 2 weeks ago
- ☆99 · Updated this week
- Simple & Scalable Pretraining for Neural Architecture Research ☆291 · Updated 3 weeks ago
- ☆45 · Updated last month
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch ☆651 · Updated 2 weeks ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆405 · Updated 6 months ago
- ☆104 · Updated 3 weeks ago
- ☆423 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆591 · Updated last week
- An extension of the nanoGPT repository for training small MOE models. ☆185 · Updated 6 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆524 · Updated 2 months ago
- Memory optimized Mixture of Experts ☆65 · Updated last month
- Training-Ready RL Environments + Evals ☆90 · Updated this week
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆328 · Updated 9 months ago
- Open source interpretability artefacts for R1. ☆158 · Updated 4 months ago
- ☆534 · Updated last year
- ☆199 · Updated 8 months ago
- ☆95 · Updated 11 months ago
- Decoder only transformer, built from scratch with PyTorch ☆31 · Updated last year
- Evaluation of LLMs on latest math competitions ☆162 · Updated last month