apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆135 · Updated last month
Alternatives and similar repositories for academic-pretraining:
Users interested in academic-pretraining are comparing it to the repositories listed below.
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆103 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆109 · Updated 3 months ago
- ☆128 · Updated 2 weeks ago
- ☆76 · Updated 9 months ago
- ☆107 · Updated 3 months ago
- PyTorch library for Active Fine-Tuning ☆63 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆190 · Updated last month
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆123 · Updated 11 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 · Updated 5 months ago
- ☆77 · Updated 7 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nanoGPT speedrun ☆46 · Updated last month
- ☆79 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆170 · Updated 3 months ago
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 4 months ago
- WIP ☆93 · Updated 8 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆189 · Updated 10 months ago
- An introduction to LLM Sampling ☆77 · Updated 4 months ago
- Normalized Transformer (nGPT) ☆167 · Updated 4 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆95 · Updated last month
- Getting crystal-like representations with harmonic loss ☆182 · Updated 2 weeks ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 · Updated 8 months ago
- ☆53 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆164 · Updated 3 months ago
- code for training & evaluating Contextual Document Embedding models ☆180 · Updated 3 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆73 · Updated 4 months ago
- ☆25 · Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆86 · Updated 3 weeks ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆60 · Updated last week
- ☆32 · Updated last week
- ☆25 · Updated last year