apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
⭐119 · Updated this week
Alternatives and similar repositories for academic-pretraining:
Users interested in academic-pretraining are comparing it to the libraries listed below.
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐90 · Updated last month
- 🧱 Modula software package ⭐132 · Updated this week
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ⭐182 · Updated 7 months ago
- WIP ⭐92 · Updated 5 months ago
- ⭐69 · Updated 4 months ago
- ⭐75 · Updated 6 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ⭐102 · Updated last month
- σ-GPT: A New Approach to Autoregressive Models ⭐61 · Updated 5 months ago
- Supporting PyTorch FSDP for optimizers ⭐75 · Updated last month
- ⭐115 · Updated this week
- ⭐53 · Updated 11 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐66 · Updated 2 months ago
- Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training ⭐121 · Updated 9 months ago
- ⭐135 · Updated this week
- Official implementation of "BERTs are Generative In-Context Learners" ⭐23 · Updated 7 months ago
- Collection of autoregressive model implementations ⭐76 · Updated last week
- ⭐146 · Updated last month
- Understand and test language model architectures on synthetic tasks. ⭐175 · Updated this week
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ⭐36 · Updated 3 months ago
- PyTorch library for Active Fine-Tuning ⭐52 · Updated last week
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ⭐145 · Updated 2 weeks ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ⭐31 · Updated this week
- ⭐25 · Updated last year
- DeMo: Decoupled Momentum Optimization ⭐170 · Updated last month
- Automatic Evals for Instruction-Tuned Models ⭐100 · Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ⭐157 · Updated this week
- ⭐96 · Updated 3 weeks ago
- An introduction to LLM Sampling ⭐75 · Updated last month
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). ⭐136 · Updated 2 weeks ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ⭐115 · Updated 4 months ago