apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
⭐119 · Updated this week
Alternatives and similar repositories for academic-pretraining:
Users interested in academic-pretraining are comparing it to the libraries listed below.
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐90 · Updated last month
- 🧱 Modula software package ⭐132 · Updated this week
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ⭐182 · Updated 7 months ago
- WIP ⭐92 · Updated 5 months ago
- ⭐69 · Updated 4 months ago
- ⭐75 · Updated 6 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ⭐102 · Updated last month
- σ-GPT: A New Approach to Autoregressive Models ⭐61 · Updated 5 months ago
- Supporting PyTorch FSDP for optimizers ⭐75 · Updated last month
- ⭐115 · Updated this week
- ⭐53 · Updated 11 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐66 · Updated 2 months ago
- Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training ⭐121 · Updated 9 months ago
- ⭐135 · Updated this week
- Official implementation of "BERTs are Generative In-Context Learners" ⭐23 · Updated 7 months ago
- Collection of autoregressive model implementations ⭐76 · Updated last week
- ⭐146 · Updated last month
- Understand and test language model architectures on synthetic tasks. ⭐175 · Updated this week
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ⭐36 · Updated 3 months ago
- PyTorch library for Active Fine-Tuning ⭐52 · Updated last week
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ⭐145 · Updated 2 weeks ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ⭐31 · Updated this week
- ⭐25 · Updated last year
- DeMo: Decoupled Momentum Optimization ⭐170 · Updated last month
- Automatic Evals for Instruction-Tuned Models ⭐100 · Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ⭐157 · Updated this week
- ⭐96 · Updated 3 weeks ago
- An introduction to LLM Sampling ⭐75 · Updated last month
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). ⭐136 · Updated 2 weeks ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ⭐115 · Updated 4 months ago