google-deepmind / alta

☆27

Alternatives and similar repositories for alta

Users who are interested in alta compare it to the libraries listed below.

RobertCsordas / moeut
☆88 Updated last year
google-deepmind / superhuman
☆59 Updated 3 weeks ago
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆109 Updated last month
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆122 Updated 10 months ago
Phylliida / MambaLens
Mamba support for transformer lens
☆18 Updated last year
katiekang1998 / reasoning_generalization
☆33 Updated 10 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆40 Updated last month
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85 Updated last year
imagination-research / lbt
[NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
☆55 Updated last year
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆151 Updated 9 months ago
convergence-ai / lm2
Official repo of paper LM2
☆46 Updated 9 months ago
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31 Updated 6 months ago
JacobPfau / fillerTokens
☆75 Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆41 Updated last year
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆84 Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆31 Updated last year
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆194 Updated last year
SalesforceAIResearch / LaTRO
☆124 Updated 9 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90 Updated 11 months ago
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆205 Updated 3 weeks ago
Lagooon / LeanSTaR
☆42 Updated last year
adamkarvonen / SAE_BoardGameEval
☆23 Updated 10 months ago
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆18 Updated 2 months ago
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆132 Updated 3 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆61 Updated 4 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 11 months ago
upiterbarg / lintseq
[ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)
☆19 Updated 9 months ago
ScalingIntelligence / large_language_monkeys
☆109 Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 Updated 5 months ago
complex-reasoning / RPG
Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆54 Updated last month