brantondemoss / GrokkingComplexity
Code for
☆24 Updated 3 months ago
Alternatives and similar repositories for GrokkingComplexity:
Users who are interested in GrokkingComplexity are comparing it to the repositories listed below.
- ☆31 Updated 10 months ago
- LLM training in simple, raw C/CUDA ☆14 Updated 3 months ago
- Triton Implementation of HyperAttention Algorithm ☆47 Updated last year
- Jax like function transformation engine but micro, microjax ☆30 Updated 5 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 Updated last year
- Evaluation of neuro-symbolic engines ☆35 Updated 7 months ago
- The repository contains code for Adaptive Data Optimization ☆20 Updated 3 months ago
- Official repo of paper LM2 ☆34 Updated last month
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated last week
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆35 Updated last year
- Simple GRPO scripts and configurations. ☆58 Updated last month
- A repository for research on medium sized language models. ☆76 Updated 10 months ago
- ☆74 Updated 7 months ago
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" ☆14 Updated last month
- Implementation of Spectral State Space Models ☆16 Updated last year
- Official Code Release for "Training a Generally Curious Agent" ☆19 Updated 2 weeks ago
- ☆91 Updated 2 months ago
- Your favourite classical machine learning algos on the GPU/TPU ☆20 Updated 2 months ago
- ☆52 Updated 5 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆83 Updated last year
- ☆31 Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 6 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 Updated 3 months ago
- Using FlexAttention to compute attention with different masking patterns ☆42 Updated 6 months ago
- ☆38 Updated 7 months ago
- GoldFinch and other hybrid transformer components ☆45 Updated 8 months ago
- Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec… ☆19 Updated last year
- ☆53 Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 Updated 6 months ago