open-thought / arc-agi-2
Building the cognitive-core to solve ARC-AGI-2
☆20 · Updated 2 months ago
Alternatives and similar repositories for arc-agi-2:
Users interested in arc-agi-2 are comparing it to the libraries listed below.
- ☆78 · Updated 9 months ago
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆111 · Updated 4 months ago
- ☆71 · Updated this week
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆81 · Updated last month
- A puzzle to learn about prompting ☆127 · Updated last year
- Machine Learning eXperiment Utilities ☆46 · Updated 10 months ago
- ☆37 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆105 · Updated 5 months ago
- ☆20 · Updated last year
- Jax/Flax rewrite of Karpathy's nanoGPT ☆57 · Updated 2 years ago
- A basic pure PyTorch implementation of flash attention ☆16 · Updated 5 months ago
- Jax-like function transformation engine, but micro: microjax ☆30 · Updated 6 months ago
- ☆102 · Updated this week
- Code associated with papers on superposition (in ML interpretability) ☆27 · Updated 2 years ago
- Latent Diffusion Language Models ☆68 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆82 · Updated last year
- JAX implementation of the Mistral 7b v0.2 model ☆35 · Updated 9 months ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 · Updated 10 months ago
- ☆72 · Updated 2 months ago
- A set of Python scripts that makes your experience on TPU better ☆51 · Updated 9 months ago
- gzip Predicts Data-dependent Scaling Laws ☆34 · Updated 10 months ago
- A simple library for scaling up JAX programs ☆134 · Updated 5 months ago
- LLM training in simple, raw C/CUDA ☆14 · Updated 4 months ago
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- Some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated last year
- Experiments for efforts to train a new and improved T5 ☆77 · Updated last year
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆123 · Updated last year
- seqax = sequence modeling + JAX ☆154 · Updated 2 weeks ago