SakanaAI / Sudoku-Bench
An AI benchmark for creative, human-like problem solving using Sudoku variants
☆20 · Updated last week
Alternatives and similar repositories for Sudoku-Bench:
Users who are interested in Sudoku-Bench are comparing it to the repositories listed below.
- CycleQD is a framework for parameter space model merging. ☆35 · Updated last month
- Checkpointable dataset utilities for foundation model training ☆32 · Updated last year
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models ☆61 · Updated 9 months ago
- ☆53 · Updated last year
- ☆52 · Updated 5 months ago
- Triton Implementation of HyperAttention Algorithm ☆47 · Updated last year
- ☆30 · Updated 4 months ago
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆38 · Updated 2 months ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆29 · Updated last year
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆80 · Updated 3 years ago
- RWKV model implementation ☆37 · Updated last year
- Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation ☆55 · Updated this week
- Mamba training library developed by kotoba technologies ☆68 · Updated last year
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models ☆21 · Updated 2 months ago
- Easily run PyTorch on multiple GPUs & machines ☆45 · Updated last week
- σ-GPT: A New Approach to Autoregressive Models ☆62 · Updated 7 months ago
- Deep Networks Grok All the Time and Here is Why ☆33 · Updated 10 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 · Updated 6 months ago
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning ☆33 · Updated 2 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- ☆31 · Updated 11 months ago
- My explorations into editing the knowledge and memories of an attention network ☆34 · Updated 2 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 · Updated 5 months ago
- Automatically take good care of your preemptible TPUs ☆36 · Updated last year
- Jax like function transformation engine but micro, microjax ☆30 · Updated 5 months ago
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 4 months ago
- ☆79 · Updated 11 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- ☆33 · Updated 6 months ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆99 · Updated 2 years ago