SakanaAI / robust-kbench
⭐ 40 · Updated 3 weeks ago
Alternatives and similar repositories for robust-kbench
Users who are interested in robust-kbench are comparing it to the libraries listed below.
- Small Batch Size Training for Language Models ⭐ 63 · Updated last week
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ⭐ 114 · Updated 2 months ago
- The evaluation framework for training-free sparse attention in LLMs ⭐ 101 · Updated 3 months ago
- ⭐ 34 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ⭐ 43 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ⭐ 130 · Updated 10 months ago
- ⭐ 58 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ⭐ 131 · Updated 2 weeks ago
- ⭐ 13 · Updated 7 months ago
- ⭐ 28 · Updated 2 weeks ago
- Kinetics: Rethinking Test-Time Scaling Laws ⭐ 80 · Updated 3 months ago
- ⭐ 52 · Updated last year
- ⭐ 41 · Updated 6 months ago
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ⭐ 71 · Updated 4 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ⭐ 23 · Updated last month
- Fast and memory-efficient exact attention ⭐ 70 · Updated 7 months ago
- ⭐ 33 · Updated 10 months ago
- Code for the paper "Function-Space Learning Rates" ⭐ 23 · Updated 4 months ago
- ⭐ 34 · Updated 7 months ago
- Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention. ⭐ 29 · Updated 11 months ago
- Don't just regulate gradients like in Muon, regulate the weights too ⭐ 27 · Updated 2 months ago
- ⭐ 32 · Updated last year
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ⭐ 22 · Updated 7 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ⭐ 114 · Updated last week
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐ 84 · Updated 11 months ago
- A simple, performant and scalable JAX-based world modeling codebase ⭐ 76 · Updated this week
- ⭐ 19 · Updated 6 months ago
- Deep Networks Grok All the Time and Here is Why ⭐ 37 · Updated last year
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ⭐ 40 · Updated 2 months ago
- H-Net Dynamic Hierarchical Architecture ⭐ 80 · Updated last month