google-deepmind / transformer_ngrams
☆33 · Updated last year
Alternatives and similar repositories for transformer_ngrams
Users interested in transformer_ngrams are comparing it to the libraries listed below.
- ☆152 · Updated 4 months ago
- ☆186 · Updated last week
- Open source interpretability artefacts for R1. ☆169 · Updated 9 months ago
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r… ☆313 · Updated last month
- ☆116 · Updated last week
- Curated collection of community environments ☆208 · Updated this week
- Our solution for the ARC Challenge 2024 ☆187 · Updated 7 months ago
- Evaluation of LLMs on the latest math competitions ☆213 · Updated last month
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆30 · Updated 9 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆127 · Updated 3 months ago
- 🧱 Modula software package ☆322 · Updated 5 months ago
- ☆105 · Updated 5 months ago
- ☆214 · Updated 3 weeks ago
- Attribution-based Parameter Decomposition ☆33 · Updated 7 months ago
- 📄 Small Batch Size Training for Language Models ☆80 · Updated 3 months ago
- ☆167 · Updated 5 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆288 · Updated 2 months ago
- Fluid Language Model Benchmarking ☆25 · Updated 4 months ago
- ☆29 · Updated last year
- A package for defining deep learning models using categorical algebraic expressions. ☆61 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆186 · Updated last week
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆107 · Updated 2 months ago
- KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems ☆21 · Updated 6 months ago
- RL from zero pretrain: can it be done? Yes. ☆286 · Updated 4 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆133 · Updated 3 years ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆150 · Updated 3 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆307 · Updated last month
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆198 · Updated last year
- Implementation of SOAR ☆48 · Updated 4 months ago
- ☆483 · Updated 6 months ago