NousResearch / StripedHyenaTrainer
☆61Updated last year
Alternatives and similar repositories for StripedHyenaTrainer
Users who are interested in StripedHyenaTrainer are comparing it to the libraries listed below.
- Simplex Random Feature attention, in PyTorch☆74Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning☆166Updated 6 months ago
- ☆81Updated last year
- Collection of autoregressive model implementations☆86Updated 3 months ago
- ☆27Updated last year
- Experiments for efforts to train a new and improved t5☆76Updated last year
- ☆22Updated last year
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…☆64Updated 9 months ago
- ☆47Updated last year
- Simple GRPO scripts and configurations.☆59Updated 6 months ago
- some common Huggingface transformers in maximal update parametrization (µP)☆82Updated 3 years ago
- gzip Predicts Data-dependent Scaling Laws☆35Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆101Updated 7 months ago
- Just a bunch of benchmark logs for different LLMs☆119Updated last year
- Public Inflection Benchmarks☆68Updated last year
- ☆69Updated 11 months ago
- A repository for research on medium sized language models.☆78Updated last year
- ☆134Updated 11 months ago
- ☆53Updated 8 months ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes.☆82Updated last year
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆103Updated 4 months ago
- ☆37Updated last year
- ☆49Updated last year
- A MAD laboratory to improve AI architecture designs 🧪☆123Updated 7 months ago
- Using multiple LLMs for ensemble Forecasting☆16Updated last year
- Small and Efficient Mathematical Reasoning LLMs☆71Updated last year
- ☆45Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…☆14Updated last year
- Data preparation code for Amber 7B LLM☆91Updated last year
- ☆87Updated last year