1a3orn / very-simple-moe
Extremely simple MoE implementation, mostly based on the Switch Transformer.
☆12 · Updated last year
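The core idea of a Switch Transformer-style MoE layer is top-1 routing: a small linear router picks exactly one expert MLP per token, and the expert's output is scaled by the router probability so the router receives gradient. A minimal sketch of that pattern (class and parameter names are illustrative, not taken from this repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchMoE(nn.Module):
    """Switch-style MoE layer: each token is routed to exactly one expert (top-1)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) — flatten batch/sequence dims before calling
        probs = F.softmax(self.router(x), dim=-1)   # (tokens, n_experts)
        gate, idx = probs.max(dim=-1)               # top-1 expert index per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale the expert output by the gate value so the
                # router is trained end-to-end through the gate
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SwitchMoE(d_model=16, d_ff=32, n_experts=4)
y = moe(torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 16])
```

A full Switch implementation also adds a load-balancing auxiliary loss and per-expert capacity limits; the sketch above omits those for clarity.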
Alternatives and similar repositories for very-simple-moe
Users interested in very-simple-moe are comparing it to the repositories listed below.
- Code for "Counterfactual Token Generation in Large Language Models", arXiv 2024. ☆28 · Updated 9 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 · Updated last year
- Google Research ☆45 · Updated 2 years ago
- Minimum Description Length probing for neural network representations ☆18 · Updated 7 months ago
- 📰 Computing the information content of trained neural networks ☆21 · Updated 3 years ago
- Understanding how features learned by neural networks evolve throughout training ☆36 · Updated 10 months ago
- ☆48 · Updated 11 months ago
- ☆82 · Updated last year
- Embedding Recycling for Language models ☆39 · Updated 2 years ago
- Latent Diffusion Language Models ☆69 · Updated last year
- Your favourite classical machine learning algos on the GPU/TPU ☆20 · Updated 7 months ago
- Codes and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" ☆33 · Updated 7 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch ☆25 · Updated 7 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated 2 years ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆60 · Updated 2 weeks ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- Simple repository for training small reasoning models ☆37 · Updated 6 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 · Updated 7 months ago
- Utilities for PyTorch distributed ☆25 · Updated 6 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆37 · Updated 6 months ago
- QLoRA for Masked Language Modeling ☆22 · Updated last year
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Updated 2 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆25 · Updated 2 months ago
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite ☆34 · Updated last year
- We study toy models of skill learning. ☆30 · Updated 7 months ago
- PyTorch implementation for MRL ☆19 · Updated last year
- Pokedex for LLMs ☆13 · Updated 4 months ago
- ☆51 · Updated last year
- Simple GRPO scripts and configurations. ☆59 · Updated 6 months ago