1a3orn / very-simple-moe

Extremely simple MoE implementation, mostly based on Switch Transformer

☆12

Alternatives and similar repositories for very-simple-moe

Users who are interested in very-simple-moe are comparing it to the libraries listed below.

Networks-Learning / counterfactual-llms
Code for "Counterfactual Token Generation in Large Language Models", arXiv 2024.
☆28Updated 9 months ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
ekinakyurek / google-research
Google Research
☆45Updated 2 years ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆18Updated 7 months ago
jxbz / entropix
📰 Computing the information content of trained neural networks
☆21Updated 3 years ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆36Updated 10 months ago
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated 11 months ago
epfml / DenseFormer
☆82Updated last year
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆39Updated 2 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆69Updated last year
LiibanMo / scikit-jax
Your favourite classical machine learning algos on the GPU/TPU
☆20Updated 7 months ago
UKPLab / on-emergence
Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?"
☆33Updated 7 months ago
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch
☆25Updated 7 months ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆20Updated 2 years ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆60Updated 2 weeks ago
kumar-shridhar / Screws
SCREWS: A Modular Framework for Reasoning with Revisions
☆27Updated last year
tyler-romero / microR1
Simple repository for training small reasoning models
☆37Updated 6 months ago
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆27Updated 7 months ago
crowsonkb / torch-dist-utils
Utilities for PyTorch distributed
☆25Updated 6 months ago
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆37Updated 6 months ago
ChrisHayduk / QLoRA-for-MLM
QLoRA for Masked Language Modeling
☆22Updated last year
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34Updated 2 years ago
lilakk / BLEUBERI
Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
☆25Updated 2 months ago
ctlllll / understanding_llm_benchmarks
Understanding the correlation between different LLM benchmarks
☆29Updated last year
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆34Updated last year
KindXiaoming / physics_of_skill_learning
We study toy models of skill learning.
☆30Updated 7 months ago
krypticmouse / matryoshka-representation-learning
PyTorch implementation for MRL
☆19Updated last year
attentionmech / dex
Pokedex for LLMs
☆13Updated 4 months ago
srush / LLM-Talk
☆51Updated last year
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59Updated 6 months ago