joey00072 / oharaLinks

Collection of autoregressive model implementation

☆86

Alternatives and similar repositories for ohara

Users that are interested in ohara are comparing it to the libraries listed below

Sorting:

VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73Updated 7 months ago
euclaise / supertrainer2000
☆50Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103Updated 11 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108Updated 8 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆60Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168Updated 10 months ago
epfml / DenseFormer
☆82Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆18Updated 4 months ago
QuixiAI / grokadamw
☆136Updated last year
RobertCsordas / moeut
☆89Updated last year
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated 6 months ago
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59Updated 9 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59Updated last month
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated last month
joey00072 / Multi-Head-Latent-Attention-MLA-
working implimention of deepseek MLA
☆45Updated 10 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80Updated 2 months ago
arcee-ai / DAM
☆55Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆46Updated 11 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated last year
minyoungg / LTE
☆69Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66Updated last year
geronimi73 / phi2-finetune
☆86Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆101Updated last year
CERC-AAI / Robin
☆63Updated last year
Zyphra / Zyda_processing
☆39Updated last year
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year