Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
☆355 · updated Jul 29, 2024
Alternatives and similar repositories for hlb-gpt
Users interested in hlb-gpt are comparing it to the libraries listed below.
- Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!) (☆1,299 · updated Dec 18, 2024)
- ☆145 · updated Mar 31, 2023
- The simplest, fastest repository for training/finetuning medium-sized GPTs. (☆187 · updated Jan 19, 2026)
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training (☆132 · updated Apr 17, 2024)
- Collection of autoregressive model implementations (☆85 · updated this week)
- ☆292 · updated Jul 15, 2024
- Demonstration that finetuning a RoPE model on sequences longer than those seen in pre-training extends the model's context limit (☆63 · updated Jun 21, 2023)
- An implementation of Self-Extend, expanding the context window via grouped attention (☆119 · updated Jan 7, 2024)
- Low-rank adapter extraction for fine-tuned transformer models (☆180 · updated May 2, 2024)
- Schedule-Free Optimization in PyTorch (☆2,256 · updated May 21, 2025)
- Full finetuning of large language models without large memory requirements (☆94 · updated Sep 22, 2025)
- Code for the paper "Function-Space Learning Rates" (☆25 · updated Jun 3, 2025)
- Fast & Simple repository for pre-training and fine-tuning T5-style models (☆1,017 · updated Aug 21, 2024)
- [WIP] Transformer to embed Danbooru labelsets (☆13 · updated Mar 31, 2024)
- WIP (☆94 · updated Aug 13, 2024)
- ☆316 · updated Jun 21, 2024
- ☆124 · updated May 28, 2024
- seqax = sequence modeling + JAX (☆171 · updated Jul 23, 2025)
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax (☆693 · updated Jan 26, 2026)
- Just a bunch of benchmark logs for different LLMs (☆119 · updated Jul 28, 2024)
- Entropy Based Sampling and Parallel CoT Decoding (☆3,434 · updated Nov 13, 2024)
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. (☆595 · updated Aug 12, 2025)
- NanoGPT (124M) in 2 minutes (☆4,679 · updated this week)
- Supporting PyTorch FSDP for optimizers (☆84 · updated Dec 8, 2024)
- Implementation of https://arxiv.org/pdf/2312.09299 (☆21 · updated Jul 3, 2024)
- Cramming the training of a (BERT-type) language model into limited compute. (☆1,363 · updated Jun 13, 2024)
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds (☆359 · updated Nov 15, 2025)
- Implementation of Diffusion Transformer (DiT) in JAX (☆305 · updated Jun 11, 2024)
- Memory-efficient transformer. Work in progress. (☆19 · updated Sep 17, 2022)
- HomebrewNLP in JAX flavour for maintainable TPU training (☆51 · updated Jan 20, 2024)
- PyTorch interface for TrueGrad Optimizers (☆43 · updated Aug 8, 2023)
- ☆53 · updated May 20, 2024
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. (☆32 · updated Jun 5, 2025)
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆280 · updated Nov 24, 2025)
- A MAD laboratory to improve AI architecture designs 🧪 (☆138 · updated Dec 17, 2024)
- ☆50 · updated Mar 14, 2024
- Tiny re-implementation of MDM in the style of LLaDA and the nanoGPT speedrun (☆57 · updated Mar 10, 2025)
- 🤖 A PyTorch library of curated Transformer models and their composable components (☆894 · updated Apr 17, 2024)
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free (☆233 · updated Oct 31, 2024)