SeanNaren / min-LLM
Minimal code to train a Large Language Model (LLM).
☆164 · Updated 2 years ago
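To give a flavor of what "minimal code to train an LLM" involves, below is a hedged sketch of a next-token-prediction training loop. It is an illustration only, not min-LLM's actual code: the tiny PyTorch decoder, the hyperparameters, and the random token data are all placeholder assumptions.

```python
# Illustrative sketch of a minimal causal-LM training loop (NOT min-LLM's code).
# The tiny model, hyperparameters, and random-token "data" are placeholders.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN, BATCH = 1000, 128, 64, 8

class TinyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        # Causal mask: each position may attend only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.lm_head(self.blocks(self.embed(tokens), mask=mask))

model = TinyDecoder()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    tokens = torch.randint(0, VOCAB, (BATCH, SEQ_LEN))  # stand-in for real text
    logits = model(tokens[:, :-1])                      # predict token t+1 from tokens <= t
    loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 20 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

Real training code adds tokenized corpora, gradient accumulation, mixed precision, checkpointing, and distributed execution (e.g. via DeepSpeed, listed below), but the core loop has this shape.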
Related projects
Alternatives and complementary repositories for min-LLM
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile ☆115 · Updated last year
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆206 · Updated 10 months ago
- Experiments with generating open-source language model assistants ☆97 · Updated last year
- Pipeline for pulling and processing online language model pretraining data from the web ☆174 · Updated last year
- Code repository for the c-BTM paper ☆105 · Updated last year
- Used for adaptive human-in-the-loop evaluation of language and embedding models ☆303 · Updated last year
- Inference script for Meta's LLaMA models using a Hugging Face wrapper ☆111 · Updated last year
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective ☆165 · Updated 6 months ago
- Multipack distributed sampler for fast padding-free training of LLMs (the idea is sketched after this list) ☆176 · Updated 3 months ago
- Techniques for running BLOOM inference in parallel ☆37 · Updated 2 years ago
- Scaling Data-Constrained Language Models ☆321 · Updated last month
- Instruct-tune Open LLaMA / RedPajama / StableLM models on consumer hardware using QLoRA ☆80 · Updated 10 months ago
- Experiments on speculative sampling with Llama models ☆117 · Updated last year
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆195 · Updated 4 months ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆111 · Updated last month
- Batched LoRAs ☆336 · Updated last year
- A framework for few-shot evaluation of autoregressive language models ☆101 · Updated last year
- Finetune Falcon, LLaMA, MPT, and RedPajama on consumer hardware using PEFT LoRA ☆101 · Updated 3 months ago
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions ☆177 · Updated 2 years ago
- Implementation of Reinforcement Learning from Human Feedback (RLHF) ☆169 · Updated last year
- Dataset collection and preprocessing framework for extreme multitask learning in NLP ☆149 · Updated 4 months ago
- Chain-of-Hindsight, a scalable RLHF method ☆218 · Updated last year
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆190 · Updated last year
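As noted next to the multipack sampler entry above, here is a hedged sketch of the core idea behind padding-free batching: pack variable-length sequences into fixed token budgets so each batch carries almost no padding. The greedy first-fit-decreasing heuristic and all names here are illustrative assumptions, not the linked repository's actual algorithm.

```python
# Hedged sketch of padding-free "packing" (NOT the linked repo's implementation):
# greedily assign variable-length sequences to bins with a fixed token budget.
def pack_sequences(lengths, max_tokens):
    """First-fit-decreasing packing; returns lists of sequence indices per bin."""
    bins = []  # each bin: [remaining_capacity, [sequence indices]]
    for i in sorted(range(len(lengths)), key=lambda j: lengths[j], reverse=True):
        for b in bins:
            if lengths[i] <= b[0]:          # sequence fits in this bin
                b[0] -= lengths[i]
                b[1].append(i)
                break
        else:                               # no bin fits: open a new one
            bins.append([max_tokens - lengths[i], [i]])
    return [b[1] for b in bins]

# Six sequences share two 2048-token bins instead of all padding to length 900.
print(pack_sequences([900, 120, 640, 300, 512, 64], max_tokens=2048))
# -> [[0, 2, 3, 1, 5], [4]]
```

A distributed variant would additionally balance bins across ranks so every GPU sees a similar token count per step, which is what makes such a sampler useful for multi-GPU training.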