knotgrass / How-Transformers-Work
A study guide to learn about Transformers
☆10 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for How-Transformers-Work
- Experiments with inference on Llama ☆105 · Updated 5 months ago
- Prune transformer layers ☆63 · Updated 5 months ago
- Tutorial on how to build BERT from scratch ☆83 · Updated 5 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆92 · Updated last month
- LLM Workshop by Sourab Mangrulkar ☆340 · Updated 4 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) ☆200 · Updated 5 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆251 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆209 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆116 · Updated 7 months ago
- EvolKit is a framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models… ☆169 · Updated last week
- Code for dataset curation and fine-tuning of the instruct variant of the Bilingual OpenHathi model. The resulting… ☆23 · Updated 10 months ago
- Easy and Efficient Quantization for Transformers ☆178 · Updated 3 months ago
- DSIR large-scale data selection framework for language model training ☆227 · Updated 7 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆237 · Updated 3 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆86 · Updated 3 months ago
- LLaMA 2 implemented from scratch in PyTorch ☆250 · Updated last year
- The official evaluation suite and dynamic data release for MixEval ☆222 · Updated last week
- Ring-attention experiments ☆95 · Updated 3 weeks ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆175 · Updated 3 months ago
- A minimal example of aligning language models with RLHF, similar to ChatGPT ☆213 · Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆168 · Updated last month
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (the accept/reject rule is sketched below this list) ☆89 · Updated last year
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆435 · Updated 7 months ago
- Lightweight demos for fine-tuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 · Updated 3 weeks ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆150 · Updated 2 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆251 · Updated this week
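
Two of the entries above (the speculative-decoding explorations and the NumPy Speculative Sampling implementation) revolve around the same accept/reject rule. The sketch below is a minimal illustration of that rule only: the toy distributions, vocabulary size, and names are assumptions made for this example, not code taken from any listed repository.

```python
# Minimal sketch of the speculative-sampling accept/reject rule, assuming toy
# next-token distributions in place of real GPT-2 draft/target models.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8    # toy vocabulary size (assumption for the example)
GAMMA = 4    # number of tokens the draft model proposes per step


def toy_dist() -> np.ndarray:
    """Stand-in for a model's next-token probability distribution."""
    logits = rng.normal(size=VOCAB)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def speculative_step(draft_dists, target_dists):
    """Run one speculative step: accept or reject each drafted token."""
    emitted = []
    for q, p in zip(draft_dists, target_dists):
        x = rng.choice(VOCAB, p=q)                 # token sampled from the draft model
        if rng.random() < min(1.0, p[x] / q[x]):   # accept with probability min(1, p/q)
            emitted.append(x)
        else:
            residual = np.maximum(p - q, 0.0)      # on rejection, resample from (p - q)+
            residual /= residual.sum()
            emitted.append(rng.choice(VOCAB, p=residual))
            break                                  # drafting stops at the first rejection
    return emitted


# In a real system, draft_dists and target_dists would come from the small and
# large models' forward passes over the same drafted prefix.
draft_dists = [toy_dist() for _ in range(GAMMA)]
target_dists = [toy_dist() for _ in range(GAMMA)]
print(speculative_step(draft_dists, target_dists))
```

The full algorithm also samples one extra token from the target model when every drafted token is accepted; that bonus step is omitted here to keep the sketch short.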