knotgrass / How-Transformers-Work
A study guide to learn about Transformers
☆11 · Updated last year
Alternatives and similar repositories for How-Transformers-Work:
Users interested in How-Transformers-Work are comparing it to the libraries listed below.
- Tutorial for how to build BERT from scratch · ☆92 · Updated 11 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models… · ☆213 · Updated 5 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback · ☆94 · Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… · ☆64 · Updated last year
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation · ☆77 · Updated 2 weeks ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation · ☆106 · Updated 6 months ago
- ☆155 · Updated 3 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) · ☆148 · Updated 10 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free · ☆231 · Updated 5 months ago
- Scripts for fine-tuning Llama2 via SFT and DPO · ☆197 · Updated last year
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" · ☆459 · Updated last year
- A Multilingual Replicable Instruction-Following Model · ☆93 · Updated last year
- Well-documented, unit-tested, type-checked, and formatted implementation of a vanilla transformer, for educational purposes · ☆242 · Updated last year
- This repository contains the code for dataset curation and fine-tuning of the instruct variant of the Bilingual OpenHathi model. The resulting… · ☆23 · Updated last year
- ☆85 · Updated 7 months ago
- Notes and commented code for RLHF (PPO) · ☆86 · Updated last year
- LoRA and DoRA from Scratch Implementations · ☆202 · Updated last year
- Official PyTorch implementation of QA-LoRA · ☆131 · Updated last year
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks · ☆267 · Updated last month
- Notes about the LLaMA 2 model · ☆59 · Updated last year
- LLaMA 2 implemented from scratch in PyTorch · ☆322 · Updated last year
- Simple implementation of Speculative Sampling in NumPy for GPT-2 · ☆93 · Updated last year
- Distributed training (multi-node) of a Transformer model · ☆64 · Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget · ☆146 · Updated last year
- Notes on quantization in neural networks · ☆79 · Updated last year
- Efficient Attention for Long Sequence Processing · ☆93 · Updated last year
- ☆82 · Updated last year
- Comparing the Performance of LLMs: A Deep Dive into RoBERTa, Llama, and Mistral for Disaster Tweets Analysis with LoRA · ☆50 · Updated last year
- Prune transformer layers · ☆68 · Updated 10 months ago
- This repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post · ☆91 · Updated last year