coaxsoft / pytorch_bert
Tutorial for how to build BERT from scratch
☆81 · Updated 3 months ago
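The repo above builds BERT from scratch; the core operation inside every BERT encoder layer is scaled dot-product self-attention. The sketch below shows that one operation in plain Python. It is illustrative only, not code from the repo: the toy token vectors and dimensions are made up, and a real implementation would use batched tensor ops.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: attention(Q, K, V) = softmax(QK^T / sqrt(d)) V.

    Q, K, V are lists of d-dimensional vectors (one per token); each output
    vector is a weighted mix of the value vectors.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
    return out

# Two toy 2-d token embeddings attending over themselves (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0]]
mixed = self_attention(tokens, tokens, tokens)
```

Each output row is a convex combination of the inputs (its weights sum to 1), and each token attends most strongly to itself here because its query aligns best with its own key.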
Related projects:
- LoRA and DoRA from Scratch Implementations ☆179 · Updated 6 months ago
- LLaMA 2 implemented from scratch in PyTorch ☆216 · Updated 11 months ago
- Early solution for the Google AI4Code competition ☆75 · Updated 2 years ago
- An open collection of implementation tips, tricks, and resources for training large language models ☆455 · Updated last year
- LoRA: Low-Rank Adaptation of Large Language Models, implemented in PyTorch ☆72 · Updated last year
- Efficient Attention for Long Sequence Processing ☆84 · Updated 9 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆248 · Updated 10 months ago
- Code from the blog post: https://fkodom.substack.com/p/transformers-from-scratch-in-pytorch ☆90 · Updated last year
- Well-documented, unit-tested, type-checked, and formatted implementation of a vanilla transformer, for educational purposes ☆211 · Updated 5 months ago
- A Simplified PyTorch Implementation of Vision Transformer (ViT) ☆123 · Updated 3 months ago
- Prune transformer layers ☆60 · Updated 3 months ago
- Code used for the "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆84 · Updated last year
- Recurrent Memory Transformer ☆148 · Updated last year
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆89 · Updated last week
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆161 · Updated last week
- Exploring fine-tuning public checkpoints on filtered 8K sequences from the Pile ☆115 · Updated last year
- 🧠 A study guide to learn about Transformers ☆10 · Updated 8 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆278 · Updated 3 months ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch ☆222 · Updated 2 weeks ago
- Some notebooks for NLP ☆186 · Updated 10 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆89 · Updated last year
- Fine-tune a T5 transformer model using PyTorch & Transformers 🤗 ☆192 · Updated 3 years ago
- Sequence modeling with Mega ☆296 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆36 · Updated 5 months ago
- Defines Transformer, T5, and RoBERTa encoder-decoder models for product-name generation ☆47 · Updated 2 years ago
- An implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆47 · Updated 11 months ago
- A simplified version of Meta's Llama 3 model, to be used for learning ☆26 · Updated 3 months ago
- A simple and working implementation of ELECTRA, the fastest way to pretrain language models from scratch, in PyTorch ☆222 · Updated last year
- LLM Workshop by Sourab Mangrulkar ☆322 · Updated 3 months ago
- Implementation of the first paper on word2vec ☆196 · Updated 2 years ago