hkproj / bert-from-scratch
BERT explained from scratch
★12 · Updated last year
Related projects
Alternatives and complementary repositories for bert-from-scratch
- Complete implementation of Llama2 with/without KV cache & inference ★47 · Updated 5 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ★92 · Updated last month
- Starter pack for NeurIPS LLM Efficiency Challenge 2023. ★116 · Updated last year
- Notes on quantization in neural networks ★58 · Updated 10 months ago
- Reference implementation of Mistral AI 7B v0.1 model. ★27 · Updated 10 months ago
- RAGs: Simple implementations of Retrieval Augmented Generation (RAG) Systems ★83 · Updated 7 months ago
- End-to-End LLM Guide ★97 · Updated 4 months ago
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022) ★101 · Updated last year
- Distributed training (multi-node) of a Transformer model ★42 · Updated 7 months ago
- Prune transformer layers ★64 · Updated 5 months ago
- From scratch implementation of a vision language model in pure PyTorch ★160 · Updated 6 months ago
- Notes about "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA) ★220 · Updated last year
- This repository contains the code for dataset curation and finetuning of instruct variant of the Bilingual OpenHathi model. The resultin… ★23 · Updated 10 months ago
- This playlab encompasses a multitude of projects crafted through the utilization of Large Language Models, showcasing the versatility and… ★74 · Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★48 · Updated 4 months ago
- Toolkit for attaching, training, saving and loading of new heads for transformer models ★242 · Updated last week
- LoRA and DoRA from Scratch Implementations ★188 · Updated 8 months ago
- Notes about LLaMA 2 model ★47 · Updated last year
- Building a 2.3M-parameter LLM from scratch with LLaMA 1 architecture. ★111 · Updated 6 months ago
- ★80 · Updated 11 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ★113 · Updated 5 months ago
- ★35 · Updated 5 months ago
- Direct Preference Optimization Implementation ★14 · Updated 9 months ago
- ★150 · Updated 9 months ago
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. ★64 · Updated 2 months ago
- Building GPT ... ★17 · Updated 2 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ★236 · Updated 4 months ago
- ML/DL Math and Method notes ★57 · Updated 11 months ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ★194 · Updated 6 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ★251 · Updated last year