hkproj / bert-from-scratch
BERT explained from scratch
★13 Updated last year
Alternatives and similar repositories for bert-from-scratch:
Users who are interested in bert-from-scratch are comparing it to the libraries listed below.
- Complete implementation of Llama2 with/without KV cache & inference ★47 Updated 10 months ago
- ★82 Updated 6 months ago
- Distributed training (multi-node) of a Transformer model ★62 Updated 11 months ago
- Prune transformer layers ★68 Updated 9 months ago
- A set of scripts and notebooks on LLM finetuning and dataset creation ★105 Updated 6 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ★44 Updated 6 months ago
- Notes on quantization in neural networks ★77 Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ★64 Updated last year
- Mixed precision training from scratch with Tensors and CUDA ★21 Updated 10 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ★35 Updated 10 months ago
- ★125 Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★49 Updated 8 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ★255 Updated last year
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs". ★13 Updated 6 months ago
- Code for studying the super weight in LLM ★94 Updated 3 months ago
- LLM_library is a comprehensive repository that serves as a one-stop resource for hands-on code and insightful summaries. ★69 Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ★122 Updated 7 months ago
- Starter pack for NeurIPS LLM Efficiency Challenge 2023. ★124 Updated last year
- ★21 Updated 5 months ago
- An extension of the nanoGPT repository for training small MOE models. ★106 Updated 2 weeks ago
- ★138 Updated 2 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ★116 Updated 9 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ★148 Updated 2 months ago
- Notes and commented code for RLHF (PPO) ★77 Updated last year
- End-to-End LLM Guide ★104 Updated 8 months ago
- This repository contains the code for dataset curation and finetuning of instruct variant of the Bilingual OpenHathi model. The resultin… ★23 Updated last year
- Set of scripts to finetune LLMs ★37 Updated 11 months ago
- ★158 Updated last month
- Open Implementations of LLM Analyses ★103 Updated 5 months ago
- Notes about LLaMA 2 model ★55 Updated last year