hkproj / bert-from-scratch
BERT explained from scratch
⭐12 · Updated last year
Alternatives and similar repositories for bert-from-scratch:
Users interested in bert-from-scratch are comparing it to the repositories listed below.
- Complete implementation of Llama2 with/without KV cache & inference · ⭐47 · Updated 11 months ago
- A set of scripts and notebooks on LLM finetuning and dataset creation · ⭐106 · Updated 6 months ago
- Prune transformer layers · ⭐68 · Updated 10 months ago
- Notes about the LLaMA 2 model · ⭐59 · Updated last year
- Set of scripts to finetune LLMs · ⭐37 · Updated last year
- This repository contains the code for dataset curation and finetuning of the instruct variant of the Bilingual OpenHathi model. The resultin… · ⭐23 · Updated last year
- The code behind our practical dive into using Mamba for information extraction · ⭐54 · Updated last year
- Notes on Direct Preference Optimization · ⭐19 · Updated last year
- ⭐40 · Updated 11 months ago
- ⭐85 · Updated 7 months ago
- Distributed training (multi-node) of a Transformer model · ⭐64 · Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… · ⭐49 · Updated 9 months ago
- Minimal GRPO implementation from scratch · ⭐85 · Updated last month
- ⭐17 · Updated 3 months ago
- An extension of the nanoGPT repository for training small MoE models · ⭐131 · Updated last month
- Repo hosting code and materials related to speeding up LLM inference using token merging · ⭐36 · Updated 11 months ago
- ⭐43 · Updated this week
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. · ⭐74 · Updated 6 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… · ⭐64 · Updated last year
- LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch · ⭐101 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ⭐286 · Updated last week
- Collection of autoregressive model implementations · ⭐85 · Updated 2 months ago
- ⭐157 · Updated 3 months ago
- An introduction to LLM sampling · ⭐77 · Updated 4 months ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning · ⭐46 · Updated last year
- The official evaluation suite and dynamic data release for MixEval · ⭐235 · Updated 5 months ago
- Code for studying the super weight in LLMs · ⭐98 · Updated 4 months ago
- ML/DL math and method notes · ⭐60 · Updated last year
- Reference implementation of the Mistral AI 7B v0.1 model · ⭐28 · Updated last year
- Code for the NeurIPS LLM Efficiency Challenge · ⭐57 · Updated last year