hkproj / bert-from-scratchLinks

BERT explained from scratch

☆16

Alternatives and similar repositories for bert-from-scratch

Users that are interested in bert-from-scratch are comparing it to the libraries listed below

Sorting:

ThinamXx / Meta-llama
Complete implementation of Llama2 with/without KV cache & inference 🚀
☆48Updated last year
neubig / minllama-assignment
☆99Updated last year
hkproj / pytorch-lora
LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch
☆117Updated 2 years ago
hkproj / multi-latent-attention
☆45Updated 6 months ago
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆87Updated last year
tcapelle / llm_recipes
A set of scripts and notebooks on LLM finetunning and dataset creation
☆111Updated last year
alperiox / Compact-Language-Models-via-Pruning-and-Knowledge-Distillation
Unofficial implementation of https://arxiv.org/pdf/2407.14679
☆51Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆74Updated last year
ayulockin / neurips-llm-efficiency-challenge
Starter pack for NeurIPS LLM Efficiency Challenge 2023.
☆128Updated 2 years ago
aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…
☆74Updated 2 years ago
hkproj / quantization-notes
Notes on quantization in neural networks
☆109Updated last year
hkproj / pytorch-llama-notes
Notes about LLaMA 2 model
☆70Updated 2 years ago
hkproj / dpo-notes
Notes on Direct Preference Optimization
☆23Updated last year
pacman100 / LLM-Workshop
LLM Workshop by Sourab Mangrulkar
☆397Updated last year
hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆361Updated 2 years ago
llm-efficiency-challenge / neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
☆257Updated 2 years ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆215Updated 8 months ago
humza909 / LLM_Survey
☆86Updated last year
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆121Updated 11 months ago
ThinamXx / build-GPT
Building GPT ...
☆18Updated 11 months ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆347Updated 6 months ago
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆251Updated last year
FareedKhan-dev / Building-llama3-from-scratch
LLaMA 3 is one of the most promising open-source model after Mistral, we will recreate it's architecture in a simpler manner.
☆190Updated last year
hkproj / transformer-from-scratch-notes
Notes about "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA)
☆327Updated 2 years ago
tspeterkim / mixed-precision-from-scratch
Mixed precision training from scratch with Tensors and CUDA
☆28Updated last year
sangmichaelxie / cs324_p2
Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022)
☆105Updated 2 years ago
neubig / anlp-code
☆189Updated last year
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆215Updated last year
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆181Updated 2 weeks ago
koayon / awesome-adaptive-computation
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
☆160Updated 10 months ago