sradc / pretraining-BERT
Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch
☆40 · Updated last year
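For a rough idea of what pre-training BERT from scratch involves, the sketch below shows the masked-language-modelling (MLM) corruption step described in the original BERT paper. About 15% of non-special tokens are selected as prediction targets. Of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. This is a minimal, stdlib-only illustration, not the repository's code. The function name `mask_tokens` and its parameters are hypothetical. The `-100` label value is an assumption chosen because it matches PyTorch's default `ignore_index` for cross-entropy. HuggingFace's `DataCollatorForLanguageModeling` (see its `mlm_probability` parameter) applies a similar scheme. Whether the blog post relies on it is not stated here.

```python
import random
from typing import List, Optional, Set, Tuple

# Label for positions the loss should skip (assumed; matches PyTorch's default ignore_index).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: BERT-style 80/10/10 MLM masking on one sequence."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens such as [CLS]/[SEP]; select others with prob mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that every non-`[MASK]` input token is correct. This reduces the mismatch between pre-training and fine-tuning, where `[MASK]` never appears.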
Alternatives and similar repositories for pretraining-BERT:
Users who are interested in pretraining-BERT are comparing it to the libraries listed below:
- gzip Predicts Data-dependent Scaling Laws ☆34 · Updated 10 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆48 · Updated last week
- QLoRA with Enhanced Multi GPU Support ☆36 · Updated last year
- ☆60 · Updated last year
- ☆22 · Updated last year
- Your favourite classical machine learning algos on the GPU/TPU ☆20 · Updated 2 months ago
- ML/DL Math and Method notes ☆59 · Updated last year
- PyTorch implementation for MRL ☆18 · Updated last year
- A case study of efficient training of large language models using commodity hardware. ☆69 · Updated 2 years ago
- ☆92 · Updated last year
- ☆20 · Updated 11 months ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆42 · Updated last year
- ☆28 · Updated last month
- Implementation of GateLoop Transformer in PyTorch and JAX ☆87 · Updated 9 months ago
- JAX-like function transformation engine, but micro: microjax ☆30 · Updated 5 months ago
- Toy genetic algorithm in PyTorch ☆33 · Updated this week
- Collection of autoregressive model implementations ☆83 · Updated last month
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆87 · Updated last year
- A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019) ☆37 · Updated 2 years ago
- Supercharge huggingface transformers with model parallelism. ☆76 · Updated 5 months ago
- An introduction to LLM Sampling ☆77 · Updated 3 months ago
- ☆47 · Updated 4 months ago
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- ☆79 · Updated 11 months ago
- Simplified implementation of UMAP-like dimensionality reduction algorithm ☆48 · Updated 4 months ago
- Functional local implementations of main model parallelism approaches ☆95 · Updated 2 years ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 · Updated 3 months ago
- ☆27 · Updated 8 months ago