Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch
☆43 · May 20, 2025 · Updated last year
Alternatives and similar repositories for pretraining-BERT
Users that are interested in pretraining-BERT are comparing it to the libraries listed below.
- A single-line modification to any (dualizer-based) optimizer that allows the optimizer to adapt to the scale of the gradients as they cha… ☆19 · Jan 11, 2025 · Updated last year
- ☆14 · Dec 2, 2024 · Updated last year
- ☆34 · Apr 23, 2023 · Updated 3 years ago
- ☆24 · Sep 3, 2024 · Updated last year
- Implementation of VQ-VAE with a GPT-style sampler in the JAX and Haiku ecosystem. ☆11 · Nov 23, 2023 · Updated 2 years ago
- A collection of various discourse segmenters ☆10 · Jun 30, 2017 · Updated 8 years ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 9 months ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆11 · Dec 30, 2024 · Updated last year
- ☆10 · Aug 26, 2022 · Updated 3 years ago
- JAX implementation of the T5 model: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer ☆24 · Jun 10, 2023 · Updated 2 years ago
- Jupyter notebooks from our weekly (or so) hackathons ☆11 · Dec 3, 2024 · Updated last year
- Source code repository for our EMNLP paper on cross-domain claim identification ☆14 · Oct 24, 2018 · Updated 7 years ago
- ☆36 · Aug 23, 2023 · Updated 2 years ago
- Patch for MPT-7B which allows using and training a LoRA ☆58 · May 20, 2023 · Updated 3 years ago
- Code for Semi-crowdsourced Clustering with Deep Generative Models ☆12 · Dec 9, 2022 · Updated 3 years ago
- Code accompanying VarGrad: A Low-Variance Gradient Estimator for Variational Inference ☆12 · Oct 12, 2020 · Updated 5 years ago
- A large-image collection explorer and fast classification tool ☆24 · Jul 12, 2022 · Updated 3 years ago
- A more modern pytest ☆34 · May 18, 2026 · Updated last week
- Arduino code to control the LED strip on the NUC 11 Extreme ☆11 · Jul 2, 2023 · Updated 2 years ago
- ☆11 · Oct 21, 2017 · Updated 8 years ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts. ☆35 · Sep 12, 2025 · Updated 8 months ago
- ☆11 · Jul 25, 2021 · Updated 4 years ago
- EdX course from MIT on machine learning 6.86x ☆11 · Dec 16, 2020 · Updated 5 years ago
- Posterior with interesting shapes from actually used models ☆13 · Feb 10, 2025 · Updated last year
- PyTorch utilities for ML, specifically speech ☆13 · Jan 30, 2024 · Updated 2 years ago
- ☆10 · Dec 4, 2018 · Updated 7 years ago
- Scalable In-Memory Acceleration With Mesh: Device, Circuits, Architecture, and Algorithm ☆16 · Oct 11, 2020 · Updated 5 years ago
- An attempt to create a free PROFINET daemon ☆15 · Oct 24, 2018 · Updated 7 years ago
- Userspace USB driver for CAN to USB adapters - based on the Kvaser canlib API ☆19 · Feb 22, 2020 · Updated 6 years ago
- ☆19 · Jun 10, 2024 · Updated last year
- Mini Model Daemon ☆13 · Nov 9, 2024 · Updated last year
- Unofficial Implementation of Selective Attention Transformer ☆20 · Oct 31, 2024 · Updated last year
- Code supplement for variational boosting (https://arxiv.org/abs/1611.06585) ☆11 · Jul 24, 2017 · Updated 8 years ago
- Approximate Bayesian Inference Toolkit (Python, C++) ☆14 · Apr 16, 2014 · Updated 12 years ago
- Specialization offered by Imperial College London at Coursera ☆13 · Jan 5, 2022 · Updated 4 years ago
- Development of High-Throughput Polymer Network Atomistic Simulation ☆27 · May 18, 2026 · Updated last week
- An implementation of Squared Earth-Mover's Distance loss for Neural Networks. ☆14 · Mar 25, 2023 · Updated 3 years ago
- ☆28 · Apr 14, 2024 · Updated 2 years ago
- Transformer from Scratch in PyTorch ☆17 · Mar 26, 2022 · Updated 4 years ago