Nano-BERT is a straightforward, lightweight, and comprehensible custom implementation of BERT, inspired by the foundational "Attention Is All You Need" paper. The primary objective of this project is to distill the essence of transformers by stripping away unnecessary complexity and detail.
☆20 · Oct 19, 2023 · Updated 2 years ago
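Since nano-BERT sets out to distill the transformer to its essentials, it may help to recall the operation at its core: scaled dot-product attention from "Attention Is All You Need", softmax(QKᵀ/√d_k)V. The following is a minimal, dependency-free Python sketch of that formula, not nano-BERT's own code. The function names `softmax` and `scaled_dot_product_attention` are hypothetical, and the sketch assumes plain lists of floats while omitting masking, multiple heads, and the learned Q/K/V projections that a real BERT encoder layer would include.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Hypothetical helper: softmax(Q K^T / sqrt(d_k)) V on lists of lists.

    q: n_q rows of length d_k, k: n_k rows of length d_k,
    v: n_k rows of length d_v. Returns n_q rows of length d_v.
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(v[0])
    out = []
    for q_row in q:
        # Scaled similarity of this query to every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Attention weights over the keys (non-negative, sum to 1).
        weights = softmax(scores)
        # Output row is the weights applied to the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    # Identical keys give equal scores, so the query averages the values.
    print(scaled_dot_product_attention([[1.0, 0.0]],
                                       [[1.0, 1.0], [1.0, 1.0]],
                                       [[0.0, 0.0], [2.0, 4.0]]))
```

In a BERT-style encoder this attention is applied without a causal mask, so every token can attend to every other token in the sequence; the sketch reflects that by letting each query row weigh all key rows.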
Alternatives and similar repositories for nano-BERT
Users that are interested in nano-BERT are comparing it to the libraries listed below.
- Use pretrained BERT model to automatically generate grammar multiple choice questions (MCQ) from any news article or story. ☆13 · Oct 2, 2019 · Updated 6 years ago
- Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation. ☆11 · Jul 27, 2024 · Updated last year
- A platform for Interactive AI-assisted Hypothesis Generation [ACL 2025] ☆28 · Aug 18, 2025 · Updated 7 months ago
- (NBCE) Naive Bayes-based Context Extension on ChatGLM-6b ☆15 · Jun 7, 2023 · Updated 2 years ago
- ☆13 · May 7, 2023 · Updated 2 years ago
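- Fine-tunes Alibaba's open-source text detection model, using the OCR results returned by Hehe Recognition (合合识别) as initial training data, to optimize the model for the specific scenario of 10,000 images and improve text recognition accuracy. ☆10 · Dec 9, 2024 · Updated last year
- Fine-tunes the Qwen1.5-0.5B-Chat model for general information extraction, aiming to: verify the effectiveness of generative methods compared with extractive NER; give beginners a simple model fine-tuning workflow with as little code as possible; and show how to process data formats for large-model training. ☆15 · Sep 6, 2024 · Updated last year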
- [ICML'25] The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products ☆18 · Jul 16, 2025 · Updated 8 months ago
- The Polaris datasets and benchmarks recipes ☆13 · May 26, 2025 · Updated 10 months ago
- Java port of wolfgarbe/PruningRadixTrie ☆16 · Jun 29, 2021 · Updated 4 years ago
- Collaborative inference of latent diffusion via hivemind ☆12 · May 29, 2023 · Updated 2 years ago
- ☆11 · Oct 15, 2023 · Updated 2 years ago
- A Java JNI wrapper for KenLM: Faster and Smaller Language Model Queries ☆14 · Oct 25, 2020 · Updated 5 years ago
- This repository provides scripts to train an LSTM and then extract states from it in Tensorflow. ☆19 · Nov 20, 2018 · Updated 7 years ago
- An implementation of the Equivariant Graph Neural Network (EGNN) layer type for DGL-PyTorch. ☆15 · Dec 27, 2022 · Updated 3 years ago
- Word Embeddings for Low Resource Languages: The Case of Buryat ☆10 · Mar 12, 2025 · Updated last year
- fast trainer for educational purposes ☆24 · Updated this week
- SIMD instructions for faster distance calculations. ☆25 · Oct 13, 2025 · Updated 5 months ago
- Minimalistic, hackable PyTorch implementation of SimSiam in ~400 lines. Achieves good performance on ImageNet with ResNet50. Features dis… ☆21 · Nov 25, 2024 · Updated last year
- Building Blocks for Equivariant Neural Networks in e3nn and PyTorch 2.0 ☆19 · Nov 16, 2025 · Updated 4 months ago
- RND1: Scaling Diffusion Language Models ☆176 · Feb 22, 2026 · Updated last month
- ⛰️ PrexSyn: Efficient and Programmable Exploration of Synthesizable Chemical Space ☆43 · Updated this week
- Code for the paper "Secure Distributed Training at Scale" (ICML 2022) ☆16 · Feb 4, 2025 · Updated last year
- Minutes GPT is a GPT tool that helps you quickly turn meeting recordings into minutes. ☆17 · Nov 20, 2023 · Updated 2 years ago
- ☆14 · Jul 24, 2025 · Updated 8 months ago
- [ICML 2025] Repurposing pre-trained score-based generative models for transition path sampling by minimizing the Onsager-Machlup (OM) act… ☆27 · Mar 20, 2026 · Updated last week
- Russian dialog datasets parsers and crawlers. ☆15 · Sep 6, 2021 · Updated 4 years ago
- Jax / Haiku implementation of DimeNet++. ☆18 · Mar 31, 2022 · Updated 3 years ago
- CRF(Conditional Random Field) Layer for TensorFlow 1.X with many powerful functions ☆15 · Jan 3, 2020 · Updated 6 years ago
- A Qwen .5B reasoning model trained on OpenR1-Math-220k ☆14 · Oct 11, 2025 · Updated 5 months ago
- Zero Shot Molecular Generation via Similarity Kernels ☆29 · Aug 27, 2025 · Updated 7 months ago
- MLX implementation of Meta's ESM-1 protein language model ☆21 · Apr 17, 2024 · Updated last year
- A corpus of speech from the Joe Rogan Experience podcast, consisting of 8.43 million words. It includes aligned TextGrids with phonetic a… ☆21 · Jan 26, 2020 · Updated 6 years ago
- Improving Neural Text Generation with Reinforcement Learning ☆23 · Jan 13, 2021 · Updated 5 years ago
- ☆30 · Mar 20, 2024 · Updated 2 years ago
- Real time monitor for snakemake ☆17 · Mar 19, 2026 · Updated last week
- This repository contains the official implementation of the research paper: "Towards Training Large-Scale Pathology Foundation Models: fr… ☆38 · Jan 17, 2025 · Updated last year
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding (Findings of EMNLP'23) ☆11 · Aug 24, 2024 · Updated last year
- The source code used for paper "Unsupervised Key Event Detection from Massive Text Corpora", published in KDD 2022. ☆22 · Jul 15, 2023 · Updated 2 years ago