kanpuriyanawab / minbpe.c

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, written in pure C.

☆23

Alternatives and similar repositories for minbpe.c

Users interested in minbpe.c are comparing it to the repositories listed below.


pchizhov / picky_bpe
BPE modification that removes intermediate tokens during tokenizer training.
☆26 · Nov 25, 2024 · Updated last year
pravinmishraaws / book-review-app
Book Review App
☆49 · May 14, 2025 · Updated 9 months ago
flowersteam / vivarium
Multi-agent simulator in JAX for research and teaching in AI and ALife
☆29 · Updated this week
marepilc / pink-parquet
User-friendly viewer for Parquet files
☆10 · Jan 10, 2026 · Updated last month
skfairchild / MathData-Winter22-23
Mathematical foundations of data analysis, Winter semester 22-23
☆13 · Jan 31, 2023 · Updated 3 years ago
VedantJoshi1409 / stockfish_nnue_probe
A library for probing Stockfish's NNUEs. The code for reading parameters and forward propagation is taken from Stockfish.
☆12 · Nov 18, 2025 · Updated 2 months ago
KenrickLance / BalitaNLP-Dataset
Filipino multi-modal NLP dataset consisting of 350k+ Filipino news articles and associated images
☆12 · Mar 11, 2025 · Updated 11 months ago
bigcode-project / bigcode-tokenizer
☆15 · Oct 24, 2023 · Updated 2 years ago
anirudhsudhir / snoopy
A VPN written in Rust
☆13 · Apr 17, 2025 · Updated 10 months ago
JingzheSun / Recommender_System_Adaptive_KNN
BPR recommender system
☆10 · Apr 14, 2018 · Updated 7 years ago
karlstratos / ammi
☆11 · Jul 15, 2020 · Updated 5 years ago
bmschmidt / pySRP
Python module implementing SRP
☆12 · Jul 29, 2022 · Updated 3 years ago
apavlo / hash-function-benchmark
Benchmark of common hash functions
☆10 · Sep 15, 2019 · Updated 6 years ago
KempnerInstitute / chess-research
☆11 · Jun 17, 2024 · Updated last year
shenxudeu / Convnet
Python/NumPy convolutional neural network
☆10 · Jun 12, 2015 · Updated 10 years ago
nitzan-treg / Polar_Trail
Daily render shared with the community
☆12 · Jul 11, 2022 · Updated 3 years ago
KunstDerFuge / Q-notebook
☆14 · Jul 26, 2021 · Updated 4 years ago
shauli-ravfogel / conformal-prediction
☆10 · Feb 2, 2023 · Updated 3 years ago
virajitgp / NAND-2-FPGA
Building a computer from scratch with Verilog
☆11 · Feb 6, 2026 · Updated last week
helblazer811 / UnderstandingIsomap
Interactive article explaining Isomap
☆44 · Jan 6, 2026 · Updated last month
yogeshhk / BharatVidya
Repository of course material for the Indian Knowledge System (IKS)
☆13 · Jan 20, 2026 · Updated 3 weeks ago
dkappe / badgyal
Simple PyTorch net evaluator with the Bad Gyal 8 and Mean Girl 8 nets included.
☆10 · Nov 23, 2020 · Updated 5 years ago
sovit-123 / attention_is_all_you_need
Implementation of language model papers along with several examples [NOT ALL WRITTEN FROM SCRATCH].
☆12 · Oct 2, 2024 · Updated last year
victor-explore / Deep-Learning-Lecture-Notes-IISC-Banglore
Notes from the ADRL course taught at IISc as part of the MTech AI curriculum
☆13 · Nov 30, 2024 · Updated last year
MiuLab / GenDef
Probing task: contextual embeddings -> textual definitions (EMNLP19)
☆11 · Apr 22, 2021 · Updated 4 years ago
ritikamangla / QSalience
https://arxiv.org/abs/2404.10917
☆14 · Mar 18, 2025 · Updated 10 months ago
Jhaprince / MultiBully
☆17 · Oct 2, 2024 · Updated last year
forgi86 / lru-reduction
Python code for the paper "Model order reduction of deep structured state-space models: A system-theoretic approach"
☆14 · Nov 22, 2024 · Updated last year
jtoleary / SPINN
Stochastic Physics-Informed Neural Networks: A Moment-Matching Framework for Learning Hidden Physics within Stochastic Differential Equat…
☆14 · Dec 21, 2021 · Updated 4 years ago
jalexine / jalexine.github.io
my beautiful page <3
☆21 · Jan 20, 2026 · Updated 3 weeks ago
alisawuffles / tokenizer-attack
Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"
☆18 · May 15, 2025 · Updated 9 months ago
huggingface / ember
ANE-accelerated embedding models!
☆20 · Dec 11, 2024 · Updated last year
qiliuchn / gatsim
GATSim: Generative-Agent Transport Simulation
☆23 · Feb 6, 2026 · Updated last week
victoriano / notion_data
Convert Notion databases into CSV or pandas DataFrames for analytics and other useful functions
☆17 · Feb 2, 2022 · Updated 4 years ago
00cpxxx / proxylite
Simple, single-thread, non-caching, lightweight HTTP proxy server
☆18 · Jul 11, 2022 · Updated 3 years ago
nateraw / encoded-video
Utilities for working with videos
☆13 · Jul 5, 2025 · Updated 7 months ago
kensho-technologies / pathpiece
PathPiece tokenizer
☆13 · Nov 10, 2024 · Updated last year
RCIITG / Raman-TheVisionBot
Humanoid robot with foveated vision and object recognition abilities
☆14 · Aug 26, 2023 · Updated 2 years ago
viai957 / llama-inference
A simple implementation of Llama 1, 2. Llama Architecture built from scratch using PyTorch all the models are built from scratch that inc…
☆13 · May 6, 2024 · Updated last year