kanpuriyanawab / minbpe.c
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, written in pure C.
☆23 · Jul 6, 2024 · Updated last year
Alternatives and similar repositories for minbpe.c
Users who are interested in minbpe.c are comparing it to the libraries listed below
- BPE modification that removes intermediate tokens during tokenizer training. ☆26 · Nov 25, 2024 · Updated last year
- Book Review App ☆49 · May 14, 2025 · Updated 9 months ago
- Multi-agent simulator in JAX for research and teaching in AI & ALife ☆29 · Updated this week
- User-friendly viewer for Parquet files ☆10 · Jan 10, 2026 · Updated last month
- Mathematical foundations of data analysis, Winter semester 22-23 ☆13 · Jan 31, 2023 · Updated 3 years ago
- A library for probing Stockfish's NNUEs. The code for reading parameters and forward propagation is taken from Stockfish ☆12 · Nov 18, 2025 · Updated 2 months ago
- Filipino multi-modal NLP dataset. Consists of 350k+ Filipino news articles and associated images ☆12 · Mar 11, 2025 · Updated 11 months ago
- ☆15 · Oct 24, 2023 · Updated 2 years ago
- A VPN written in Rust ☆13 · Apr 17, 2025 · Updated 10 months ago
- BPR recommender system ☆10 · Apr 14, 2018 · Updated 7 years ago
- ☆11 · Jul 15, 2020 · Updated 5 years ago
- Python module implementing SRP ☆12 · Jul 29, 2022 · Updated 3 years ago
- Benchmark of common hash functions ☆10 · Sep 15, 2019 · Updated 6 years ago
- ☆11 · Jun 17, 2024 · Updated last year
- Python - NumPy Convolutional Neural Network ☆10 · Jun 12, 2015 · Updated 10 years ago
- Daily render shared with the community ☆12 · Jul 11, 2022 · Updated 3 years ago
- ☆14 · Jul 26, 2021 · Updated 4 years ago
- ☆10 · Feb 2, 2023 · Updated 3 years ago
- Building a Computer From Scratch with Verilog ☆11 · Feb 6, 2026 · Updated last week
- Interactive Article Explaining Isomap ☆44 · Jan 6, 2026 · Updated last month
- Repository for course material for Indian Knowledge System (IKS) ☆13 · Jan 20, 2026 · Updated 3 weeks ago
- Simple PyTorch net evaluator with Bad Gyal 8 and Mean Girl 8 net included. ☆10 · Nov 23, 2020 · Updated 5 years ago
- Implementation of language model papers along with several examples [NOT ALL WRITTEN FROM SCRATCH]. ☆12 · Oct 2, 2024 · Updated last year
- Notes of ADRL course taught at IISC as part of MTech AI curriculum ☆13 · Nov 30, 2024 · Updated last year
- Probing task; contextual embeddings -> textual definitions (EMNLP19) ☆11 · Apr 22, 2021 · Updated 4 years ago
- https://arxiv.org/abs/2404.10917 ☆14 · Mar 18, 2025 · Updated 10 months ago
- ☆17 · Oct 2, 2024 · Updated last year
- Python code of the paper Model order reduction of deep structured state-space models: A system-theoretic approach ☆14 · Nov 22, 2024 · Updated last year
- Stochastic Physics-Informed Neural Networks: A Moment-Matching Framework for Learning Hidden Physics within Stochastic Differential Equat… ☆14 · Dec 21, 2021 · Updated 4 years ago
- my beautiful page <3 ☆21 · Jan 20, 2026 · Updated 3 weeks ago
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 · May 15, 2025 · Updated 9 months ago
- ANE accelerated embedding models! ☆20 · Dec 11, 2024 · Updated last year
- GATSim: Generative-Agent Transport Simulation ☆23 · Feb 6, 2026 · Updated last week
- Convert Notion Databases into CSV or Pandas Dataframes for analytics and other useful functions ☆17 · Feb 2, 2022 · Updated 4 years ago
- Simple, single-thread, non-caching, lightweight HTTP proxy server ☆18 · Jul 11, 2022 · Updated 3 years ago
- Utilities for working with videos ☆13 · Jul 5, 2025 · Updated 7 months ago
- PathPiece tokenizer ☆13 · Nov 10, 2024 · Updated last year
- Humanoid robot with foveated vision and object recognition abilities ☆14 · Aug 26, 2023 · Updated 2 years ago
- A simple implementation of Llama 1, 2. Llama Architecture built from scratch using PyTorch all the models are built from scratch that inc… ☆13 · May 6, 2024 · Updated last year