kuprel / minbpe-pytorch
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA
☆36 · Updated last year
Alternatives and similar repositories for minbpe-pytorch
Users who are interested in minbpe-pytorch compare it to the libraries listed below.
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated last year
- an implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- Latent Large Language Models ☆18 · Updated 8 months ago
- Full finetuning of large language models without large memory requirements ☆94 · Updated last year
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆66 · Updated last year
- Training code for Sparse Autoencoders on Embedding models ☆38 · Updated 2 months ago
- ☆22 · Updated 11 months ago
- A clone of OpenAI's Tokenizer page for HuggingFace Models ☆45 · Updated last year
- RWKV-7: Surpassing GPT ☆84 · Updated 5 months ago
- A collection of optimizers for MLX ☆35 · Updated this week
- ☆63 · Updated 7 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- Because it's there. ☆16 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆18 · Updated 2 years ago
- ☆38 · Updated 9 months ago
- Lightweight tools for quick and easy LLM demo's ☆26 · Updated 7 months ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆56 · Updated last year
- Command-line script for inferencing from models such as MPT-7B-Chat ☆101 · Updated last year
- Completion After Prompt Probability. Make your LLM make a choice ☆76 · Updated 6 months ago
- assign color hues to a collection of text fragments based on embeddings ☆20 · Updated 10 months ago
- NLP with Rust for Python 🦀🐍 ☆62 · Updated 11 months ago
- Turing machines, Rule 110, and A::B reversal using Claude 3 Opus. ☆59 · Updated 11 months ago
- utilities for loading and running text embeddings with onnx ☆44 · Updated 9 months ago
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated 2 years ago
- ☆35 · Updated 2 years ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Updated last year
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 · Updated 2 years ago
- ☆33 · Updated 10 months ago
- An introduction to LLM Sampling ☆78 · Updated 4 months ago