gautierdag / bpeasy
Fast bare-bones BPE for modern tokenizer training
☆159 · Updated 2 weeks ago
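The core idea behind a BPE trainer like bpeasy can be illustrated with a minimal, dependency-free sketch of the classic byte-pair-encoding merge loop. This is the generic algorithm only, not bpeasy's actual Rust implementation or Python API:

```python
from collections import Counter

def bpe_train(corpus: str, num_merges: int):
    """Sketch of classic BPE training: repeatedly merge the most
    frequent adjacent symbol pair. Illustrative only, not bpeasy's API."""
    # Represent each word as a tuple of symbols (initially characters),
    # weighted by how often the word occurs in the corpus.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all words.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

merges = bpe_train("low low low lower lowest", 3)
# Learned merges: ('l','o'), then ('lo','w'), then ('low','e')
```

Each iteration greedily picks the single highest-frequency pair, which is why naive implementations are O(merges × corpus); fast trainers such as bpeasy avoid rescanning the whole corpus on every merge.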
Alternatives and similar repositories for bpeasy
Users interested in bpeasy are comparing it to the libraries listed below.
- A puzzle to learn about prompting ☆130 · Updated 2 years ago
- Accelerate and optimize performance with streamlined training and serving options in JAX. ☆288 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆219 · Updated last month
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆347 · Updated 11 months ago
- Website for hosting the Open Foundation Models Cheat Sheet. ☆267 · Updated 2 months ago
- RuLES: a benchmark for evaluating rule-following in language models ☆227 · Updated 4 months ago
- A comprehensive deep dive into the world of tokens ☆224 · Updated last year
- JAX implementation of the Llama 2 model ☆219 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆141 · Updated 2 weeks ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆256 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆262 · Updated last year
- ☆303 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆232 · Updated 8 months ago
- Code for training & evaluating Contextual Document Embedding models ☆194 · Updated last month
- A repository for research on medium-sized language models. ☆502 · Updated last month
- ☆92 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆190 · Updated last year
- Extract full next-token probabilities via language model APIs ☆247 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆194 · Updated 11 months ago
- Batched LoRAs ☆343 · Updated last year
- Minimal (400 LOC) implementation of maximal (multi-node, FSDP) GPT training ☆129 · Updated last year
- ☆259 · Updated this week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆237 · Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆123 · Updated 6 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆165 · Updated 5 months ago
- ☆134 · Updated 3 months ago
- ☆273 · Updated 11 months ago
- Long-context evaluation for large language models ☆219 · Updated 4 months ago
- Simple Transformer in JAX ☆138 · Updated last year
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes. ☆82 · Updated last year