ad8e / TinyStories-cleaner
Remove generated stories with stray unicode characters
☆13 Updated last year
Alternatives and similar repositories for TinyStories-cleaner:
Users who are interested in TinyStories-cleaner are comparing it to the libraries listed below.
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user… ☆164 Updated this week
- Latent Diffusion Language Models ☆68 Updated last year
- Focused on fast experimentation and simplicity ☆65 Updated last month
- ☆49 Updated 11 months ago
- Collection of autoregressive model implementations ☆81 Updated this week
- Modeling code for a BitNet b1.58 Llama-style model. ☆23 Updated 9 months ago
- supporting pytorch FSDP for optimizers ☆76 Updated 2 months ago
- ☆21 Updated 3 months ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it for Adam. ☆73 Updated 6 months ago
- Token Omission Via Attention ☆123 Updated 4 months ago
- ☆19 Updated 4 months ago
- If it quacks like a tensor... ☆56 Updated 3 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆95 Updated 3 months ago
- realtime latent world model inference demo ☆39 Updated 3 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆129 Updated this week
- Evaluating the Mamba architecture on the Othello game ☆44 Updated 9 months ago
- ☆24 Updated 2 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆122 Updated 10 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆82 Updated 7 months ago
- ☆17 Updated 4 months ago
- JAX implementation of the Llama 2 model ☆215 Updated last year
- llm sampler that only allows words that are in the bible ☆24 Updated 2 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆28 Updated last week
- Mixture of A Million Experts ☆39 Updated 6 months ago
- Efficient optimizers ☆169 Updated this week
- Full finetuning of large language models without large memory requirements ☆93 Updated last year
- WIP ☆93 Updated 6 months ago
- seqax = sequence modeling + JAX ☆143 Updated 7 months ago