shreyansh26 / LLM-Sampling
A collection of various LLM sampling methods implemented in pure PyTorch
☆26 · Updated last year
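To illustrate the kind of sampling method the repository collects, here is a minimal sketch of temperature-scaled top-k sampling. This is an illustrative example in plain Python, not code from the repo; the function name `top_k_sample` and its parameters are assumptions for the sketch.

```python
import math
import random

def top_k_sample(logits, k=2, temperature=1.0, rng=None):
    """Sample a token index from `logits`, keeping only the top-k entries.

    Illustrative only -- not taken from the LLM-Sampling repo.
    """
    rng = rng or random.Random(0)
    # Scale logits by temperature (lower temperature -> sharper distribution).
    scaled = [l / temperature for l in logits]
    # Keep the k largest logits; mask the rest to -inf so they get zero probability.
    threshold = sorted(scaled, reverse=True)[k - 1]
    masked = [l if l >= threshold else float("-inf") for l in scaled]
    # Numerically stable softmax over the surviving logits.
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the resulting categorical distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `k=2`, only the two highest-logit tokens can ever be sampled, which is the core filtering idea that top-p (nucleus), min-p, and similar methods vary on.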
Alternatives and similar repositories for LLM-Sampling
Users interested in LLM-Sampling are comparing it to the libraries listed below.
- ☆48 · Updated last year
- ☆106 · Updated 8 months ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ☆62 · Updated 7 months ago
- Collection of autoregressive model implementations ☆85 · Updated 3 weeks ago
- Official implementation of "GPT or BERT: why not both?" ☆61 · Updated 6 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts", by Xu Owen He at DeepMind ☆135 · Updated 3 months ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆113 · Updated 3 months ago
- ☆82 · Updated last year
- ☆57 · Updated last month
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆179 · Updated last year
- Fast, Modern, and Low-Precision PyTorch Optimizers ☆124 · Updated last month
- QLoRA with Enhanced Multi-GPU Support ☆37 · Updated 2 years ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆102 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆41 · Updated last month
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer ☆142 · Updated last year
- Code for the NeurIPS LLM Efficiency Challenge ☆60 · Updated last year
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile ☆116 · Updated 2 years ago
- Let's build better datasets, together! ☆269 · Updated last year
- ☆41 · Updated last year
- An introduction to LLM Sampling ☆79 · Updated last year
- Official repo for the paper "PHUDGE: Phi-3 as Scalable Judge". Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 · Updated last year
- Supercharge huggingface transformers with model parallelism. ☆78 · Updated 6 months ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆34 · Updated 10 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆76 · Updated 2 weeks ago
- Prune transformer layers ☆74 · Updated last year
- Implementation of GateLoop Transformer in PyTorch and JAX ☆92 · Updated last year
- We study toy models of skill learning. ☆31 · Updated last week