euclaise / supertrainer2000
☆49 · Updated last year
Alternatives and similar repositories for supertrainer2000:
Users interested in supertrainer2000 are comparing it to the repositories listed below
- ☆53 · Updated last year
- [WIP] Transformer to embed Danbooru labelsets · ☆13 · Updated 11 months ago
- Demonstration that finetuning a RoPE model on sequences longer than those seen in pre-training extends the model's context limit · ☆63 · Updated last year
- Collection of autoregressive model implementations · ☆83 · Updated last month
- QLoRA with Enhanced Multi-GPU Support · ☆36 · Updated last year
- ☆76 · Updated 8 months ago
- ☆22 · Updated last year
- Code repository for the c-BTM paper · ☆106 · Updated last year
- Supporting PyTorch FSDP for optimizers · ☆79 · Updated 3 months ago
- Full finetuning of large language models without large memory requirements · ☆93 · Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers · ☆87 · Updated 8 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts · ☆24 · Updated last year
- ☆60 · Updated last year
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full finetunes · ☆82 · Updated last year
- ☆79 · Updated 11 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs · ☆100 · Updated 4 months ago
- RWKV-7: Surpassing GPT · ☆82 · Updated 4 months ago
- Utilities for Training Very Large Models · ☆58 · Updated 5 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers · ☆17 · Updated this week
- ☆48 · Updated 4 months ago
- Experiments toward training a new and improved T5 · ☆77 · Updated 11 months ago
- An introduction to LLM Sampling · ☆77 · Updated 3 months ago
- ☆20 · Updated last year
- ☆47 · Updated 6 months ago
- An implementation of Self-Extend, which expands the context window via grouped attention · ☆118 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference · ☆58 · Updated last month
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile · ☆115 · Updated 2 years ago
- ☆31 · Updated 9 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens · ☆137 · Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ☆123 · Updated 3 months ago