Montinger / Transformer-Workbench
Playground for Transformers
☆51 Updated last year
Alternatives and similar repositories for Transformer-Workbench
Users interested in Transformer-Workbench are comparing it to the libraries listed below.
- Several types of attention modules written in PyTorch for learning purposes ☆52 Updated 8 months ago
- PyTorch implementation of MoE (mixture of experts); a minimal routing sketch appears after this list ☆45 Updated 4 years ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆98 Updated 8 months ago
- ☆47 Updated 9 months ago
- LoRA and DoRA from Scratch Implementations (see the LoRA sketch after this list) ☆204 Updated last year
- My fork of Allen AI's OLMo for educational purposes. ☆30 Updated 6 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆177 Updated 9 months ago
- Efficient Infinite Context Transformers with Infini-attention PyTorch Implementation + QwenMoE Implementation + Training Script + 1M cont… ☆83 Updated last year
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 Updated 3 weeks ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆57 Updated last year
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct ☆29 Updated 4 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆73 Updated last year
- Fast instruction tuning with Llama2 ☆11 Updated last year
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" (see the MoD routing sketch after this list) ☆95 Updated 2 weeks ago
- Experiments on Multi-Head Latent Attention ☆92 Updated 10 months ago
- 📚 Text Classification with LoRA (Low-Rank Adaptation) of Language Models - Efficiently fine-tune large language models for text classifi… ☆48 Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 Updated 5 months ago
- FuseAI Project ☆87 Updated 5 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆45 Updated 9 months ago
- ☆42 Updated last year
- Training and fine-tuning an LLM in Python and PyTorch. ☆42 Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆158 Updated 2 months ago
- Minimal scripts for 24GB VRAM GPUs: training, inference, whatever ☆40 Updated last week
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆119 Updated 8 months ago
- Unofficial implementation of AlpaGasus ☆91 Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆90 Updated last year
- Outlining techniques for improving the training performance of your PyTorch model without compromising its accuracy ☆128 Updated 2 years ago
- This is the official repository for Inheritune. ☆111 Updated 4 months ago
- Code for KaLM-Embedding models ☆78 Updated 3 months ago
- A curated list on the role of small models in the LLM era ☆101 Updated 9 months ago
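Several entries above (the MoE implementation, SwitchHead, Soft MoE, and the token-routing experiments) revolve around mixture-of-experts routing. As a point of reference, here is a minimal top-k MoE feed-forward layer in PyTorch; the class name `TopKMoE` and all hyperparameters are illustrative assumptions, not code from any listed repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts feed-forward layer (illustrative sketch)."""
    def __init__(self, dim, hidden_dim, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (batch, seq, dim)
        logits = self.router(x)                  # (batch, seq, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Real implementations add a load-balancing loss and capacity limits so experts receive comparable traffic; those are omitted here for brevity.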
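The LoRA/DoRA and LoRA text-classification entries build on the same low-rank adaptation idea: freeze the pretrained weight and train a small rank-r update on top of it. A minimal sketch, assuming a plain `nn.Linear` base layer; `LoRALinear`, `r`, and `alpha` are illustrative names, not taken from either repository.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update (sketch)."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r               # standard LoRA scaling factor

    def forward(self, x):
        # frozen path plus low-rank delta: W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because `lora_B` starts at zero, the wrapped layer initially computes exactly the pretrained function, so fine-tuning begins from the original model's behavior.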
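The Mixture-of-Depths entry allocates compute dynamically by letting only a router-selected subset of tokens pass through each block. A rough sketch of that routing pattern, assuming a per-token scalar router and a fixed capacity fraction; `MoDBlock` and the sigmoid gating are simplifying assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Mixture-of-Depths-style routing sketch: only the top-k tokens per
    sequence go through the (expensive) block; the rest skip via residual."""
    def __init__(self, dim, block: nn.Module, capacity=0.5):
        super().__init__()
        self.block = block                      # any shape-preserving token-wise module
        self.router = nn.Linear(dim, 1)
        self.capacity = capacity                # fraction of tokens processed

    def forward(self, x):                       # x: (batch, seq, dim)
        scores = self.router(x).squeeze(-1)     # (batch, seq)
        k = max(1, int(self.capacity * x.size(1)))
        top = scores.topk(k, dim=1).indices     # (batch, k) selected positions
        gather_idx = top.unsqueeze(-1).expand(-1, -1, x.size(-1))
        picked = torch.gather(x, 1, gather_idx)            # selected tokens
        gate = torch.sigmoid(torch.gather(scores, 1, top)).unsqueeze(-1)
        out = x.clone()                         # unselected tokens pass through unchanged
        out.scatter_add_(1, gather_idx, gate * self.block(picked))
        return out
```

Gating the block output by the router score keeps the routing decision differentiable, so the router can be trained end to end with the rest of the model.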