Montinger / Transformer-Workbench
Playground for Transformers
☆50 · Updated last year
Alternatives and similar repositories for Transformer-Workbench
Users interested in Transformer-Workbench are comparing it to the libraries listed below
- Several types of attention modules written in PyTorch for learning purposes ☆52 · Updated 7 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 5 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 7 months ago
- Efficient Infinite Context Transformers with Infini-attention PyTorch Implementation + QwenMoE Implementation + Training Script + 1M cont… ☆82 · Updated last year
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 · Updated 3 weeks ago
- ☆47 · Updated 8 months ago
- PyTorch implementation of moe, which stands for mixture of experts ☆43 · Updated 4 years ago (a minimal top-k routing sketch follows this list)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆154 · Updated last month
- minimal GRPO implementation from scratch ☆90 · Updated 2 months ago
- Set of scripts to finetune LLMs ☆37 · Updated last year
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs). ☆11 · Updated 11 months ago
- ☆42 · Updated last year
- ☆20 · Updated 3 years ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated last year
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆44 · Updated 8 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct ☆29 · Updated 2 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 · Updated last year
- minimal LLM scripts for 24GB VRAM GPUs. training, inference, whatever ☆38 · Updated last month
- This is the code that went into our practical dive into using Mamba for information extraction ☆54 · Updated last year
- a curated list of the role of small models in the LLM era ☆100 · Updated 7 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆91 · Updated this week
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆65 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆50 · Updated last month
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated 11 months ago
- Experiments on Multi-Head Latent Attention ☆89 · Updated 9 months ago
- PyTorch implementation of Retentive Network: A Successor to Transformer for Large Language Models ☆14 · Updated last year
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆164 · Updated last year (a minimal GQA sketch follows this list)
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆118 · Updated 7 months ago
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆34 · Updated 6 months ago
- ☆11 · Updated 7 months ago
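
For orientation while scanning the list, here is a minimal sketch of the top-k mixture-of-experts routing idea behind several of the MoE-flavored entries above (referenced from the `moe` item). The `TopKMoE` name, the layer sizes, and the dense loop over experts are illustrative assumptions, not code from any listed repository.

```python
# A minimal top-k MoE feed-forward layer (illustrative sketch, not from any listed repo).
import torch
from torch import nn


class TopKMoE(nn.Module):
    def __init__(self, dim: int, hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)          # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                                        # (batch, seq, experts)
        weights, indices = scores.topk(self.k, dim=-1)               # pick top-k experts per token
        weights = weights.softmax(dim=-1)                            # renormalize selected gates
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Real implementations typically replace the dense loop with batched dispatch and add a load-balancing loss; this sketch only shows the routing idea.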
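
Likewise, a compact sketch of grouped-query attention, the technique named in the GQA entry above: several query heads share one key/value head, which shrinks the KV cache relative to full multi-head attention. The module name, dimensions, and the use of `F.scaled_dot_product_attention` are assumptions for illustration, not the linked repo's API.

```python
# Grouped-query attention: num_heads query heads share num_kv_heads key/value heads.
import torch
import torch.nn.functional as F
from torch import nn


class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert num_heads % num_kv_heads == 0
        self.num_heads, self.num_kv_heads = num_heads, num_kv_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so every group of query heads attends to the same K/V.
        group = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)   # causal self-attention
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```

With `num_kv_heads == num_heads` this reduces to standard multi-head attention, and with `num_kv_heads == 1` to multi-query attention, the two endpoints the GQA paper interpolates between.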