Montinger / Transformer-Workbench
Playground for Transformers
☆48 · Updated last year
Alternatives and similar repositories for Transformer-Workbench:
Users interested in Transformer-Workbench are comparing it to the libraries listed below.
- Several types of attention modules written in PyTorch for learning purposes ☆47 · Updated 5 months ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆159 · Updated 10 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 3 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- ☆47 · Updated 7 months ago
- Contextual Position Encoding with some custom CUDA kernels (https://arxiv.org/abs/2405.18719) ☆22 · Updated 9 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 6 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆44 · Updated 6 months ago
- ☆131 · Updated last year
- PyTorch implementation of MoE (mixture of experts) ☆42 · Updated 4 years ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆150 · Updated 3 months ago
- A personal reimplementation of Google's Infini-transformer using a small 2B model. The project includes both model and train… ☆56 · Updated 11 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆63 · Updated 11 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆86 · Updated 2 weeks ago
- Code for the DDP tutorial ☆32 · Updated 2 years ago
- ☆29 · Updated last year
- Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch ☆22 · Updated 2 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated last week
- Experiments on Multi-Head Latent Attention ☆80 · Updated 7 months ago
- MathPrompter implementation: this repository hosts an implementation based on the 'MathPrompter: Mathematical Reasoning Using Large Langu… ☆13 · Updated 8 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆52 · Updated 2 months ago
- LoRA and DoRA from-scratch implementations ☆199 · Updated last year
- This repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆87 · Updated last year
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs). ☆11 · Updated 10 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆47 · Updated last week
- ☆66 · Updated last week
- Exploration of the multimodal fuyu-8b model from Adept. 🤓 🔍 ☆28 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model, with or without FIM ☆54 · Updated 11 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆121 · Updated 2 months ago
- ☆145 · Updated last year