eth-easl / fmengineLinks

Utilities for Training Very Large Models

☆58

Alternatives and similar repositories for fmengine

Users that are interested in fmengine are comparing it to the libraries listed below

Sorting:

shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35Updated 2 years ago
luyug / magix
Supercharge huggingface transformers with model parallelism.
☆77Updated 4 months ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
srush / LLM-Talk
☆52Updated last year
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆22Updated 2 years ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63Updated 2 years ago
UmerHA / triton_util
Make triton easier
☆49Updated last year
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆46Updated 2 years ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago
prateeky2806 / ComPEFT
☆26Updated 2 years ago
Zyphra / Zyda_processing
☆39Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Updated 2 years ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆79Updated last year
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35Updated 2 years ago
crypdick / timm-lr-scheduler-explorer
A dashboard for exploring timm learning rate schedulers
☆19Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆14Updated 2 years ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62Updated 2 weeks ago
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40Updated 2 years ago
jxiw / BiGS
Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆115Updated last year
codekansas / rwkv
RWKV model implementation
☆38Updated 2 years ago
ahennequ / pytorch-custom-mma
☆29Updated 3 years ago
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆54Updated 2 weeks ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆102Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆46Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆58Updated this week
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100Updated last year