foundation-model-stack / fms-extras
☆24 · Updated 6 months ago
Alternatives and similar repositories for fms-extras:
Users interested in fms-extras are comparing it to the libraries listed below.
- Load compute kernels from the Hub ☆107 · Updated this week
- ☆63 · Updated this week
- Train, tune, and infer Bamba model ☆87 · Updated 2 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆92 · Updated this week
- RWKV-7: Surpassing GPT ☆82 · Updated 4 months ago
- ☆102 · Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆234 · Updated this week
- Make triton easier ☆47 · Updated 9 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 · Updated 5 months ago
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 3 months ago
- ☆21 · Updated 3 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆107 · Updated this week
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆68 · Updated 10 months ago
- Work in progress. ☆50 · Updated 2 weeks ago
- ☆50 · Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆71 · Updated 6 months ago
- ☆46 · Updated last year
- Extensible collectives library in triton ☆84 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆209 · Updated 4 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆54 · Updated this week
- Code for studying the super weight in LLM ☆94 · Updated 3 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 5 months ago
- ☆13 · Updated this week
- Explore training for quantized models ☆17 · Updated 2 months ago
- DPO, but faster 🚀 ☆40 · Updated 3 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last month