foundation-model-stack / fms-extras
☆20 Updated 2 months ago
Related projects
Alternatives and complementary repositories for fms-extras
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆50 Updated this week
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 Updated last month
- Example of applying CUDA graphs to LLaMA-v2 ☆10 Updated last year
- ☆99 Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 Updated last month
- ☆35 Updated 3 weeks ago
- Make triton easier ☆41 Updated 5 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆38 Updated 10 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆139 Updated this week
- ☆62 Updated 3 months ago
- ☆49 Updated 8 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last month
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆79 Updated this week
- Experiments on speculative sampling with Llama models ☆118 Updated last year
- ☆122 Updated 10 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 Updated this week
- ☆45 Updated 2 months ago
- LLM KV cache compression made easy ☆64 Updated last week
- A pipeline for LLM knowledge distillation ☆78 Updated 3 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 Updated last month
- Utilities for Training Very Large Models ☆56 Updated last month
- Normalized Transformer (nGPT) ☆66 Updated this week
- PB-LLM: Partially Binarized Large Language Models ☆148 Updated last year
- ☆48 Updated last month
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆74 Updated last month
- ☆17 Updated 3 weeks ago
- Implementation of Hyena Hierarchy in JAX ☆10 Updated last year
- Experiment of using Tangent to autodiff triton ☆72 Updated 10 months ago
- NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆61 Updated last month
- Benchmark suite for LLMs from Fireworks.ai ☆58 Updated 2 weeks ago