foundation-model-stack / fms-acceleration
🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.
⭐ 13 · Updated 3 weeks ago
Alternatives and similar repositories for fms-acceleration
Users interested in fms-acceleration are comparing it to the libraries listed below.
- Experimental scripts for researching data adaptive learning rate scheduling. ⭐ 22 · Updated 2 years ago
- Utilities for Training Very Large Models ⭐ 58 · Updated last year
- ⭐ 34 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics ⭐ 16 · Updated last year
- Fork of Flame repo for training of some new stuff in development ⭐ 18 · Updated this week
- PyTorch centric eager mode debugger ⭐ 48 · Updated 10 months ago
- ⭐ 21 · Updated 8 months ago
- Train, tune, and infer Bamba model ⭐ 135 · Updated 5 months ago
- ⭐ 22 · Updated 10 months ago
- ⭐ 46 · Updated last year
- ⭐ 26 · Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python ⭐ 15 · Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ⭐ 42 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ⭐ 43 · Updated this week
- Using FlexAttention to compute attention with different masking patterns ⭐ 47 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ⭐ 60 · Updated last year
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models" ⭐ 20 · Updated 9 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ⭐ 45 · Updated last year
- Implementation of Hyena Hierarchy in JAX ⭐ 10 · Updated 2 years ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ⭐ 18 · Updated last year
- FlexAttention w/ FlashAttention3 Support ⭐ 27 · Updated last year
- ⭐ 26 · Updated last month
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ⭐ 21 · Updated last year
- Triton Implementation of HyperAttention Algorithm ⭐ 48 · Updated last year
- A new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found here… ⭐ 31 · Updated 2 years ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ⭐ 14 · Updated 10 months ago
- Minimum Description Length probing for neural network representations ⭐ 20 · Updated 9 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ⭐ 45 · Updated 2 years ago
- ⭐ 15 · Updated last year
- MEXMA: Token-level objectives improve sentence representations ⭐ 42 · Updated 10 months ago