satabios / sconce
E2E AutoML Model Compression Package
☆46Updated 2 months ago
Alternatives and similar repositories for sconce
Users that are interested in sconce are comparing it to the libraries listed below
- ☆47Updated 10 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆67Updated last month
- NanoGPT-speedrunning for the poor T4 enjoyers☆65Updated 3 weeks ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆41Updated last year
- Collection of autoregressive model implementations☆85Updated 3 weeks ago
- Work in progress.☆62Updated last month
- Repository for CPU Kernel Generation for LLM Inference☆26Updated last year
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆36Updated last year
- ☆44Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆17Updated 2 months ago
- This repository contains code for the MicroAdam paper.☆18Updated 5 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆126Updated 5 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness☆86Updated last month
- ☆49Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆37Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization☆106Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆45Updated 10 months ago
- Cray-LM unified training and inference stack.☆22Updated 3 months ago
- Triton Implementation of HyperAttention Algorithm☆48Updated last year
- ☆79Updated 10 months ago
- Mobile Viewer for W&B, built on top of Flutter.☆34Updated last year
- A really tiny autograd engine☆92Updated last year
- Code for the paper "Function-Space Learning Rates"☆20Updated last month
- FlexAttention w/ FlashAttention3 Support☆26Updated 7 months ago
- Evaluation Code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023…☆13Updated last year
- Experiment in using Tangent to autodiff Triton☆78Updated last year
- Implementation of Hyena Hierarchy in JAX☆10Updated 2 years ago
- ☆60Updated 6 months ago
- ☆27Updated 10 months ago
- ☆88Updated last year