satabios / sconce
Model Compression/Inference Made Easy
☆38 Updated 3 weeks ago
Related projects:
- Notes on quantization in neural networks ☆54 Updated 9 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆73 Updated 3 weeks ago
- Collection of autoregressive model implementations ☆62 Updated 2 weeks ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆46 Updated 5 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆28 Updated 4 months ago
- Prune transformer layers ☆60 Updated 3 months ago
- ML/DL Math and Method notes ☆56 Updated 9 months ago
- ☆24 Updated last year
- A highly efficient compression algorithm for the N1 implant (Neuralink's compression challenge) ☆45 Updated 3 months ago
- ☆38 Updated 8 months ago
- ☆124 Updated 7 months ago
- Simple and fast low-bit matmul kernels in CUDA ☆48 Updated this week
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. ☆68 Updated 2 months ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆29 Updated this week
- ☆30 Updated 2 months ago
- ☆59 Updated last week
- PB-LLM: Partially Binarized Large Language Models ☆143 Updated 10 months ago
- Evaluation code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023… ☆11 Updated 9 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆22 Updated 2 months ago
- ☆27 Updated 2 months ago
- ☆40 Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆94 Updated 2 weeks ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆36 Updated 8 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 Updated 2 months ago
- A minimal, clean implementation of the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C. ☆21 Updated 2 months ago
- GPU benchmark ☆35 Updated 2 weeks ago
- ☆25 Updated this week
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆109 Updated 8 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 Updated 3 months ago