gau-nernst / quantized-training
Explore training for quantized models
☆10 · Updated last week
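The repo's own training recipes aren't reproduced here; as rough orientation only, below is a minimal sketch of the fake-quantization + straight-through-estimator pattern that quantized-training experiments typically build on. It assumes PyTorch, and `FakeQuantize` / `QuantLinear` are illustrative names, not the repo's API.

```python
# Minimal fake-quantization sketch (illustrative, not the repo's code):
# weights are rounded to int8 levels in the forward pass, while gradients
# pass through unchanged (straight-through estimator), so the model can
# still be trained with ordinary backprop.
import torch

class FakeQuantize(torch.autograd.Function):
    """Round to int8 levels in forward; pass gradients through unchanged."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        return (x / scale).round().clamp(-128, 127) * scale

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        return grad_output  # straight-through estimator

class QuantLinear(torch.nn.Linear):
    """Linear layer whose weights are fake-quantized during training."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = FakeQuantize.apply(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)

# Usage: drop-in replacement for nn.Linear inside a training loop.
layer = QuantLinear(64, 64)
out = layer(torch.randn(8, 64))
out.sum().backward()  # gradients flow via the straight-through estimator
```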
Related projects
Alternatives and complementary repositories for quantized-training
- FlexAttention w/ FlashAttention3 Support · ☆27 · Updated last month
- Efficient, Flexible and Portable Structured Generation · ☆53 · Updated this week
- Extensible collectives library in Triton · ☆72 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆35 · Updated 6 months ago
- Make Triton easier · ☆41 · Updated 5 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton (see the sketch after this list) · ☆145 · Updated this week
- TORCH_LOGS parser for PT2 · ☆22 · Updated last week
- TensorRT LLM Benchmark Configuration · ☆11 · Updated 3 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry · ☆38 · Updated 10 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆50 · Updated this week
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… · ☆20 · Updated last week
- LLM training in simple, raw C/CUDA · ☆86 · Updated 6 months ago
- Experiment in using Tangent to autodiff Triton · ☆72 · Updated 9 months ago
- GPTQ inference TVM kernel · ☆36 · Updated 6 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline · ☆90 · Updated 4 months ago
- Memory Optimizations for Deep Learning (ICML 2023) · ☆60 · Updated 8 months ago
- RedCoast: A Lightweight Tool to Automate Distributed Training and Inference (NAACL '24 Best Demo Paper Runner-Up; MLSys @ NeurIPS '23) · ☆61 · Updated last month
- CUDA and Triton implementations of Flash Attention with SoftmaxN · ☆66 · Updated 5 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization · ☆87 · Updated last month
- Ring-attention experiments · ☆97 · Updated last month
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs · ☆187 · Updated this week