goodevening13 / aquakv
☆ 15 · Updated 2 weeks ago
Alternatives and similar repositories for aquakv
Users interested in aquakv are comparing it to the libraries listed below.
- ☆ 71 · Updated 2 weeks ago
- ☆ 112 · Updated last year
- A library for unit scaling in PyTorch · ☆ 125 · Updated this week
- Work in progress. · ☆ 70 · Updated 2 weeks ago
- Load compute kernels from the Hub · ☆ 207 · Updated this week
- Supporting PyTorch FSDP for optimizers · ☆ 82 · Updated 7 months ago
- ☆ 139 · Updated 3 weeks ago
- Code for studying the super weight in LLMs · ☆ 113 · Updated 7 months ago
- This repository contains the experimental PyTorch-native float8 training UX · ☆ 224 · Updated 11 months ago
- Efficient optimizers · ☆ 234 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ☆ 128 · Updated 7 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning · ☆ 17 · Updated this week
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… · ☆ 138 · Updated 11 months ago
- ☆ 80 · Updated last year
- Code for data-aware compression of DeepSeek models · ☆ 36 · Updated last month
- Fast low-bit matmul kernels in Triton · ☆ 330 · Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆ 237 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆ 243 · Updated 5 months ago
- Flash-Muon: An Efficient Implementation of the Muon Optimizer · ☆ 142 · Updated last month
- Prune transformer layers · ☆ 69 · Updated last year
- An evaluation framework for training-free sparse attention in LLMs · ☆ 83 · Updated 3 weeks ago
- Fast, modern, and low-precision PyTorch optimizers · ☆ 98 · Updated this week
- The simplest implementation of recent sparse attention patterns for efficient LLM inference · ☆ 78 · Updated last month
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" · ☆ 161 · Updated 6 months ago
- A nanoGPT-like codebase for LLM training · ☆ 100 · Updated 2 months ago
- QuIP quantization · ☆ 54 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ☆ 188 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts · ☆ 225 · Updated 7 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients · ☆ 198 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton · ☆ 68 · Updated 11 months ago