goodevening13 / aquakvLinks

☆16

Alternatives and similar repositories for aquakv

Users that are interested in aquakv are comparing it to the libraries listed below

Sorting:

ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
huggingface / kernels
Load compute kernels from the Hub
☆308Updated this week
mgmalek / efficient_cross_entropy
☆121Updated last year
IST-DASLab / Quartet
☆102Updated last week
IST-DASLab / QuEST
Work in progress.
☆74Updated 4 months ago
Cornell-RelaxML / qtip
☆152Updated 4 months ago
cloneofsimo / min-fsdp
☆91Updated last year
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆147Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆92Updated 3 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59Updated last year
IST-DASLab / sparseprop
☆15Updated 2 years ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 10 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆168Updated 4 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆119Updated 10 months ago
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆109Updated 5 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102Updated 2 weeks ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116Updated last month
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 3 months ago
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆100Updated 6 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆69Updated last year
apple / ml-ademamix
☆68Updated 11 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
HomebrewML / HeavyBall
Efficient optimizers
☆275Updated 2 weeks ago
gpu-mode / ring-attention
ring-attention experiments
☆155Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆69Updated last year
huggingface / picotron_tutorial
☆224Updated last week
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆63Updated this week
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆194Updated 4 months ago