Raincleared-Song/sparse_gpu_operator

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Raincleared-Song/sparse_gpu_operator)

Raincleared-Song / sparse_gpu_operator

GPU operators for sparse tensor operations

☆37

Alternatives and similar repositories for sparse_gpu_operator

Users that are interested in sparse_gpu_operator are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Raincleared-Song / DejaVu_predictor
View on GitHub
The codes for training sparsity predictor on LLaMA.
☆18May 12, 2024Updated 2 years ago
fw-ai / llama-cuda-graph-example
View on GitHub
Example of applying CUDA graphs to LLaMA-v2
☆11Aug 25, 2023Updated 2 years ago
FMInference / DejaVu
View on GitHub
☆359Apr 2, 2024Updated 2 years ago
smitkiri / news-qa
View on GitHub
Reading comprehension based question-answering model for news articles.
☆11Jun 22, 2022Updated 4 years ago
IST-DASLab / SparseFinetuning
View on GitHub
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆43Jan 15, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
uw-mad-dash / decoding-speculative-decoding
View on GitHub
☆16Aug 19, 2024Updated last year
VITA-Group / READ-ME
View on GitHub
[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo …
☆16Dec 16, 2024Updated last year
ScalingIntelligence / CATS
View on GitHub
☆33Nov 11, 2024Updated last year
SqueezeBits / QUICK
View on GitHub
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆123Mar 6, 2024Updated 2 years ago
Hyaloid / AccSpMM
View on GitHub
Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.
☆17Nov 13, 2025Updated 8 months ago
uchuhimo / amanda
View on GitHub
☆18Apr 21, 2024Updated 2 years ago
keeeeenw / TinyLlama
View on GitHub
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆14Mar 30, 2024Updated 2 years ago
AniZpZ / AutoSmoothQuant
View on GitHub
An easy-to-use package for implementing SmoothQuant for LLMs
☆111Apr 7, 2025Updated last year
NickdeDycker / EstimoteIndoorAndroid
View on GitHub
Estimote Indoor Location finder
☆15Jan 29, 2015Updated 11 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
yanlinf / UXSenti
View on GitHub
Unsupervised Cross-lingual Sentiment Analysis (CoNLL 2019)
☆10Nov 4, 2019Updated 6 years ago
FasterDecoding / TEAL
View on GitHub
☆167Feb 15, 2025Updated last year
ChandlerGuan / Transkimmer
View on GitHub
Code for ACL2022 publication Transkimmer: Transformer Learns to Layer-wise Skim
☆22Aug 21, 2022Updated 3 years ago
jaepil / geometric-adam
View on GitHub
A Ray Tracing-Inspired Approach to Neural Network Optimization
☆17Jun 11, 2025Updated last year
krafton-ai / lexico
View on GitHub
KV cache compression via sparse coding
☆17Oct 26, 2025Updated 8 months ago
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400Jul 10, 2025Updated last year
RoySegal / tvmcon23_byoc
View on GitHub
☆11Mar 15, 2023Updated 3 years ago
lol0963332320 / ICLAB
View on GitHub
☆19Mar 23, 2023Updated 3 years ago
joey00072 / ohara
View on GitHub
Collection of autoregressive model implementation
☆84Jun 10, 2026Updated last month
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
View on GitHub
☆22Dec 15, 2023Updated 2 years ago
PKU-SEC-Lab / AdapMoE
View on GitHub
Code release for AdapMoE accepted by ICCAD 2024
☆39Apr 28, 2025Updated last year
cdj0311 / bert_distill_lstm
View on GitHub
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks.
☆15Aug 28, 2020Updated 5 years ago
microsoft / TransformerCompression
View on GitHub
For releasing code related to compression methods for transformers, accompanying our publications
☆461Jan 16, 2025Updated last year
escalab / SIMD2
View on GitHub
☆31Jun 15, 2022Updated 4 years ago
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆418Nov 20, 2025Updated 8 months ago
SqueezeAILab / KVQuant
View on GitHub
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
☆431Aug 13, 2024Updated last year
project-etalon / etalon
View on GitHub
LLM Serving Performance Evaluation Harness
☆84Feb 25, 2025Updated last year
PKU-ML / adainf
View on GitHub
Official code for ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?"
☆31Apr 4, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
nanowell / Q-Sparse-LLM
View on GitHub
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆37Aug 14, 2024Updated last year
dilawar / sniffer
View on GitHub
A poor man MOSS (Measure of software similarity)
☆31Jun 27, 2017Updated 9 years ago
goliaro / specinfer-ae
View on GitHub
☆28Mar 14, 2024Updated 2 years ago
dcaox / MIT6.5940
View on GitHub
模型加速/模型压缩（已完成所有Lab）
☆11Dec 24, 2023Updated 2 years ago
SJTU-IPADS / Bamboo
View on GitHub
Bamboo-7B Large Language Model
☆95Mar 28, 2024Updated 2 years ago
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆850Mar 6, 2025Updated last year
ZO-Bench / ZO-LLM
View on GitHub
[ICML‘24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ".
☆128Jul 6, 2025Updated last year