GPU operators for sparse tensor operations
☆35 · Mar 11, 2024 · Updated 2 years ago
Alternatives and similar repositories for sparse_gpu_operator
Users interested in sparse_gpu_operator compare it to the libraries listed below
- The codes for training sparsity predictor on LLaMA. ☆18 · May 12, 2024 · Updated last year
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Aug 25, 2023 · Updated 2 years ago
- ☆353 · Apr 2, 2024 · Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 · Jan 15, 2024 · Updated 2 years ago
- ☆15 · Aug 19, 2024 · Updated last year
- Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores. ☆14 · Nov 13, 2025 · Updated 4 months ago
- [COLING22] Text-to-Text Extraction and Verbalization of Biomedical Event Graphs ☆10 · Nov 5, 2022 · Updated 3 years ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆120 · Mar 6, 2024 · Updated 2 years ago
- ☆18 · Apr 21, 2024 · Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆14 · Mar 30, 2024 · Updated last year
- ☆26 · Feb 28, 2025 · Updated last year
- My tests and experiments with some popular dl frameworks. ☆17 · Sep 11, 2025 · Updated 6 months ago
- Unsupervised Cross-lingual Sentiment Analysis (CoNLL 2019) ☆10 · Nov 4, 2019 · Updated 6 years ago
- ☆162 · Feb 15, 2025 · Updated last year
- Estimote Indoor Location finder ☆15 · Jan 29, 2015 · Updated 11 years ago
- Peking University Fall 2024 Compiler Principles course: lab code, notes, and experience ☆16 · Sep 12, 2025 · Updated 6 months ago
- KV cache compression via sparse coding ☆17 · Oct 26, 2025 · Updated 4 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆376 · Jul 10, 2025 · Updated 8 months ago
- ☆11 · Mar 15, 2023 · Updated 3 years ago
- ☆17 · Mar 23, 2023 · Updated 2 years ago
- ☆161 · Dec 27, 2024 · Updated last year
- Retrospective on managing non-employee (OD) staff and reflections on improving interviews ☆16 · Jul 2, 2025 · Updated 8 months ago
- Collection of autoregressive model implementation ☆85 · Feb 23, 2026 · Updated 3 weeks ago
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. ☆15 · Aug 28, 2020 · Updated 5 years ago
- ☆22 · Dec 15, 2023 · Updated 2 years ago
- Code release for AdapMoE accepted by ICCAD 2024 ☆36 · Apr 28, 2025 · Updated 10 months ago
- ☆31 · Jun 15, 2022 · Updated 3 years ago
- ☆41 · Oct 15, 2025 · Updated 5 months ago
- Code for Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach ☆14 · Jul 19, 2020 · Updated 5 years ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆359 · Nov 20, 2025 · Updated 4 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆34 · Aug 14, 2024 · Updated last year
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆408 · Aug 13, 2024 · Updated last year
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆49 · Feb 28, 2026 · Updated 3 weeks ago
- Python implementation of REMBO built on GPyTorch. ☆18 · Jul 11, 2020 · Updated 5 years ago
- A poor man MOSS (Measure of software similarity) ☆31 · Jun 27, 2017 · Updated 8 years ago
- Self-Distribution BNN ☆10 · Mar 8, 2022 · Updated 4 years ago
- Source code for the paper "Source of Transfer in Multilingual Named Entity Recognition" ☆12 · Dec 8, 2022 · Updated 3 years ago
- LLM Serving Performance Evaluation Harness ☆83 · Feb 25, 2025 · Updated last year
- ☆26 · Mar 14, 2024 · Updated 2 years ago