ChaseLab-PKU/InstAttention

InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference

☆17

Alternatives and similar repositories for InstAttention

Users who are interested in InstAttention are comparing it to the repositories listed below.

AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆53 · Jul 21, 2025 · Updated 10 months ago
Halifuda / Xerxes
A standalone CXL-enabled system simulator.
☆21 · Apr 19, 2026 · Updated last month
ChaseLab-PKU / ScalaCache
ScalaCache: Scalable User-Space Page Cache Management with Software-Hardware Coordination (USENIX ATC'24)
☆16 · Jul 19, 2024 · Updated last year
ChaseLab-PKU / ScalaAFA
ScalaAFA: Constructing User-Space All-Flash Array Engine with Holistic Designs (USENIX ATC 2024).
☆16 · Nov 25, 2024 · Updated last year
KULeuven-COSIC / fpt-demo
FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption
☆29 · Sep 2, 2025 · Updated 8 months ago
RC4ML / CAM
CAM: Asynchronous GPU-Initiated, CPU-Managed SSD Management for Batching Storage Access [ICDE'25]
☆19 · Mar 3, 2025 · Updated last year
zyqCSL / DiffKV
☆43 · Oct 11, 2025 · Updated 7 months ago
optiq-lab / 3L-Cache
☆16 · Aug 9, 2025 · Updated 9 months ago
RuokaiYin / LoAS
LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks, MICRO 2024.
☆18 · Mar 19, 2025 · Updated last year
ZonedStorage / RAIZN-release
Source code for RAIZN (ASPLOS '23)
☆15 · Oct 18, 2022 · Updated 3 years ago
JimZeyuYang / GPU_Power_Benchmark
Microbenchmark that unveils the mechanisms behind power readings reported by nvidia-smi on your NVIDIA GPU.
☆14 · Dec 12, 2024 · Updated last year
DylanLIiii / XJTLU-manual
A bright future ahead
☆19 · Nov 8, 2022 · Updated 3 years ago
toufique-morshed / CPU-GPU-TFHE
A CPU and GPU accelerated framework for TFHE. The framework includes algebraic, vector, and matrix operations.
☆21 · Apr 15, 2020 · Updated 6 years ago
absmall / p2
This program implements the P^2 algorithm as documented in "The P-Square Algorithm for Dynamic Calculation of Percentiles and Histograms …
☆25 · Jun 19, 2019 · Updated 6 years ago
Xilinx / libdfx
☆13 · May 14, 2026 · Updated last week
thu-nics / CLAP-triangle-counting
[DATE'23] The official code for the paper "CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory"
☆24 · Updated this week
SNU-DBXLab-papers / SmartSSD
☆10 · Feb 22, 2023 · Updated 3 years ago
FCAS-LAB / LEGOSIM_MICRO
☆30 · Aug 4, 2025 · Updated 9 months ago
momalab / CoPHEE
CoPHEE is a Co-processor for Partially Homomorphic Encrypted Execution.
☆36 · Feb 21, 2024 · Updated 2 years ago
tue-es / gpu-cache-model
A GPU cache model for research purposes
☆32 · Nov 4, 2013 · Updated 12 years ago
adnansirajrakin / TBT-CVPR2020
This repository provides sample code implementing the Targeted Bit Trojan attack.
☆20 · Nov 7, 2020 · Updated 5 years ago
FFGGSSJJ / SmartSSD-Oriented-Work
SmartSSD-related benchmarks and toy applications
☆13 · Nov 1, 2023 · Updated 2 years ago
guanyilin428 / Dynamic-Speculative-Planning
☆46 · Sep 13, 2025 · Updated 8 months ago
hust-diangroup / LabelMark
A photo and video marking tool for data generation.
☆16 · Nov 16, 2019 · Updated 6 years ago
TELOS-syslab / CortenMM-Artifact
The artifact of the SOSP '25 paper "CortenMM: Efficient Memory Management with Strong Correctness Guarantees".
☆41 · Nov 12, 2025 · Updated 6 months ago
cornell-zhang / allo-pldi24-artifact
Artifact evaluation of the PLDI'24 paper "Allo: A Programming Model for Composable Accelerator Design"
☆34 · Apr 11, 2024 · Updated 2 years ago
zhuhanqing / Lightening-Transformer-AE
Artifact evaluation for the HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele…
☆11 · Mar 3, 2024 · Updated 2 years ago
linuslau / CXL-Emulator-QEMU
Exploring CXL on QEMU Emulation
☆39 · Mar 4, 2025 · Updated last year
TrelisResearch / llama-2-setup
Prompt format and padding guide for Llama 2
☆12 · Sep 18, 2023 · Updated 2 years ago
Matt-Dong123 / tools4szu
Convenient tools for szu
☆92 · Jan 22, 2026 · Updated 4 months ago
dorpxam / einops-cpp
C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with Einstein-like notation
☆11 · Oct 16, 2023 · Updated 2 years ago
equation314 / minidecaf
Web version of the MiniDecaf compiler.
☆13 · Sep 17, 2020 · Updated 5 years ago
OpenMOSS / MOSS-Video-Preview
A real-time video understanding foundation model built on Llama-3.2-Vision, featuring comprehensively extended video processing and multi…
☆138 · Apr 13, 2026 · Updated last month
xupsh / Alveo_Chinese
Chinese Guide for Alveo Getting Started
☆12 · May 18, 2020 · Updated 6 years ago
sharc-lab / GenGNN
☆37 · Jan 20, 2022 · Updated 4 years ago
hyungyokim / LIA_AMXGPU
[ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
☆12 · Jun 28, 2025 · Updated 10 months ago
SAOHPRWHG / FindSimilarTPO
A tool for finding similar TPO reading and listening material to help with TOEFL study.
☆10 · Mar 5, 2020 · Updated 6 years ago
RC4ML / RPCNIC
RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]
☆15 · Dec 9, 2024 · Updated last year
yangyaojia / Bilibili_Bangumi_Download
Bilibili downloader: summon your favorite anime series!
☆12 · Feb 23, 2020 · Updated 6 years ago