abdelfattah-lab / BitMoD-HPCA-25
☆114 Updated 4 months ago
Alternatives and similar repositories for BitMoD-HPCA-25
Users who are interested in BitMoD-HPCA-25 are comparing it to the libraries listed below.
- Some Hardware Architectures for GEMM ☆283 Updated 6 months ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆238 Updated last year
- ☆24 Updated last year
- ☆103 Updated 4 years ago
- [Neurips 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models ☆1,151 Updated last month
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆266 Updated 3 months ago
- CXL remote offloading data movement aware compiler ☆70 Updated last week
- Official implementation of "REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving" (NeurIPS 2025) ☆31 Updated this week
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,283 Updated 11 months ago
- LeNet5 on PYNQ via HLS ☆37 Updated 2 years ago
- UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g… ☆1,108 Updated this week
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆94 Updated last year
- Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks" ☆51 Updated 8 months ago
- TVM Documentation in Chinese Simplified / TVM 中文文档 ☆2,752 Updated 2 weeks ago
- Step-by-step optimization of TPU MatMul Kernels ☆85 Updated 4 months ago
- A toolkit enhances PyTorch with specialized functions for low-bit quantized neural networks. ☆196 Updated last year
- YiRage (Yield Revolutionary AGile Engine) - Multi-Backend LLM Inference Optimization. Extends Mirage with comprehensive support for CUDA,… ☆32 Updated last week
- [NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems ☆98 Updated last month
- JittorGeometric is a Jittor-based graph machine learning library. ☆434 Updated 3 months ago
- Host shell scripts: configure FPGA's DMA-SG via PCIe XDMA. ☆26 Updated 5 months ago
- Vitis HLS 2022.2 projects source code: C design, C simulation, RTL simulation. [vitis_hls project] ☆23 Updated 5 months ago
- SQuant [ICLR22] ☆130 Updated 3 years ago
- [ICLR 2025] BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments ☆39 Updated 9 months ago
- Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime) ☆272 Updated last week
- Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate (NeurIPS 2024) ☆32 Updated last year
- [NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy ☆73 Updated 10 months ago
- A Tiny structure of pytorch for learning; ☆60 Updated last year
- A distributed framework for LLM agents ☆171 Updated this week
- [TMC 2025/NOSSDAV 2023] Official code for RepCaM++ and RepCaM: Re-parameterization Content-aware Modulation for Neural Video Delivery ☆54 Updated 7 months ago
- [NeurIPS 2025] Accelerating Parallel Diffusion Model Serving with Residual Compression ☆39 Updated last month