casys-kaist / DaCapo

☆19

Alternatives and similar repositories for DaCapo

Users who are interested in DaCapo are comparing it to the repositories listed below.

VIA-Research / vTrain
☆73 Updated 5 months ago
swsnu / aisys2023
☆103 Updated 2 years ago
junstar92 / nvidia-libraries-study
☆56 Updated last year
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆120 Updated 4 months ago
hongsunjang / pipe-bd
[DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation
☆11 Updated 2 years ago
XRBench / XRBench-MLSys2023
A version of XRBench-MAESTRO used for MLSys 2023 publication
☆25 Updated 2 years ago
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆20 Updated 11 months ago
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆111 Updated last year
junstar92 / parallel_programming_study
Study parallel programming - CUDA, OpenMP, MPI, Pthread
☆60 Updated 3 years ago
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆158 Updated 4 months ago
efficient-ai-study / efficient-ai-study
☆91 Updated last year
eis-lab / sage
Experimental deep learning framework written in Rust
☆15 Updated 3 years ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆67 Updated last year
eth-easl / deltazip
Compression for Foundation Models
☆34 Updated 4 months ago
NoakLiu / PiKV
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
☆42 Updated last month
etri / nest-compiler
NEST Compiler
☆119 Updated 9 months ago
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118 Updated last year
abdelfattah-lab / shadow_llm
☆10 Updated last year
Qualcomm-AI-research / gptvq
☆37 Updated last year
kaist-ina / stellatrain
Official Github repository for the SIGCOMM '24 paper "Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs"
☆72 Updated last year
radha-patel / SySTeC
Performant kernels for symmetric tensors
☆16 Updated last year
IntelLabs / Hardware-Aware-Automated-Machine-Learning
☆71 Updated 3 months ago
thu-ml / Jetfire-INT8Training
☆60 Updated last year
kakaobrain / trident
A performance library for machine learning applications.
☆184 Updated 2 years ago
naver-aics / lut-gemm
☆80 Updated last year
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆109 Updated 6 months ago
ConstantPark / DL_Compiler
Study Group of Deep Learning Compiler
☆165 Updated 2 years ago
ranggihwang / Pregated_MoE
☆57 Updated last year
INT-FlashAttention2024 / INT-FlashAttention
☆83 Updated 10 months ago
SqueezeBits / Torch-TRTLLM
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
☆50 Updated 4 months ago