☆26 · Oct 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for cpm_kernels
Users that are interested in cpm_kernels are comparing it to the libraries listed below
- BMInf demos. ☆16 · Oct 14, 2021 · Updated 4 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 · Mar 12, 2026 · Updated last week
- Artificial intelligence is used in drug development ☆18 · Dec 23, 2019 · Updated 6 years ago
- Accelerated Computer Vision Lab (ACCV-Lab) is a systematic collection of packages with the common goal to facilitate end-to-end efficient… ☆46 · Feb 15, 2026 · Updated last month
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 10 months ago
- ☆11 · Jan 10, 2025 · Updated last year
- ☆19 · May 11, 2024 · Updated last year
- Model Compression for Big Models ☆168 · Jun 30, 2023 · Updated 2 years ago
- Low-level Vision Model Deployment ☆10 · May 27, 2023 · Updated 2 years ago
- Efficient Inference for Big Models ☆587 · Jan 24, 2023 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆42 · Updated this week
- ☆14 · Feb 16, 2026 · Updated last month
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Nov 11, 2024 · Updated last year
- two model of try/catch like mechanism for golang (Don't panic, it's not about panic :) ) ☆13 · Feb 8, 2016 · Updated 10 years ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Apr 30, 2023 · Updated 2 years ago
- Depict GPU memory footprint during DNN training of PyTorch ☆11 · Nov 17, 2022 · Updated 3 years ago
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 7 months ago
- a list of recent papers on transfer learning ☆24 · Dec 5, 2017 · Updated 8 years ago
- A brief tutorial for eBPF: Verifier, observability, networking, and security. ☆12 · Sep 19, 2024 · Updated last year
- ☆19 · Sep 15, 2022 · Updated 3 years ago
- Adversarial Training and SFT for Bot Safety Models ☆40 · Apr 18, 2023 · Updated 2 years ago
- ☆12 · Mar 18, 2024 · Updated 2 years ago
- ☆17 · Aug 5, 2025 · Updated 7 months ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- pytorch code examples for measuring the performance of collective communication calls in AI workloads ☆19 · Sep 18, 2025 · Updated 6 months ago
- vLLM performance dashboard ☆43 · Apr 26, 2024 · Updated last year
- ☆12 · Mar 13, 2023 · Updated 3 years ago
- a simple API to use CUPTI ☆10 · Aug 19, 2025 · Updated 7 months ago
- A RISC-V assembler library for Scala/Chisel HDL projects ☆16 · Mar 5, 2026 · Updated 2 weeks ago
- ☆12 · Dec 15, 2022 · Updated 3 years ago
- A toolkit for developers to simplify the transformation of nn.Module instances. It's now corresponding to Pytorch.fx. ☆13 · Apr 7, 2023 · Updated 2 years ago
- ☆11 · Apr 5, 2021 · Updated 4 years ago
- ☆18 · Dec 3, 2024 · Updated last year
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 · May 6, 2024 · Updated last year
- Enhanced version of original AutoGPTQ (https://github.com/PanQiWei/AutoGPTQ). ☆10 · Nov 2, 2023 · Updated 2 years ago
- ☆150 · Jan 9, 2025 · Updated last year
- ☆15 · Nov 11, 2024 · Updated last year
- This repo holds the research projects of our lab. ☆11 · Jan 20, 2024 · Updated 2 years ago
- Sequence-to-Sequence Model for User Simulation ☆10 · Feb 6, 2017 · Updated 9 years ago