Benchmarks to capture important workloads.
☆32 · Mar 6, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for FAMBench
Users who are interested in FAMBench are comparing it to the repositories listed below.
- ☆12 · May 25, 2021 · Updated 4 years ago
- Parser script that processes PyTorch autograd profiler results and converts the JSON file to Excel. ☆15 · Oct 8, 2019 · Updated 6 years ago
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆22 · Apr 25, 2024 · Updated last year
- ☆12 · Mar 28, 2023 · Updated 2 years ago
- ☆12 · Aug 26, 2025 · Updated 6 months ago
- Flexible memory allocation tool for multi-tiered memory systems ☆13 · Jan 7, 2026 · Updated 2 months ago
- ☆20 · Nov 7, 2019 · Updated 6 years ago
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 · Feb 8, 2024 · Updated 2 years ago
- ☆18 · Jan 1, 2023 · Updated 3 years ago
- GPU-accelerated AES encryption project ☆11 · Feb 13, 2015 · Updated 11 years ago
- Optimizing scheduler. Combinatorial instruction scheduling project. ☆27 · Jan 7, 2026 · Updated 2 months ago
- ☆10 · Jul 16, 2020 · Updated 5 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆11 · May 6, 2023 · Updated 2 years ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆38 · Aug 29, 2025 · Updated 6 months ago
- ☆22 · Jul 31, 2019 · Updated 6 years ago
- A human-friendly implementation of the iRobot Open Interface version 2 API. ☆14 · May 14, 2016 · Updated 9 years ago
- Issues related to MLPerf® Inference policies, including rules and suggested changes ☆63 · Feb 4, 2026 · Updated last month
- ☆48 · Mar 5, 2024 · Updated 2 years ago
- EECS 151/251A FPGA Project Skeleton for Spring 2020 ☆12 · May 6, 2020 · Updated 5 years ago
- Auto-differentiation library for C++ ☆12 · Jan 16, 2022 · Updated 4 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆179 · Dec 16, 2025 · Updated 3 months ago
- Scalable radix top-k selection on GPUs. ☆21 · Jan 27, 2025 · Updated last year
- Library for the Test-based Calibration Error (TCE) metric to quantify the degree of classifier calibration. ☆13 · Sep 15, 2023 · Updated 2 years ago
- ☆12 · Mar 19, 2022 · Updated 4 years ago
- AutodiffEngine ☆13 · Apr 1, 2019 · Updated 6 years ago
- ☆60 · Sep 15, 2023 · Updated 2 years ago
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis. ☆13 · Mar 20, 2025 · Updated last year
- diffusers with search engine ☆12 · Jan 13, 2026 · Updated 2 months ago
- ☆14 · Mar 8, 2025 · Updated last year
- Framework for Algorithmic Correctness Testing of Operators ☆16 · Mar 9, 2026 · Updated last week
- ☆12 · Mar 7, 2024 · Updated 2 years ago
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 · Jun 24, 2024 · Updated last year
- ☆191 · Jun 16, 2024 · Updated last year
- Notes on putting micropython on STM32F407VG bare board ☆11 · Oct 7, 2019 · Updated 6 years ago
- N-body simulation based on CUDA. ☆14 · Jun 20, 2019 · Updated 6 years ago
- DL Dataloader Benchmarks ☆20 · Jan 27, 2025 · Updated last year
- Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-… ☆11 · Jul 1, 2025 · Updated 8 months ago
- ☆42 · Dec 10, 2024 · Updated last year
- The only known (by 2022) open-source, easy-to-understand basic algorithm implementations in TD-CEM. (Please star and fork this project if… ☆15 · Mar 1, 2022 · Updated 4 years ago