argonne-lcf / LLM-Inference-Bench
LLM-Inference-Bench
☆11 Updated last week
Related projects
Alternatives and complementary repositories for LLM-Inference-Bench
- ☆11 Updated 3 years ago
- Unit Scaling demo and experimentation code ☆16 Updated 8 months ago
- ☆26 Updated 3 years ago
- An Attention Superoptimizer ☆20 Updated 6 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆14 Updated 5 years ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆13 Updated 4 years ago
- An IR for efficiently simulating distributed ML computation. ☆25 Updated 10 months ago
- NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆61 Updated last month
- TensorRT LLM Benchmark Configuration ☆11 Updated 3 months ago
- Packages and instructions for training and inference of LLMs on NVIDIA's new GH200 machines ☆19 Updated 2 months ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 Updated 5 years ago
- ☆36 Updated last year
- ☆46 Updated 5 months ago
- ☆13 Updated last year
- ☆55 Updated 5 months ago
- Sparsity support for PyTorch ☆31 Updated this week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆98 Updated last week
- benchmarking some transformer deployments ☆26 Updated last year
- ☆16 Updated 7 months ago
- CUDA 12.2 HMM demos ☆17 Updated 3 months ago
- Benchmarks to capture important workloads. ☆28 Updated 5 months ago
- ☆38 Updated 4 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 Updated 6 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆34 Updated 2 years ago
- ☆70 Updated 2 years ago
- Personal solutions to the Triton Puzzles ☆16 Updated 4 months ago
- extensible collectives library in triton ☆72 Updated last month
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆44 Updated 5 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆61 Updated 7 months ago
- ☆19 Updated 8 months ago