OSU-Nowlab / Flover
A novel temporal fusion framework for propelling autoregressive model inference
☆11 · Updated this week
Related projects
Alternatives and complementary repositories for Flover
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆14 · Updated 5 years ago
- A GPU-driven system framework for scalable AI applications ☆109 · Updated last month
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆98 · Updated last week
- An IR for efficiently simulating distributed ML computation. ☆25 · Updated 10 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated last month
- TransferBench is a utility for benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆36 · Updated this week
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆75 · Updated 8 months ago
- A TensorFlow extension: GPU performance tools for TensorFlow. ☆25 · Updated last year
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆44 · Updated 3 years ago
- 🔮 Execution time predictions for deep neural network training iterations across different GPUs. ☆56 · Updated last year
- ROCm Tracer Callback/Activity Library for performance tracing of AMD GPUs ☆75 · Updated last week
- An extension library of the WMMA API (Tensor Core API) ☆84 · Updated 4 months ago
- Analysis of traces from byteprofile ☆29 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆43 · Updated this week
- Open source of an IBM-optimized version of the HPCG benchmark. ☆14 · Updated 8 months ago
- LLM-Inference-Bench ☆11 · Updated last week
- CUDA GPU Benchmark ☆17 · Updated 4 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆19 · Updated this week
- Bandwidth test for ROCm ☆49 · Updated this week
- Fast and memory-efficient exact attention ☆30 · Updated 3 weeks ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆57 · Updated 6 months ago
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆57 · Updated 5 months ago
- FTPipe and related pipeline model parallelism research. ☆41 · Updated last year
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆22 · Updated 2 months ago
- GVProf: A Value Profiler for GPU-based Clusters ☆47 · Updated 7 months ago
- Intel® SHMEM: a device-initiated, shared-memory-based communication library ☆21 · Updated 2 weeks ago
- A Top-Down Profiler for GPU Applications ☆13 · Updated 8 months ago