OSU-Nowlab / Flover
A novel temporal fusion framework for propelling autoregressive model inference
☆11 · Updated this week
Related projects
Alternatives and complementary repositories for Flover
- A TensorFlow Extension: GPU performance tools for TensorFlow. ☆25 · Updated last year
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches. ☆13 · Updated 5 years ago
- A GPU-driven system framework for scalable AI applications. ☆109 · Updated last month
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated last month
- An IR for efficiently simulating distributed ML computation. ☆25 · Updated 10 months ago
- GVProf: A Value Profiler for GPU-based Clusters. ☆47 · Updated 7 months ago
- Magnum IO community repo. ☆80 · Updated 5 months ago
- ROCm Tracer Callback/Activity Library for performance tracing of AMD GPUs. ☆74 · Updated last week
- Provides examples for writing and building Habana custom kernels using the HabanaTools. ☆18 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆41 · Updated this week
- An extension library of the WMMA API (Tensor Core API). ☆83 · Updated 4 months ago
- MLPerf™ logging library. ☆30 · Updated last week
- TransferBench is a utility for benchmarking simultaneous copies between user-specified devices (CPUs/GPUs). ☆35 · Updated last week
- Fast and memory-efficient exact attention. ☆28 · Updated 2 weeks ago
- CUDA GPU Benchmark. ☆17 · Updated 4 months ago
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving". ☆56 · Updated 5 months ago
- Intel® SHMEM: a device-initiated shared-memory-based communication library. ☆18 · Updated last week
- An Attention Superoptimizer. ☆20 · Updated 6 months ago
- Benchmarks to capture important workloads. ☆28 · Updated 5 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe. ☆96 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆57 · Updated 2 months ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray. ☆101 · Updated 2 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton). ☆35 · Updated 6 months ago
- An I/O benchmark for deep learning applications. ☆67 · Updated 2 weeks ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. ☆17 · Updated 2 years ago