OSU-Nowlab / Flover
A novel temporal fusion framework for propelling autoregressive model inference
☆11 · Updated this week
Alternatives and similar repositories for Flover:
Users who are interested in Flover are comparing it to the repositories listed below:
- CUDA Templates for Linear Algebra Subroutines ☆14 · Updated this week
- A GPU-driven system framework for scalable AI applications ☆112 · Updated 3 weeks ago
- A TensorFlow Extension: GPU performance tools for TensorFlow. ☆25 · Updated last year
- Intel® SHMEM - Device initiated shared memory based communication library ☆23 · Updated 4 months ago
- PyTorch distributed training acceleration framework ☆43 · Updated 2 weeks ago
- Fast GPU based tensor core reductions ☆13 · Updated 2 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 · Updated last month
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 · Updated last week
- ☆45 · Updated this week
- ☆20 · Updated 2 weeks ago
- Python bindings for OpenSHMEM ☆15 · Updated this week
- An IR for efficiently simulating distributed ML computation. ☆28 · Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆58 · Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆105 · Updated 5 months ago
- Benchmarks to capture important workloads. ☆29 · Updated last month
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆23 · Updated this week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 · Updated 4 months ago
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all… ☆32 · Updated last year
- An extension library of WMMA API (Tensor Core API) ☆90 · Updated 7 months ago
- Data-Centric MLIR dialect ☆40 · Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆78 · Updated 3 months ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆45 · Updated 3 years ago
- Sky Computing: Accelerating Geo-distributed Computing in Federated Learning ☆90 · Updated 2 years ago
- A Python library transfers PyTorch tensors between CPU and NVMe ☆106 · Updated 3 months ago
- FTPipe and related pipeline model parallelism research. ☆41 · Updated last year
- 🔮 Execution time predictions for deep neural network training iterations across different GPUs. ☆60 · Updated 2 years ago
- ☆11 · Updated 3 years ago
- Microsoft Collective Communication Library ☆62 · Updated 3 months ago
- ☆36 · Updated 2 months ago