OSU-Nowlab / Flover
A novel temporal fusion framework for propelling autoregressive model inference
☆11 Updated this week
Alternatives and similar repositories for Flover:
Users who are interested in Flover are comparing it to the libraries listed below:
- Intel® SHMEM - Device initiated shared memory based communication library ☆22 Updated 2 months ago
- A GPU-driven system framework for scalable AI applications ☆111 Updated last week
- ROCm Tracer Callback/Activity Library for performance tracing on AMD GPUs ☆79 Updated this week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 Updated 3 months ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆38 Updated this week
- ☆15 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆48 Updated this week
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆45 Updated 3 years ago
- Bandwidth test for ROCm ☆53 Updated this week
- Open source of an IBM Optimized version of the HPCG benchmark. ☆14 Updated 11 months ago
- ☆39 Updated last week
- Random number library that generates pseudo-random and quasi-random numbers. ☆25 Updated this week
- oneAPI Level Zero Conformance & Performance test content ☆48 Updated this week
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆29 Updated last week
- AMD’s C++ library for accelerating tensor primitives ☆38 Updated this week
- Sky Computing: Accelerating Geo-distributed Computing in Federated Learning ☆90 Updated 2 years ago
- MLPerf™ logging library ☆32 Updated 3 weeks ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆17 Updated this week
- OpenSHMEM Implementation on MPI ☆25 Updated 4 months ago
- Python bindings for OpenSHMEM ☆15 Updated last month
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆102 Updated 2 months ago
- PyTorch distributed training acceleration framework ☆39 Updated last week
- A hierarchical collective communications library with portable optimizations ☆26 Updated last month
- CUDA Templates for Linear Algebra Subroutines ☆12 Updated this week
- ☆18 Updated 3 weeks ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆21 Updated last year
- Multi-GPU communication profiler and visualizer ☆22 Updated 7 months ago
- Slides and exercises for persistent memory programming tutorial ☆12 Updated 2 years ago
- ☆58 Updated 8 months ago
- GVProf: A Value Profiler for GPU-based Clusters ☆48 Updated 10 months ago