checkpoint-restore / criu-coordinator
A tool for coordinated checkpoint/restore of distributed applications with CRIU
☆31Updated 5 months ago
Alternatives and similar repositories for criu-coordinator
Users who are interested in criu-coordinator are comparing it to the repositories listed below.
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes☆146Updated 10 months ago
- ☆38Updated 3 months ago
- An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA).☆70Updated 11 months ago
- Asynchronous Rust bindings for UCX☆78Updated 9 months ago
- ☆20Updated 7 months ago
- The official implementation of OSDI'25 paper BlitzScale☆39Updated 4 months ago
- Asynchronous Rust bindings for SPDK.☆17Updated 3 years ago
- FalconFS is a high-performance distributed file system (DFS) designed for AI workloads.☆54Updated last week
- ☆52Updated last year
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems”☆46Updated 2 years ago
- A tool to detect infrastructure issues on cloud native AI systems☆52Updated 4 months ago
- Repository linking to the software artifacts used for the MigrOS ATC 2021 paper☆18Updated 4 years ago
- CUDA checkpoint and restore utility☆415Updated 4 months ago
- [NSDI '24] DINT: Fast In-Kernel Distributed Transactions with eBPF☆53Updated last year
- Lightning In-Memory Object Store☆47Updated 4 years ago
- An efficient GPU resource sharing system with fine-grained control for Linux platforms.☆88Updated last year
- Enables building safer SPDK-based Rust applications☆85Updated last week
- ☆27Updated 2 years ago
- Cloud Native Benchmarking of Foundation Models☆45Updated 6 months ago
- Systematic and comprehensive benchmarks for LLM systems.☆50Updated 2 weeks ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node☆63Updated last month
- A file system over RDMA☆28Updated 3 years ago
- ☆18Updated 5 years ago
- ☆18Updated 2 years ago
- A user level library for applications to transparently use Intel DSA.☆42Updated 3 weeks ago
- Bindings for RDMA ibverbs through rdma-core☆197Updated last week
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes cluster (IC2E'23)☆34Updated 2 years ago
- ☆11Updated 3 weeks ago
- This is a fast RDMA abstraction layer that works both in the kernel and user-space.☆59Updated last year
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs☆23Updated 4 months ago