abcdabcd987 / libfabric-efa-demo
☆19 · Updated 3 weeks ago
Alternatives and similar repositories for libfabric-efa-demo:
Users who are interested in libfabric-efa-demo are comparing it to the libraries listed below.
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated 3 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆18 · Updated 3 years ago
- High-performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs ☆49 · Updated 2 years ago
- Stateful LLM Serving ☆44 · Updated 6 months ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆19 · Updated 11 months ago
- A distributed KV store for disaggregated LLM inference ☆25 · Updated this week
- ☆11 · Updated 3 years ago
- An Attention Superoptimizer ☆21 · Updated last week
- ☆16 · Updated 2 years ago
- Thunder Research Group's Collective Communication Library ☆31 · Updated 9 months ago
- Vector search with bounded performance. ☆34 · Updated last year
- [OSDI 2024] Motor: Enabling Multi-Versioning for Distributed Transactions on Disaggregated Memory ☆46 · Updated 10 months ago
- ☆38 · Updated 7 months ago
- Ensō is a high-performance streaming interface for NIC-application communication. ☆70 · Updated 4 months ago
- PTX-EMU is a simple emulator for CUDA programs. ☆26 · Updated last year
- Selected Topics in Computer Networks @ Johns Hopkins University ☆19 · Updated 4 years ago
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” ☆37 · Updated last year
- SocksDirect code repository ☆17 · Updated 2 years ago
- Demystifying Datapath Accelerator Enhanced Off-path SmartNIC [ICNP24] ☆27 · Updated last month
- Implementation of the logging layer of our SOSP '23 paper Halfmoon ☆11 · Updated last year
- ☆36 · Updated last month
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆22 · Updated 2 months ago
- SOTA Learning-augmented Systems ☆34 · Updated 2 years ago
- Skyloft: A General High-Efficient Scheduling Framework in User Space (SOSP 2024) ☆32 · Updated 4 months ago
- STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth ☆16 · Updated last year
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆90 · Updated 2 years ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆17 · Updated this week
- An IR for efficiently simulating distributed ML computation. ☆25 · Updated last year
- This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value … ☆16 · Updated 3 months ago