Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025
☆31Oct 22, 2025Updated 4 months ago
Alternatives and similar repositories for hoti-2025-gpu-comms-tutorial
Users who are interested in hoti-2025-gpu-comms-tutorial are comparing it to the repositories listed below
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆43Nov 19, 2025Updated 3 months ago
- ☆30Sep 13, 2025Updated 5 months ago
- A new query hardness measure for graph-based ANN indexes. Build unbiased workloads with this hardness to see the actual performance of yo…☆22Feb 7, 2025Updated last year
- DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval☆45Jan 28, 2026Updated last month
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…☆52Updated this week
- ☆24Apr 4, 2024Updated last year
- Block-based Approximate Nearest Neighbor☆35Nov 1, 2021Updated 4 years ago
- ☆35Mar 7, 2025Updated 11 months ago
- ☆31Jan 22, 2025Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆163Feb 11, 2026Updated 2 weeks ago
- Dedicated Chinese localization launcher for the beta test of Waven, Ankama's new game☆10Aug 20, 2023Updated 2 years ago
- This repo contains instructions, benchmarks, and files for running user space networking in the gem5 simulator.☆11Aug 1, 2024Updated last year
- LaTeX template for the Journal of Software (软件学报), usable on Overleaf☆19Dec 30, 2024Updated last year
- Let's count triangles together!☆10Jun 27, 2024Updated last year
- A standalone CXL-enabled system simulator.☆18Jan 10, 2026Updated last month
- Advancing the 64-Bit Temple Operating System into the future.☆14Jan 8, 2019Updated 7 years ago
- Segmented Code Adjustment Quantization (SAQ)☆17Sep 22, 2025Updated 5 months ago
- ☆14Oct 30, 2024Updated last year
- A cross-modal vector index with fast construction in a heterogeneous CPU-GPU environment. Published at DaMoN@SIGMOD 2025.☆16Jul 16, 2025Updated 7 months ago
- Open source simulator for porous media flow☆14Oct 15, 2022Updated 3 years ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- A demonstration of source code transformation to implement automatic differentiation, compatible with an operation overload style AD libr…☆13Jul 15, 2022Updated 3 years ago
- ☆115Jun 9, 2023Updated 2 years ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…☆466Dec 31, 2025Updated last month
- A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search☆21Jul 22, 2025Updated 7 months ago
- Hints for installing and working through xv6lab☆12Jan 28, 2021Updated 5 years ago
- RPG^2 is a pure-software system that operates on running C/C++ programs, profiling them, injecting prefetch instructions, and then tuning…☆12May 15, 2024Updated last year
- ☆19Jun 1, 2025Updated 8 months ago
- An Experimental Evaluation of the State-of-the-Art Graph-Based Vector Search Methods and Techniques☆19Jun 3, 2025Updated 8 months ago
- Code for the paper "Multi-Field Adaptive Retrieval," a research project on semi-structured document retrieval☆16Feb 13, 2026Updated 2 weeks ago
- The official implementation of the W&B Models and Weave MCP server.☆33Updated this week
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i…☆15Jan 17, 2024Updated 2 years ago
- An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included.☆49Updated this week
- Page of the course "Information Retrieval" at Department of Computer Science, University of Pisa☆20Dec 18, 2025Updated 2 months ago
- Vector Index Benchmark for Embeddings (VIBE) is an extensible benchmark for approximate nearest neighbor search methods, or vector index…☆35Dec 19, 2025Updated 2 months ago
- ☆24Oct 17, 2016Updated 9 years ago
- The Zaychik Power Controller server☆13Apr 13, 2024Updated last year
- My tests and experiments with some popular dl frameworks.☆17Sep 11, 2025Updated 5 months ago
- Implements GPT-OSS 20B & 120B C++ inference from scratch on AMD GPUs☆169Oct 25, 2025Updated 4 months ago