MoFHeka / execution-ucx
A std::execution-style runtime context and high-performance RPC transport using OpenUCX, including support for CUDA/ROCm/... devices with RDMA.
☆29 Updated this week
Alternatives and similar repositories for execution-ucx
Users that are interested in execution-ucx are comparing it to the libraries listed below
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆37 Feb 6, 2026 Updated last week
- Will write CUDA for 100 days ☆38 May 25, 2025 Updated 8 months ago
- All Resources from Stanford CS106B 2021 ☆23 Jul 11, 2025 Updated 7 months ago
- ☆51 Apr 30, 2025 Updated 9 months ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… ☆23 Oct 1, 2025 Updated 4 months ago
- ☆20 Sep 11, 2025 Updated 5 months ago
- ☆105 Sep 9, 2024 Updated last year
- Search and launch your Playnite library. ☆12 Sep 12, 2022 Updated 3 years ago
- ☆14 Aug 8, 2016 Updated 9 years ago
- Paper and code for New Directions in Cloud Programming, CIDR 2021 ☆11 Feb 17, 2021 Updated 4 years ago
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery ☆20 Sep 24, 2025 Updated 4 months ago
- Blogs that I'm actively following. ☆13 Sep 17, 2023 Updated 2 years ago
- T22_034_han_shi_hao_CRDDC_2022_SourceCode ☆11 Dec 29, 2023 Updated 2 years ago
- Halcyon Days is a modern and stylish HTML5/CSS3 template with a pixel-perfect design and smooth effects. It’s especially fitting for a po… ☆10 Feb 24, 2021 Updated 4 years ago
- Implementing DP/TP/PP with torch.distributed ☆12 Dec 28, 2023 Updated 2 years ago
- VSS: A Storage System for Video Analytics ☆13 Jul 9, 2021 Updated 4 years ago
- GEMM ☆10 Aug 26, 2023 Updated 2 years ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 Updated this week
- 😎 Awesome papers on token redundancy reduction ☆11 Mar 12, 2025 Updated 11 months ago
- ☆13 Jul 23, 2025 Updated 6 months ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression. ☆13 Sep 29, 2024 Updated last year
- ☆11 Sep 21, 2022 Updated 3 years ago
- Empowering LLM Agents for Real-World Computer System Optimization ☆16 Sep 10, 2025 Updated 5 months ago
- ☆12 Aug 31, 2023 Updated 2 years ago
- 🎓 Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours) ☆10 Updated this week
- Fast and memory-efficient exact attention ☆18 Jan 23, 2026 Updated 3 weeks ago
- DeepRec Extension is an easy-to-use, stable and efficient large-scale distributed training system based on DeepRec. ☆12 May 17, 2024 Updated last year
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) ☆26 Jan 22, 2026 Updated 3 weeks ago
- Awesome list for High Performance Computing / Parallel Computing resources. ☆12 Sep 20, 2017 Updated 8 years ago
- A collection of the world's best computer vision labs and lecture materials. ☆14 Feb 23, 2025 Updated 11 months ago
- A Chinese Character BERT Trained with Multi-Level Masking ☆11 Sep 24, 2023 Updated 2 years ago
- QnA is an anonymous and private Q&A website. ☆10 Aug 13, 2020 Updated 5 years ago
- TKDE-Towards Improving Embedding Based Models of Social Network Alignment via Pseudo Anchors ☆14 Aug 14, 2022 Updated 3 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 Updated this week
- ☆17 Nov 22, 2025 Updated 2 months ago
- Cute layout visualization ☆30 Jan 18, 2026 Updated 3 weeks ago
- Computer Papers ☆11 Jun 15, 2025 Updated 7 months ago
- GEMV implementation with CUTLASS ☆19 Aug 21, 2025 Updated 5 months ago
- Wox plugin to search GitHub repos, browse issues and PRs ☆12 Mar 28, 2021 Updated 4 years ago